At least one embodiment of the invention generally relates to a method of determining markers for a disease from a patient, wherein information from epigenomics and/or the transcriptome from peripheral blood and a diseased tissue or information from epigenomics and the transcriptome from peripheral blood or a diseased tissue is used for obtaining the markers, as well as a method of determining a risk for a disease in a patient using the markers obtained thereby.
The finding of markers for diagnosing diseases is a recently growing field due to new high-throughput methods of analysis of samples of patients as well as the availability of sufficient computing power to analyze the vast amount of data generated thereby.
This enables the identification of a variety of markers for a multitude of diseases, e.g. cardiac diseases, cancer, etc.
Heart failure (HF) is a major cause of morbidity and mortality in the general population and is the leading cause of hospitalization in individuals older than 65. Currently, about 2% of the general population suffers from HF; in the elderly, this increases to about 10%. In addition, an increasing prevalence of clinically manifest HF is predicted in all Western countries.
HF is the result of an underlying cardiac disease. The two most common reasons for developing HF are systolic and/or diastolic dysfunction. For systolic HF, also referred to as HF-rEF, the main reasons are ischemic heart disease due to coronary artery disease and myocardial infarction, and non-ischemic causes such as Dilated Cardiomyopathy (DCM). DCM is a frequent heart muscle disease with an estimated prevalence of 1:2500 up to 1:500, which is caused by genetic mechanisms, inflammation or infection. The progressive nature of the disorder is responsible for nearly 50,000 hospitalizations and 10,000 deaths per year in the US alone and is the main cause for heart transplantation in young adults. Overall, the incidence of the disease has continually increased over the past years, and it has been recognized that DCM has a substantial genetic contribution. It is estimated that about 30-40% of all DCM cases show familial aggregation, and more than 40 different genes have so far been found to cause genetic DCM.
Diagnosis and risk stratification of HF and DCM are still challenging and rely predominantly on symptoms, cardiovascular imaging parameters and biomarkers such as N-terminal pro b-type natriuretic peptide (Nt-ProBNP). Although highly accurate, Nt-ProBNP has its own caveats. For instance, several confounding factors can alter the plasma level of Nt-ProBNP, such as age, gender, race, obesity, exercise, renal failure and anemia.
For better understanding of diseases like HF and to define therapy and diagnostic strategies, more accurate molecular biomarkers are needed. While several studies have now identified common genetic polymorphisms that are associated with DCM or heart failure—disclosed in Friedrichs, F. et al.: HBEGF, SRA1, and IK: Three cosegregating genes as determinants of cardiomyopathy, 395-403 (2009), doi:10.1101/gr.076653.108.19; and Villard, E. et al.: A genome-wide association study identifies two loci associated with heart failure due to dilated cardiomyopathy, Eur. Heart J. 32, 1065-76 (2011); epigenetic alterations—disclosed in Haas, J. et al.: Alterations in cardiac DNA methylation in human dilated cardiomyopathy, EMBO Mol. Med. 5, 413-429 (2013); or miRNA expression patterns, there still is an unmet need for reliable markers of HF/DCM, as well as other diseases.
Heart failure is the leading cause of hospitalization and death in Western countries. Over the last decades the genetic causes and molecular events driving the progression of heart failure have only been partially unravelled. Besides genetic predisposition (Meder B, et al., A genome-wide association study identifies 6p21 as novel risk locus for dilated cardiomyopathy. Eur Heart J. 2014; 35:1069-77; Villard E, et al., A genome-wide association study identifies two loci associated with heart failure due to dilated cardiomyopathy. Eur Heart J. 2011; 32:1065-76), it has long been known that additional aspects including environmental factors and life-style influence the outbreak and course of myocardial failure (Hang C T, et al., Chromatin regulation by Brg1 underlies heart muscle development and disease. Nature. 2010; 466:62-7). The precise mechanism by which such extrinsic, environmental factors may influence the phenotype of an individual and the course of the disease is largely unknown.
Most recently, cardiovascular research has made first steps towards elucidating the role of the cardiac epigenome. During cardiac development, a series of dynamic changes in the methylation of gene bodies and histone marks of developmental and sarcomeric genes were detected, a pattern that is partially re-established in failing cardiomyocytes (Hang C T, et al., Chromatin regulation by Brg1 underlies heart muscle development and disease. Nature. 2010; 466:62-7; Sergeeva I A, et al., Identification of a regulatory domain controlling the Nppa-Nppb gene cluster during heart development and stress. Development. 2016; 143:2135-46; Greco C M, et al., DNA hydroxymethylation controls cardiomyocyte gene expression in development and hypertrophy. Nature communications. 2016; 7:12418). In the adaptation to stress and during hypertrophy, similar observations were made in engineered heart tissue from rats, pointing towards conservation of methylation-based gene patterning across species (Stenzig J, et al., DNA methylation in an engineered heart tissue model of cardiac hypertrophy: common signatures and effects of DNA methylation inhibitors. Basic Res Cardiol. 2016; 111:9). While these studies indicate a potentially central role of epigenetic regulation in the heart and highly sophisticated technologies exist to assess histone modifications or DNA methylation at base-pair resolution, the lack of availability of myocardial specimens from patients is a major roadblock for elucidating the impact of such changes on complex cardiovascular traits (Greco C M and Condorelli G. Epigenetic modifications and noncoding RNAs in cardiac hypertrophy and failure. Nat Rev Cardiol. 2015; 12:488-97). Hence, it is mainly animal studies or investigations of very small clinical cohorts that have so far shed some light on the presence and role of chemical alterations of cardiac DNA in heart failure or cardiomyopathy.
One of the pioneering studies on DNA methylation in heart failure was published by the group of Roger Foo in 2011 (Movassagh M, et al., Distinct epigenomic features in endstage failing human hearts. Circulation. 2011; 124:2411-22). They found that epigenetic changes in heart failure do not occur uniformly across the genome, but are concentrated in promoter CpG islands, intragenic CpG islands and gene bodies. The limitation of this study was the very small sample size: only 4 end-stage heart failure cardiac explants were investigated. In 2013 Haas et al. were able to identify and replicate genome-wide signatures of lower resolution DNA methylation changes in living patients suffering from Dilated Cardiomyopathy (DCM), which is a major cause of non-ischemic heart failure (Haas J, et al., Alterations in cardiac DNA methylation in human dilated cardiomyopathy. EMBO Mol Med. 2013; 5:413-29). In this study, they identified a set of novel candidate genes that are potentially involved in heart failure, such as ADORA2A and LY75. Another of the few available examples identified Methyl-CpG-binding protein 2 (MeCP2), a downstream effector of DNA methylation, to be repressed during heart failure in humans and reactivated after mechanical unloading of the left ventricle by assist devices (Mayer S C, et al., Adrenergic Repression of the Epigenetic Reader MeCP2 Facilitates Cardiac Adaptation in Chronic Heart Failure. Circ. Res. 2015; 117:622-33), pointing towards a potential role of targeted epigenetic therapies for heart failure.
Biochemical DNA modification represents a crucial regulatory layer between genetic information, environmental factors and the transcriptome.
To identify epigenetic susceptibility regions and novel biomarkers linked to myocardial dysfunction and heart failure, the inventors performed the first multi-omics study in myocardial tissue and blood of patients with Dilated Cardiomyopathy (DCM) and controls.
The present inventors dissected for the first time high-resolution epigenome-wide cardiac and blood DNA methylation in conjunction with mRNA and whole-genome sequencing in a large cohort of densely phenotyped patients with systolic heart failure due to DCM. They provide the largest dataset to date of cardiac and blood DNA methylation profiles and identified key epigenomic patterns that are distinct fingerprints of human heart failure.
The present inventors have found that improved marker finding is possible when more than one characteristic of the sample, e.g. the nucleic acid sequence, is considered. Further, it was found that marker finding is also improved when more than one sample from different sources is considered, wherein one is preferably from tissue related to a disease and a further one from peripheral blood.
In a first aspect, the present invention is related to a method of determining markers for a disease from a patient, comprising
obtaining or providing at least one sample of peripheral blood and at least one sample of a diseased tissue of the patient diagnosed with the disease;
obtaining an epigenomics profile and/or analyzing a transcriptome of the at least one sample of the peripheral blood and the at least one sample of the diseased tissue;
comparing the epigenomics profile and/or the transcriptome to an epigenomics profile and/or a transcriptome of a suitable control, respectively; and
determining one or more alterations in the epigenomics profile and/or the transcriptome in both the at least one sample of the peripheral blood and the at least one sample of the diseased tissue of the patient diagnosed with the disease.
Further, the present invention relates to a method of determining markers for a disease from a patient, comprising
obtaining or providing at least one sample of peripheral blood or at least one sample of a diseased tissue of the patient diagnosed with the disease;
obtaining an epigenomics profile and analyzing a transcriptome of the at least one sample of the peripheral blood or the at least one sample of the diseased tissue;
comparing the epigenomics profile and the transcriptome to an epigenomics profile and a transcriptome of a suitable control, respectively; and
determining one or more alterations in the epigenomics profile and the transcriptome in either the at least one sample of the peripheral blood or the at least one sample of the diseased tissue of the patient diagnosed with the disease.
Additionally, a method of determining a risk for a disease in a patient, comprising
obtaining or providing an epigenomics profile and/or a transcriptome of at least one sample of the peripheral blood and/or a diseased tissue, e.g. the myocard/myocardium, of the patient, and
determining the presence of at least one marker as determined by the method of the first or second aspect is disclosed.
Further disclosed is a data bank comprising specific markers for heart failure and/or dilated cardiomyopathy in a patient, the use of this databank in a method of determining a risk for heart failure and/or dilated cardiomyopathy in a patient, and the use of the specific markers as a marker for heart failure and/or dilated cardiomyopathy in a patient.
In addition, a method of determining a risk for a disease in a patient, comprising
obtaining or providing data of an epigenomics profile and/or a transcriptome of at least one sample of the peripheral blood and/or a diseased tissue of the patient, and
determining the presence of at least one marker as determined by the method of the first or second aspect is disclosed, as well as a computer program product comprising computer executable instructions which, when executed, perform such a method.
Further aspects and embodiments of the invention are disclosed in the dependent claims and can be taken from the following description, figures and examples, without being limited thereto.
The enclosed drawings should illustrate embodiments of the present invention and convey a further understanding thereof. In connection with the description they serve as explanation of concepts and principles of the invention. Other embodiments and many of the stated advantages can be derived in relation to the drawings. The elements of the drawings are not necessarily to scale towards each other. Identical, functionally equivalent and acting equal features and components are denoted in the figures of the drawings with the same reference numbers, unless noted otherwise.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The term “nucleic acid molecule” refers to a polynucleotide molecule having a defined sequence. It comprises DNA molecules, RNA molecules, nucleotide analog molecules and combinations and derivatives thereof, such as DNA molecules or RNA molecules with incorporated nucleotide analogs or cDNA.
The term “nucleic acid sequence information” relates to information which can be derived from the sequence of a nucleic acid molecule, such as the sequence itself or a variation in the sequence as compared to a reference sequence.
The term “mutation” relates to a variation in the sequence as compared to a reference sequence. A mutation is for example a deletion of one or multiple nucleotides, an insertion of one or multiple nucleotides, or substitution of one or multiple nucleotides, duplication of one or a sequence of multiple nucleotides, translocation of one or a sequence of multiple nucleotides, and, in particular, a single nucleotide polymorphism (SNP).
In the context of the present invention a “sample” is a sample which comprises at least epigenetic information and/or information regarding the transcriptome of a patient. Examples for samples are: cells, tissue, biopsy specimens, body fluids, blood, urine, saliva, sputum, plasma, serum, cell culture supernatant, swab sample and others.
An epigenomics profile corresponds to the multitude of all epigenomic modifications, e.g. DNA methylation, histone methylation, etc., that can occur in a patient.
A transcriptomics profile corresponds to the multitude of all transcribed nucleic acids, e.g. messenger RNA, micro RNAs, non-coding RNAs, etc.
Peripheral blood refers to the circulating pool of blood within the patient.
According to certain embodiments, the patient in the present methods is a vertebrate, more preferably a mammal and most preferably a human patient.
A vertebrate within the present invention refers to an animal having a vertebral column, which includes mammals (including humans), birds, reptiles, amphibians and fishes. The present invention thus is not only suitable for human medicine, but also for veterinary medicine.
New and highly efficient methods of sequencing nucleic acids referred to as next generation sequencing have opened the possibility of large scale genomic analysis. The term “next generation sequencing” or “high throughput sequencing” refers to high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. Examples include Massively Parallel Signature Sequencing (MPSS), Polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, Ion semiconductor sequencing, DNA nanoball sequencing, Helioscope™ single molecule sequencing, Single Molecule SMRT™ sequencing, Single Molecule real time (RNAP) sequencing, Nanopore DNA sequencing, Sequencing By Hybridization, Amplicon Sequencing, GnuBio.
Before the invention is described in exemplary detail, it is to be understood that this invention is not limited to the particular component parts of the process steps of the methods described herein as such methods may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting. It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include singular and/or plural referents unless the context clearly dictates otherwise. For example, the term “a” as used herein can be understood as one single entity or in the meaning of “one or more” entities. It is also to be understood that plural forms include singular and/or plural referents unless the context clearly dictates otherwise. It is moreover to be understood that, in case parameter ranges are given which are delimited by numeric values, the ranges are deemed to include these limitation values.
In a first aspect, the present invention relates to a method of determining markers for a disease from a patient, comprising
obtaining or providing at least one sample of peripheral blood and at least one sample of a diseased tissue of the patient diagnosed with the disease;
obtaining an epigenomics profile and/or analyzing a transcriptome of the at least one sample of the peripheral blood and the at least one sample of the diseased tissue;
comparing the epigenomics profile and/or the transcriptome to an epigenomics profile and/or a transcriptome of a suitable control, respectively; and
determining one or more alterations in the epigenomics profile and/or the transcriptome in both the at least one sample of the peripheral blood and the at least one sample of the diseased tissue of the patient diagnosed with the disease.
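The steps of the first aspect can be illustrated by a minimal sketch, which is not the inventors' implementation. It assumes that methylation is available as per-site values between 0 and 1, that an "alteration" is a difference in mean methylation of at least `min_delta` between diseased and control samples, and that only alterations pointing in the same direction in both sources are of interest. The function names, the threshold and the `same_direction` option are hypothetical.

```python
from statistics import mean


def altered_sites(case, control, min_delta=0.1):
    """Return {site: delta} for sites whose mean methylation differs between
    case and control samples by at least min_delta (assumed threshold)."""
    altered = {}
    for site in case.keys() & control.keys():
        delta = mean(case[site]) - mean(control[site])
        if abs(delta) >= min_delta:
            altered[site] = delta
    return altered


def shared_alterations(blood_case, blood_ctrl, tissue_case, tissue_ctrl,
                       min_delta=0.1, same_direction=True):
    """Keep only sites that are altered in BOTH peripheral blood and tissue.

    Each argument maps a site identifier to a list of per-sample values.
    With same_direction=True (an assumption of this sketch), sites that are
    hypermethylated in one source and hypomethylated in the other are dropped.
    """
    blood = altered_sites(blood_case, blood_ctrl, min_delta)
    tissue = altered_sites(tissue_case, tissue_ctrl, min_delta)
    shared = {}
    for site in blood.keys() & tissue.keys():
        if same_direction and (blood[site] > 0) != (tissue[site] > 0):
            continue
        shared[site] = (blood[site], tissue[site])
    return shared
```

In practice a statistical test with multiple-testing correction would likely replace the fixed threshold; the sketch only shows the intersection logic across the two sample sources.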
In this first aspect, thus at least two different samples are obtained, and these can be analyzed with regard to the epigenomics profile, the transcriptome, or both. This is schematically shown in exemplary
According to
In an alternative method shown in
In a second aspect, the present invention relates to a method of determining markers for a disease from a patient, comprising
obtaining or providing at least one sample of peripheral blood or at least one sample of a diseased tissue of the patient diagnosed with the disease;
obtaining an epigenomics profile and analyzing a transcriptome of the at least one sample of the peripheral blood or the at least one sample of the diseased tissue;
comparing the epigenomics profile and the transcriptome to an epigenomics profile and a transcriptome of a suitable control, respectively; and
determining one or more alterations in the epigenomics profile and the transcriptome in either the at least one sample of the peripheral blood or the at least one sample of the diseased tissue of the patient diagnosed with the disease.
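The second aspect combines two data layers from a single sample source. The following sketch assumes that both layers have already been compared to a control and summarized per gene, as a methylation difference and as a log2 fold change in expression. The function name, the gene labels used with it and both thresholds are hypothetical and serve only to show the requirement that an alteration be present in both layers.

```python
def dual_layer_markers(methyl_delta, expr_log2fc,
                       min_methyl=0.1, min_log2fc=1.0):
    """Return genes altered in BOTH the epigenomics profile and the transcriptome.

    methyl_delta: {gene: mean methylation difference vs. control}
    expr_log2fc:  {gene: log2 fold change in expression vs. control}
    Thresholds are illustrative assumptions, not values from the text.
    """
    markers = {}
    for gene in methyl_delta.keys() & expr_log2fc.keys():
        m = methyl_delta[gene]
        e = expr_log2fc[gene]
        if abs(m) >= min_methyl and abs(e) >= min_log2fc:
            markers[gene] = (m, e)
    return markers
```

A gene that changes in only one of the two layers is not reported, which mirrors the "and" in the second aspect as opposed to the "and/or" of the first.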
In this second aspect, thus at least one sample is obtained, but not from different sources. This sample is then analyzed with regard to both the epigenomics profile and the transcriptome. This is schematically shown in exemplary
According to
The disease in the present invention is not particularly limited. According to certain embodiments, it is a non-infectious disease, particularly a cardiovascular disease. According to certain embodiments, the disease is heart failure (HF) and/or dilated cardiomyopathy (DCM). In such a case, the sample of the diseased tissue can be obtained from myocardial tissue.
The obtaining of the sample is also not particularly limited, but is preferably non-invasive, e.g. is taken from a stock or from a storage, etc.
Further, also the obtaining of the epigenomics profile as well as the analysis of the transcriptome are not particularly limited and can be suitably carried out using known means, including sequencing, bead array or microarray technology.
Also, the comparison to an epigenomics profile and/or a transcriptome of a suitable control is not particularly limited and can be done in any way, e.g. using computational programs, etc. Further, the alteration in the epigenomics profile and/or the transcriptome is not particularly limited. According to certain embodiments, the alteration is a hyper and/or hypo methylation and/or a change in chromatin marks and/or a change in the RNA (e.g. messenger RNA, micro RNA, non-coding RNA etc.) expression level, e.g. an increase or decrease in RNA expression level, wherein all combinations are possible, e.g. a hyper methylation in combination with a decrease or an increase in RNA expression level, or a hypo methylation in combination with a decrease or an increase in RNA expression level.
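The combinations of alterations named above can be made explicit with a small classifier. This is a sketch under stated assumptions: methylation change is given as a difference relative to the control, expression change as a log2 fold change, and both cut-offs are hypothetical.

```python
def classify_alteration(methyl_delta, expr_log2fc,
                        min_methyl=0.1, min_log2fc=1.0):
    """Label one locus by methylation direction and expression direction.

    Returns a (methylation_label, expression_label) tuple. Changes below the
    assumed cut-offs are labeled "unchanged".
    """
    if methyl_delta >= min_methyl:
        methyl = "hypermethylation"
    elif methyl_delta <= -min_methyl:
        methyl = "hypomethylation"
    else:
        methyl = "unchanged"

    if expr_log2fc >= min_log2fc:
        expr = "increased expression"
    elif expr_log2fc <= -min_log2fc:
        expr = "decreased expression"
    else:
        expr = "unchanged"
    return methyl, expr
```

All four combinations of hyper or hypo methylation with increased or decreased expression are possible outputs, in line with the statement that all combinations are possible.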
The control is not limited as well and can be suitably chosen based on the patient. For example, a control can be obtained from one or more patients not diagnosed with the disease, or from a publicly known control that is not affected by the disease. According to certain embodiments, the one or more alteration is determined with regard to the nucleic acid sequence information of the patient, e.g. the genome. According to certain embodiments, the patient is a human. According to certain embodiments, the patient is a human and the control is reference genome hg19, as provided by e.g. Genome Reference Consortium and the University of California, Santa Cruz (GRCh37/hg19, downloadable from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ and http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/). Gene regions are based on the GRCh37/hg19 and the Gencode 19 gene model (http://www.gencodegenes.org/).
According to certain embodiments a plurality of samples of the peripheral blood and/or the diseased tissue are obtained or provided from patients diagnosed with the disease. This way statistical significance of the found markers can be improved.
In a further aspect, the present invention relates to a method of determining a risk for a disease in a patient, comprising
obtaining or providing an epigenomics profile and/or a transcriptome of at least one sample of the peripheral blood and/or a diseased tissue, e.g. the myocardium, of the patient, and
determining the presence of at least one marker as determined by the method of the first or the second aspect.
Again, the obtaining of the sample is not particularly limited, but is preferably non-invasive, e.g. is taken from a stock or from a storage, etc.
According to certain embodiments, the diseased tissue is the myocardium, and preferably the disease is heart failure and/or dilated cardiomyopathy.
For heart failure and/or dilated cardiomyopathy, a list of markers for improved determination of a risk for these diseases has been found by the present methods of the first and second aspect. These are shown in the following tables.
Thus, according to certain embodiments, the at least one epigenetic and/or transcriptomic marker for determining a risk for heart failure and/or dilated cardiomyopathy
is contained in genomic regions with regard to reference genome hg19 that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue and are associated with RNA expression levels and is chosen from the sequences disclosed in Table 1, preferably Table 1a, particularly preferably Table 1b; and/or
is contained in genomic regions with regard to reference genome hg19 that show hyper/hypo methylation in HF/DCM in myocardial tissue and are associated with RNA expression levels and is chosen from the sequences disclosed in Table 2, preferably Table 2a, particularly preferably Table 2b; and/or
is contained in genomic regions with regard to reference genome hg19 that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue and is chosen from the sequences disclosed in Table 3, preferably Table 3a, particularly preferably Table 3b; and/or
is contained in genomic regions with regard to reference genome hg19 that show dysmethylation in HF/DCM in peripheral blood and is chosen from the sequences disclosed in Table 4; and/or
is contained in genomic regions with regard to the reference Infinium HumanMethylation450K database and the reference genome hg19, respectively, that show dysmethylation in HF/DCM in peripheral blood and is chosen from the cpg IDs or positions disclosed in Table 5; and/or
is contained in genomic regions with regard to reference genome hg19 that show dysmethylation in HF/DCM in peripheral blood and is chosen from the sequences disclosed in Table 6; and/or
is contained in genomic regions with regard to the reference Infinium HumanMethylation450K database and the reference genome hg19, respectively, that show dysmethylation in HF/DCM in peripheral blood and is chosen from the cpg IDs or positions disclosed in Table 7; and/or
is contained in genomic regions with regard to reference genome hg19 that show dysmethylation in HF/DCM in peripheral blood and is chosen from the sequences disclosed in Table 8; and/or
is contained in genomic regions with regard to the reference Infinium HumanMethylation450K database and the reference genome hg19, respectively, that show dysmethylation in HF/DCM in peripheral blood and is chosen from the cpg IDs or positions disclosed in Table 9; and/or
is contained in genomic regions with regard to reference genome hg19 that show dysmethylation in HF/DCM in peripheral blood and myocardial tissue and are associated with RNA expression levels and is chosen from the ANF and/or BNP loci and/or the sequences disclosed in Table 10. In the tables 1, 1a, 1b, 2, 2a, 2b, 3, 3a, 3b, 4, 6, 8, and 10 the sequences are the nucleic acid sequences between the positions in the columns titled start and end in the respective chromosomes (chr.), including the positions given under start and end, with regard to reference genome hg19. Further, in Tables 1, 1a, 1b, 2, 2a, 2b, 3, 3a, and 3b sequences are given in columns 1 and 2 as well as in columns 4 and 5 for brevity sake, i.e. one sequence is between and including the positions in columns 1 and 2, and one sequence is between and including the positions in columns 4 and 5. Tables 5, 7 and 9 represent distinct cpg IDs with regard to the reference Infinium HumanMethylation450K database and positions with regard to reference genome hg19 that show statistically significant dysmethylation in peripheral blood.
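Because the sequences in the tables are defined as running from start to end with both positions included, a position lookup must use closed intervals. The sketch below illustrates this convention. The `Region` type, the function name and the optional `flank` padding are hypothetical, and any coordinates used with it are placeholders, not entries from the tables.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str   # chromosome label, e.g. "chr1"
    start: int   # first position of the region (included)
    end: int     # last position of the region (included)


def contains(region, chrom, pos, flank=0):
    """True if pos lies within the region, start and end both included.

    flank optionally widens the region on both sides by that many base pairs
    (an illustrative option of this sketch); the widened start is clamped so
    that it does not fall below position 1.
    """
    if region.chrom != chrom:
        return False
    start = max(1, region.start - flank)
    end = region.end + flank
    return start <= pos <= end
```

For example, with a placeholder region `Region("chr1", 1000, 2000)`, positions 1000 and 2000 both count as inside the region, whereas 2001 does not unless a flank is given.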
The inventors have found that hyper/hypo methylation can affect both strands and therefore genes on both strands. They further found that it affects not only the gene regions themselves, but also the surrounding area, particularly within a region of 10000 base pairs, more particularly within a region of 1000 base pairs. Not only coding regions may be influenced thereby, but also regions surrounding the coding regions, e.g. promoter regions, etc. Thus, while the most significant results may be found in only a very limited region, hyper/hypo methylation was observed within a broad region around the position, without a significant change in the significance within 10000 base pairs, as is also shown in e.g.
ID numbers for the methylation (methyl. ID) refer to the Infinium HumanMethylation450 BeadChip Kit probe IDs as listed in the HumanMethylation450 v1.2 Manifest (http://support.illumina.com/downloads/infinium_humanmethylation450_product_files.html), preferred reading directions for the respective double helix strand (str.; + or −) with regard to the reference genome for the genes as well as gene names, gene ensemble IDs (gene ID) and chromosomes (chr.) are found in Tables 1c, 2c and 2d, and 3c for Tables 1, 1a, 1b; 2, 2a, 2b; and 3, 3a, and 3b, respectively. Also, the starts and ends are given, with the respective tables in brackets. It should be noted that table 2, respectively 2a and 2b, has been split in two tables 2c and 2d, since for Table 2d the whole region has been shown to be significantly deregulated on methylation and expression level. Further, gene IDs, gene names and chromosomes are also given in Tables 4, 6, 8 and 10. In Tables 5, 7 and 9 cpg IDs—representing methylation locations (representing either a nucleobase or a paired nucleobase)—are given with regard to the Infinium HumanMethylation450K database, and chromosomes and positions (pos) are given with regard to the reference genome.
The markers in Table 4 represent genomic regions with 10 kb up/downstream of genes that show statistically significant, particularly the statistically most significant, validated dysmethylation in peripheral blood, particularly in independent discovery (41 DCM and 31 CTRL) and verification cohorts (9 DCM and 28 CTRL). (DCM=dilated cardiomyopathy; CTRL=control)
The markers in Table 5 represent distinct cpg IDs and genomic positions (particularly top 10) that show statistically significant, particularly the statistically most significant, validated dysmethylation in peripheral blood, particularly in independent discovery (41 DCM and 31 CTRL) and verification cohorts (9 DCM and 28 CTRL).
The markers in Table 6 represent genomic regions with 10 kb up/downstream of genes that show validated dysmethylation in peripheral blood, particularly in independent discovery (41 DCM and 31 CTRL) and verification cohorts (9 DCM and 28 CTRL) with an area under the curve (AUC) of more than 85% in the discovery and verification cohorts.
The markers in Table 7 represent distinct cpg IDs and genomic positions that show validated dysmethylation in peripheral blood, particularly in independent discovery (41 DCM and 31 CTRL) and verification cohorts (9 DCM and 28 CTRL) with an area under the curve (AUC) of more than 85% in the discovery and verification cohorts.
The markers in Table 8 represent genomic regions with 10 kb up/downstream of genes that show validated dysmethylation in peripheral blood, particularly in independent discovery (41 DCM and 31 CTRL) and verification cohorts (9 DCM and 28 CTRL) with an area under the curve (AUC) of more than 80% in the discovery and verification cohorts.
The markers in Table 9 represent distinct cpg IDs and genomic positions that show validated dysmethylation in peripheral blood, particularly in independent discovery (41 DCM and 31 CTRL) and verification cohorts (9 DCM and 28 CTRL) with an area under the curve (AUC) of more than 80% in the discovery and verification cohorts.
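The AUC criterion used for Tables 6 to 9 can be sketched as follows. The text does not state how the AUC was computed; this example assumes the standard rank-based definition (the probability that a randomly chosen DCM value exceeds a randomly chosen control value, with ties counted as one half) and assumes that a marker must discriminate in the same direction in both cohorts. Function names are hypothetical.

```python
def auc(case_values, control_values):
    """Rank-based AUC: fraction of (case, control) pairs in which the case
    value is higher, counting ties as 0.5."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def validated(discovery, verification, threshold=0.85):
    """Check that a marker exceeds the AUC threshold in both cohorts.

    discovery and verification are (case_values, control_values) tuples.
    If the marker is lower in cases (AUC below 0.5 in discovery), both AUCs
    are mirrored, so that opposite directions in the two cohorts fail.
    """
    auc_d = auc(*discovery)
    auc_v = auc(*verification)
    if auc_d < 0.5:
        auc_d, auc_v = 1.0 - auc_d, 1.0 - auc_v
    return auc_d > threshold and auc_v > threshold
```

Passing `threshold=0.80` corresponds to the looser criterion described for Tables 8 and 9.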
The markers in Table 10 represent markers that show dysmethylation in HF/DCM in peripheral blood and myocardial tissue and are associated with RNA expression levels and represent the genes NPPA and NPPB. The ANF and BNP loci encode atrial natriuretic factor (ANF) and brain natriuretic peptide (BNP), and the latter represents the present gold-standard biomarker for heart failure. The inventors found the same direction of dysmethylation in DNA, as also shown in
According to certain embodiments, the presence of a plurality of markers is determined, so that the risk of heart failure and/or dilated cardiomyopathy can be determined more accurately.
A further aspect of the present invention is directed to the use of the markers in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, preferably Table 1a, Table 2a, Table 3a, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, particularly preferably Table 1b, Table 2b, Table 3b, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, e.g. Table 1a, Table 2a and/or Table 3a, e.g. Table 1b, Table 2b and/or Table 3b, as a marker for heart failure and/or dilated cardiomyopathy in a patient.
Furthermore disclosed is a data bank comprising the markers disclosed in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, preferably Table 1a, Table 2a, Table 3a, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, particularly preferably Table 1b, Table 2b, Table 3b, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, e.g. Table 1a, Table 2a and/or Table 3a, e.g. Table 1b, Table 2b and/or Table 3b.
According to certain embodiments, the data bank can be at a remote location and can be queried from a local client.
The present data banks can be used in a variety of applications. For example, the data bank can then be used, according to an aspect of the invention, in a method of determining a risk for heart failure and/or dilated cardiomyopathy in a patient.
Also disclosed is a data bank comprising markers obtained by the first and/or second aspect of the invention.
In addition, the present invention relates in a further aspect to a method of determining a risk for a disease in a patient, comprising
obtaining or providing data of an epigenomics profile and/or a transcriptome of at least one sample of the peripheral blood and/or a diseased tissue of the patient, and
determining the presence of at least one marker as determined by the method of the first and/or second aspect.
According to certain embodiments, the disease is heart failure (HF) and/or dilated cardiomyopathy (DCM), and the at least one marker as determined by the method of first and/or second aspect is at least a marker disclosed in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, preferably Table 1a, Table 2a, Table 3a, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, particularly preferably Table 1b, Table 2b, Table 3b, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, e.g. Table 1a, Table 2a and/or Table 3a, e.g. Table 1b, Table 2b and/or Table 3b.
In a still further aspect the present invention relates to a computer program product comprising computer executable instructions which, when executed, perform a method of determining a risk for a disease in a patient.
In certain embodiments the computer program product is one on which program commands or program codes of a computer program for executing said method are stored. According to certain embodiments the computer program product is a storage medium.
The present invention also relates to the use of the computer program product in a method of determining a risk for a disease in a patient.
Further disclosed is a method of prognosis and/or for monitoring and/or assisting in drug-based therapy of patients diagnosed with heart failure and/or dilated cardiomyopathy, wherein a marker as disclosed in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, preferably Table 1a, Table 2a, Table 3a, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, particularly preferably Table 1b, Table 2b, Table 3b, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, e.g. Table 1a, Table 2a and/or Table 3a, e.g. Table 1b, Table 2b and/or Table 3b, is used. The markers disclosed in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, preferably Table 1a, Table 2a, Table 3a, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, particularly preferably Table 1b, Table 2b, Table 3b, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and/or Table 10, e.g. Table 1a, Table 2a and/or Table 3a, e.g. Table 1b, Table 2b and/or Table 3b, allow a prognosis of the course of the disease as well as a monitoring thereof and can assist in deriving a conclusion regarding the medication prescription, etc., during the therapeutic treatment thereof.
The present invention will now be described in detail with reference to several examples thereof. However, these examples are illustrative and do not limit the scope of the invention.
Some clinical perspectives are briefly discussed with regard to the Examples.
Clinical Perspective
1) What is new?
The application shows that multi-omics studies allow detection of functional patterns in cardiovascular disease.
Epigenetic patterns are associated with heart failure due to dilated cardiomyopathy. The multi-omics study design furthermore allowed detection of connected functional layers in cardiovascular disease.
DNA methylation of distinct genomic regions is conserved between heart tissue and peripheral blood. DNA methylation could represent a new class of heart failure biomarkers.
Transcriptional regulation of the natriuretic factors ANP and BNP is associated with conserved DNA methylation.
2) What are the clinical implications?
Epigenetic mechanisms are involved in chronic heart failure, which opens new perspectives for translational research. Their investigation as diagnostic, predictive or prognostic biomarkers and as future drug targets needs further attention.
The present study has been approved by the ethics committee, medical faculty of Heidelberg University. All participants have given written informed consent. The diagnosis of non-ischemic Dilated Cardiomyopathy (DCM) was confirmed by excluding relevant coronary artery disease (CAD) as determined by coronary angiography. Valvular heart disease was excluded by cMRI and/or echocardiography and myocarditis/inflammatory DCM by histopathology. Patients with a history of uncontrolled hypertension, myocarditis, regular alcohol consumption or cardio-toxic chemotherapy were also excluded. To include the whole continuum of systolic heart failure, patients in early disease stages (EF<55%) who were symptomatic (dyspnea, edema/congestion) were also included.
After screening of n=135 DCM patients, n=38 met all inclusion and exclusion criteria and had sufficient amounts of high quality left ventricular biopsies (LV free wall) and peripheral blood samples available for high-throughput analyses. Control LV-biopsy specimens were obtained from patients after heart transplantation (n=31) that was at least 6 months ago, who had normal systolic and diastolic function and no evidence for relevant vasculopathy or acute/chronic organ rejection as judged by coronary angiography and immuno-histopathology. Additional gender- and age-matched controls for whole blood samples (n=31) had normal systolic and diastolic left ventricular function without evidence for other cardiovascular disease.
Additionally for further validation purposes, left ventricular myocardium of n=11 DCM patients who underwent heart transplantation and left ventricular myocardium (n=5) from previously healthy road accident victims were included.
On average, patients were 54 years old and disease onset was 11 months prior to inclusion. Detailed basic and clinical characteristics of DCM patients are summarized in the following Table 11.
Biopsy specimens were obtained from the apical part of the free left ventricular wall (LV) from DCM patients or cardiac transplant patients (controls) undergoing cardiac catheterization using a standardized protocol. Biopsies were immediately washed in ice-cold saline (0.9% NaCl) and immediately transferred and stored in liquid nitrogen until DNA or RNA was extracted. After diagnostic workup of the biopsies (histopathology), remaining material was evenly dissected to isolate DNA and RNA. DNA was isolated from biopsies and peripheral blood using Qiagen DNA Blood Maxi Kit. Total RNA was extracted using the RNeasy kit according to the manufacturer's protocol (Qiagen, Germany) from biopsies and peripheral blood. RNA purity and concentration were determined using the Bioanalyzer 2100 (Agilent Technologies, Berkshire, UK) with a Eukaryote Total RNA Pico assay chip.
Methylation profiles were measured using the Illumina 450 k methylation assay, following procedures as described in Bibikova, M., et al.: High density DNA methylation array with single CpG site resolution, Genomics, 2011, 98(4): p. 288-95. From each patient, we subjected 200 ng DNA (blood) and 200 ng DNA (biopsy) to the measurements.
Methylation sites with a detection p-value of >0.05 in more than 10% of the samples were removed from analysis. Methylation levels with a detection p-value of >0.05 in less than 10% of the samples were imputed via knn-imputation, as described in Hastie, T., Tibshirani, R., Narasimhan, B., Chu, G.: impute: Imputation for microarray data, R package version 1.46.0, 2016. To reduce the effects of genomic variation on methylation measurements, we excluded all methylation sites where we found variants in more than 10% of the DCM patients of the discovery cohort within the 50 bp probe region by whole genome sequencing. Methylation levels with variants in less than 10% of the DCM patients were imputed. We further removed all probes on X and Y chromosomes as well as probes which have been identified by Chen et al. (Chen, Y. A., et al., Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics, 2013. 8(2): p. 203-9) to cross-hybridize with non-targeted DNA, yielding 394,247 methylation sites that passed QC. It should be noted that the predictive performance may even be increased when e.g. switching from the employed high-throughput Infinium HumanMethylation450 BeadChip screening array to a targeted analysis approach for single methylation sites.
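The detection p-value filter described above can be illustrated with a minimal sketch. The function name, the dictionary layout and the probe IDs are hypothetical and chosen only for illustration; the handling of a probe failing in exactly 10% of the samples is not specified in the text and is assumed here to mean that the probe is kept. The knn-imputation itself is not reproduced.

```python
def split_probes_by_detection(pvals, p_cutoff=0.05, max_fail_fraction=0.10):
    """Classify probes by their detection p-values.

    pvals maps a probe ID to a list of detection p-values, one per sample.
    Returns (kept probes, dropped probes, {probe: sample indices to impute}).
    """
    keep, drop, to_impute = [], [], {}
    for probe, values in pvals.items():
        # Samples in which the probe signal is not distinguishable from background.
        failed = [i for i, p in enumerate(values) if p > p_cutoff]
        if len(failed) / len(values) > max_fail_fraction:
            drop.append(probe)  # fails in more than 10% of samples: remove
        else:
            keep.append(probe)
            if failed:
                to_impute[probe] = failed  # isolated failures: mark for imputation
    return keep, drop, to_impute


# Hypothetical probes measured in 20 samples each.
example = {
    "probe_A": [0.01] * 20,
    "probe_B": [0.20] * 3 + [0.01] * 17,  # 15% failures
    "probe_C": [0.20] + [0.01] * 19,      # 5% failures
}
keep, drop, to_impute = split_probes_by_detection(example)
```

In this toy input, probe_B is removed, while probe_C is kept and its first sample is flagged for imputation.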
1 μg of total gDNA (genomic DNA) was sheared using the Covaris™ S220 system, applying 2 treatments of 60 seconds each (peak power=140; duty factor=10) with 200 cycles/burst. 500 ng of sheared gDNA was taken and whole genome libraries were prepared using TruSeq DNA sample preparation kit according to manufacturer's protocols (Illumina, San Diego, US). Sequencing was performed on an Illumina HiSeq 2000, using TruSeq SBS Kit v3 and reading two times 100 bp for paired end sequencing, on four lanes of a sequencing flowcell.
Demultiplexing of the raw sequencing reads and generation of the fastq files was done using CASAVA v.1.82. The raw reads were then mapped to the human reference genome (GRCh37/hg19, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/) with the Burrows-Wheeler alignment tool (BWA v.0.7.5a) and duplicate reads were marked (Picard-tools 1.56) (http://picard.sourceforge.net/). Next, we used the Genome-Analysis-Toolkit according to the recommended protocols for variant recalibration (v. 2.8-1-g932cd3a) and variant calling (v.3.3-0-g37228af) as described in the respective best-practices guidelines (https://www.broadinstitute.org/gatk/guide/best-practices) and in DePristo, M. A., et al.: A framework for variation discovery and genotyping using next-generation DNA sequencing data, Nat Genet, 2011, 43(5): p. 491-8.
To remove unwanted technical variation, we applied a modified danes normalization procedure across all methylation measurements. Danes normalization is part of the wateRmelon package. The normalization procedure is based on between-array quantile normalization of methylated and unmethylated raw signal intensities of red and green channels together and thus also accounts for dye bias. However, between-array quantile normalization as initially developed for gene expression data is controversial for methylation data, as overall methylation distributions may differ strongly between samples, tissues and disease states. Consequently, we modified the danes normalization approach by applying cyclicloess normalization instead of quantile normalization for between-array normalization. Cyclicloess normalization is similar in effect and intention to quantile normalization, but has the advantage that it normalizes extreme cases less drastically and still preserves major distributional differences.
To account for batch effects, we performed duplicate measurements on different chips of in total 8 samples and used the duplicates for bridging the methylation-values of different analysis batches based on the duplicates only using the removeBatchEffect function from the limma package, as described in Ritchie, M. E., et al., limma powers differential expression analyses for RNA-sequencing and microarray studies, Nucleic Acids Res, 2015, 43(7): p. e47. Following batch bridging, duplicate measurements were averaged before downstream statistical analysis.
Deregulated methylation sites were identified by linear modelling and moderated t-tests including age and gender using the limma package, as described above.
To also correct for potential genomic inflation in the discovery cohort, we performed principal component analysis on methylation measurements and identified principal components (PCs) which were associated with known confounders (e.g. technical confounders such as analysis date, and biological confounders) at FDR (false discovery rate)<=0.05. Again, deregulated methylation sites were subsequently identified by linear modelling and moderated t-tests including age and gender as well as all identified PCs as covariates using the limma package. Statistical analyses were carried out in R-3.2.2. FDR correction of significance levels was performed using the Benjamini-Hochberg procedure.
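The Benjamini-Hochberg correction named above can be sketched as follows. This is a minimal, illustrative re-implementation in Python with a hypothetical function name; the analyses themselves were carried out in R, and the example p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four methylation sites.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A site would then be called at FDR<=0.05 when its adjusted value does not exceed 0.05.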
RNAseq libraries were generated using TruSeq RNA Sample Prep Kit (Illumina), and sequencing was performed 2×75 bp on a HiSeq2000 (Illumina) sequencer. Unstranded paired end raw read files were mapped with STAR v2.4.1c using GRCh37/hg19 and the Gencode 19 gene model (http://www.gencodegenes.org/). Only uniquely mapped reads were counted into genes using subread's featureCounts program (subread version 1.4.6.p1). Prior to statistical analyses, genes with very low expression levels (average reads <=1, detected reads in less than 50% of the samples) were removed. Count data was normalized by rlog normalization as described in Love, M. I., W. Huber, and S. Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol, 2014, 15(12): p. 550, which is an improved method of the variance stabilization transformation as recommended for eQTL (expression quantitative trait loci) by the original MatrixEQTL publication of Shabalin, A. A.: Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics, 2012. 28(10): p. 1353-8.
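The low-expression filter mentioned above can be sketched as follows. The text does not state whether the two criteria are combined with "and" or "or"; this sketch assumes that either criterion alone is sufficient to remove a gene, and that "detected" means at least one read. The function name, gene names and counts are hypothetical.

```python
def filter_low_expression(counts, min_mean=1.0, min_detected_fraction=0.5):
    """Drop genes with very low expression.

    counts maps a gene name to a list of read counts, one per sample.
    A gene is removed if its average count is <= min_mean or if it has
    reads in fewer than min_detected_fraction of the samples (assumption:
    either criterion alone is enough).
    """
    kept = {}
    for gene, reads in counts.items():
        mean_reads = sum(reads) / len(reads)
        detected = sum(1 for r in reads if r > 0) / len(reads)
        if mean_reads <= min_mean or detected < min_detected_fraction:
            continue
        kept[gene] = reads
    return kept


# Invented counts for three genes in four samples.
kept = filter_low_expression({
    "gene_A": [10, 12, 8, 9],
    "gene_B": [0, 0, 1, 0],    # average of 0.25 reads
    "gene_C": [0, 0, 0, 40],   # reads in only 25% of samples
})
```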
An eQTL analysis between methylation sites and gene expression was performed on 34 DCM patients and 25 controls for which high quality transcriptome data from biopsy samples could be obtained (out of the total of 38 DCM patients and 31 controls which were profiled on the methylation level). MatrixEQTL and linear models were used to correlate the expression profiles of 19,418 genes with the 311,222 methylation sites (out of the 394,247 that passed quality control) lying within 10,000 bp up- and downstream of the genes or in the gene body region. Association with the RNA expression level was carried out using the myocardial samples.
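Selecting the methylation sites that lie in the gene body or within 10,000 bp of a gene can be sketched as a simple interval lookup. This is only an illustration of the windowing step, not of MatrixEQTL itself; the function name and the coordinates are hypothetical, and all positions are assumed to be on the same chromosome.

```python
import bisect


def cpgs_near_gene(cpg_positions, gene_start, gene_end, flank=10_000):
    """Return CpG positions in the gene body or within `flank` bp of it.

    cpg_positions must be a sorted list of positions on one chromosome.
    """
    lo = bisect.bisect_left(cpg_positions, gene_start - flank)
    hi = bisect.bisect_right(cpg_positions, gene_end + flank)
    return cpg_positions[lo:hi]


# Invented coordinates: a gene spanning 50,000-60,000 and five CpG sites.
nearby = cpgs_near_gene([1_000, 45_000, 50_000, 62_000, 75_000], 50_000, 60_000)
```

Only the sites returned for a gene would then be tested for association with that gene's expression level.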
DNA methylation of the gene body as well as adjacent non-coding regulatory regions is known to be an important regulation mechanism for gene expression. For aggregated analyses on region level, an aggregate significance level was obtained using the Simes procedure for all methylation loci, as the Simes procedure has been shown to generally perform well, also for correlated significance levels, as described in Rødland, E. A.: Simes' procedure is ‘valid on average’, Biometrika, 93: p. 742-746. To determine the distance for significant associations between DNA methylation and RNA expression, an aggregate significance level for associations was obtained using the Simes procedure for all methylation loci within the gene body and adjacent regions at increasing distances. The results thereof are shown in
As shown in
From the discovery cohort first four different categories of biomarkers (Cat. 1-4) were identified which show concordant dysregulation in methylation profiles in DCM either across molecular levels (i.e. epigenetic and transcriptomic; Cat. 4), tissues (i.e. cardiac tissue and blood; Cat. 2 and 3) or even both (Cat. 1).
The following categories (Cat. 1-4) describe molecular markers of HF and DCM.
Cat. 1a describes genomic regions that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue and are associated with mRNA expression levels of genes of cardiac relevance in the myocardium which are deregulated in HF/DCM. The genes are given in Table 12.
Cat. 1b describes genomic regions that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue and are associated with mRNA expression levels of genes of unknown cardiac relevance in the myocardium which are deregulated in HF/DCM. The genes are given in Table 13.
Cat. 2 describes genomic regions that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue and cluster in chromosome bands with heart specific genes. The genes are given in Table 14.
Cat. 3 describes genomic regions that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue but do not fall within Cat. 1 or 2. Two sub-categories were identified.
Cat. 3a is related to genomic regions in genes with cardiac relevance. The genes are given in Table 15.
Cat. 3b is related to genomic regions in genes with unknown cardiac relevance. The genes are given in Table 16.
Cat. 4 describes genomic regions that show correlated, deregulated methylation and mRNA expression patterns in HF/DCM in the myocardial tissue. The genes are given in Table 17.
Further, the following categories (Cat. 5-7) describe molecular markers of HF and DCM that were further identified.
Cat. 5 describes genomic regions that show coordinated hyper/hypo methylation in HF/DCM in peripheral blood and myocardial tissue and are associated with mRNA expression levels in the myocardium. The genes are given in Table 18.
Cat. 6 describes genomic regions that show coordinated methylation and gene expression changes in HF/DCM in the myocardial tissue and are also associated with HF/DCM on gene level. The genes are given in Table 19.
Cat. 7 describes genomic regions that show coordinated methylation and gene expression changes in HF/DCM in the myocardial tissue. The genes are given in Table 20.
Methods and Results (summary): Infinium HumanMethylation450 was used for high-density epigenome wide mapping of DNA methylation in left ventricular biopsies and whole peripheral blood of living probands. RNA deep sequencing was performed on the same samples in parallel. Whole genome sequencing of all patients allowed exclusion of promiscuous genotype-induced methylation calls. In the screening stage, we detected 59 epigenetic loci that are significantly associated with DCM (FDR corrected p≤0.05), with three of them reaching epigenome-wide significance at p≤5×10⁻⁸. Twenty-seven (46%) of these loci could be replicated in independent cohorts, underlining the role of epigenetic regulation of key cardiac transcription regulators. Using a staged multi-omics study design, we link a subset of 517 epigenetic loci with DCM and cardiac gene expression. Furthermore, we identified distinct epigenetic methylation patterns that are conserved across tissues, rendering these CpGs novel epigenetic biomarkers for heart failure.
Patient inclusion for the present study was approved by the ethics committee, medical faculty of Heidelberg University. All participants have given written informed consent to allow for molecular analysis of blood and left-over tissue. The diagnosis of Dilated Cardiomyopathy (DCM) was confirmed after excluding coronary artery disease (CAD) as determined by coronary angiography, valvular heart disease was excluded by cMRI and echocardiography and myocarditis/inflammatory DCM by histopathology (Richardson P, et al., Report of the 1995 World Health Organization/International Society and Federation of Cardiology Task Force on the Definition and Classification of cardiomyopathies. Circulation. 1996; 93:841-2). Patients with history of uncontrolled hypertension, myocarditis, regular alcohol consumption, illicit drugs or cardio-toxic chemotherapy were also excluded. To include the clinical continuum of systolic heart failure, also early but symptomatic disease stages (LV-EF between >45 and <55%) were included.
After screening of n=135 DCM patients, n=41 met all inclusion and no exclusion criteria and had sufficient amounts of left-over LV biopsies (LV free wall) and peripheral blood samples available for the laborious high-throughput analyses of DNA methylation, genome- and mRNA sequencing. Control LV-biopsy specimens were obtained from stable and symptom-free patients after heart transplantation (n=31; HTX was at least 6 months ago), who had normal systolic and diastolic function and no evidence for relevant vasculopathy or acute/chronic organ rejection as judged by coronary angiography and immuno-histopathology. Controls for whole blood samples (n=31) had a cardiovascular risk profile (Hypertension, Hyperlipidemia), but completely normal systolic and diastolic left ventricular function without evidence for heart failure or significant (>50%) coronary artery disease.
As an independent validation cohort, left ventricular myocardium of n=18 DCM patients and n=8 previously healthy road accident victims were included. The independent validation cohort for peripheral blood consisted of n=9 DCM patients and n=28 clinical controls. A third replication cohort for top blood-based markers included n=82 DCM patients (Institute for Cardiomyopathies Heidelberg) and n=109 Controls (Noko/normal control project).
Biopsy specimens were obtained from the apical part of the free left ventricular wall (LV) from DCM patients or cardiac transplant patients (controls) undergoing cardiac catheterization using a standardized protocol. Biopsies were immediately washed in ice-cold saline (0.9% NaCl) and transferred and stored in liquid nitrogen until DNA and RNA were extracted. After diagnostic workup of the biopsies (histopathology), remaining material was evenly dissected to isolate DNA and RNA. DNA was extracted from blood with DNA Blood Maxi Kit (Qiagen) and from biopsies with Allprep Kit (Qiagen). Total RNA was extracted using the miRNeasy mini Kit (blood) or Allprep Kit (biopsies) according to the manufacturer's protocol (Qiagen, Germany). RNA purity and concentration were determined using the Bioanalyzer 2100 (Agilent Technologies, Berkshire, UK) with a Eukaryote Total RNA Pico assay for RNA from biopsies and with a Eukaryote Total RNA Nano assay for RNA from blood.
Methylation profiles were measured using the Illumina 450 k methylation assay, following procedures as described earlier (Bibikova M, et al., High density DNA methylation array with single CpG site resolution. Genomics. 2011; 98:288-95). From each patient, we subjected 200 ng DNA (blood and biopsy) to the measurements. Methylation sites with a detection p-value of >0.05 in more than 10% of the samples were removed from analysis. Methylation levels with a detection p-value of >0.05 in less than 10% of the samples were imputed via knn-imputation (Hastie T, Tibshirani R, Narasimhan B and Chu G. impute: Imputation for microarray data. R package version 1.46.0. 2016). To reduce the effects of genomic variation on methylation measurements, we excluded methylation sites that were potentially influenced by genotypes present in more than 10% of the DCM patients and that lie within the 50 bp probe region as assessed by whole-genome sequencing. Methylation levels with variants in less than 10% of the DCM patients were imputed. We further removed all probes on X and Y chromosomes as well as probes that have been identified by Chen et al. to cross-hybridize with non-targeted DNA (Chen Y A, et al., Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013; 8:203-9). Finally, 394,247 methylation sites passed QC.
DNA methylation was validated for the top two biomarker candidate loci by the MassARRAY technique as previously described (Haas J, et al., Alterations in cardiac DNA methylation in human dilated cardiomyopathy. EMBO Mol Med. 2013; 5:413-29). Briefly, 400 ng genomic DNA was chemically modified with sodium bisulfite. The bisulfite-treated DNA was PCR-amplified by primers designed to cover the Infinium probes cg06688621 and cg01642653 (cg06688621 primer sequences GGTGTTTTTTGTTTAGTATTTTTTAGAG and AGGGTAGATTTGAGGTAGTTTAGGA; cg01642653 primer sequences TAGGTGTTTTTTAGGGTTGTTTTTT and GTTGGGGAATTTGTTGTTTATTAG). The amplicons were transcribed by T7 polymerase, followed by T-specific RNase A cleavage. The digested fragments were quantified by a MALDI-TOF-based technique (MassARRAY).
1 μg of total peripheral blood gDNA was sheared using the Covaris™ S220 system, applying 2 treatments of 60 seconds each (peak power=140; duty factor=10) with 200 cycles/burst. 500 ng of sheared gDNA was taken and whole genome libraries were prepared using TruSeq DNA sample preparation kit according to manufacturer's protocols (Illumina, San Diego, US). Sequencing was performed on an Illumina HiSeq 2000, using TruSeq SBS Kit v3 and reading two times 100 bp for paired end sequencing, on four lanes of a sequencing flowcell.
Demultiplexing of the raw sequencing reads and generation of the fastq files was done using CASAVA v.1.82. The raw reads were then mapped to the human reference genome (GRCh37/hg19) with the Burrows-Wheeler alignment tool (BWA v.0.7.5a) (Li H and Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754-60) and duplicate reads were marked (Picard-tools 1.56) (http://picard.sourceforge.net/). Next, we used the Genome-Analysis-Toolkit according to the recommended protocols for variant recalibration (v. 2.8-1-g932cd3a) and variant calling (v.3.3-0-g37228af) as described in the respective best-practices guidelines (https://www.broadinstitute.org/gatk/guide/best-practices) (DePristo M A, et al., A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics. 2011; 43:491-8).
Regarding detailed information on normalization and removal of technical and batch effects, association statistics, overrepresentation and gene ontology analyses, the following is applied.
To remove unwanted technical variation, we applied a modified danes normalization procedure across all methylation measurements. Danes normalization is part of the wateRmelon package and was first described by Pidsley (Pidsley R, et al., A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013; 14:293). The normalization procedure is based on between-array quantile normalization of methylated and unmethylated raw signal intensities of red and green channels together and thus accounts for dye bias. However, between-array quantile normalization as initially developed for gene expression data is controversial for methylation data, as overall methylation distributions may differ strongly between samples, tissues and disease states. Consequently, we modified the danes normalization approach by applying cyclicloess normalization instead of quantile normalization for between-array normalization. Cyclicloess normalization is similar in effect and intention to quantile normalization, but has the advantage that it normalizes extreme cases less drastically and still preserves major distributional differences (Ballman K V, Grill D E, Oberg A L and Therneau T M. Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics. 2004; 20:2778-86).
All samples were measured in 5 different batches and each batch contained duplicate samples from other batches. To remove technical variation possibly introduced by the measurement batch, the duplicate measurements of in total 8 samples were used for bridging the methylation-values (Du P, et al., Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 2010; 11:587) of different analysis batches using the removeBatchEffect function from the limma package (Ritchie M E, Phipson B, Wu D, Hu Y, Law C W, Shi W and Smyth G K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47). Following batch bridging, duplicate measurements were averaged before downstream statistical analysis.
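The comparison by Du et al. cited above relates two common scales for methylation levels, the Beta-value (a fraction between 0 and 1) and the M-value (a log2 ratio). A minimal sketch of the conversion between them is given below; whether the bridging step was performed on one scale or the other is not stated here, and the function names are hypothetical.

```python
import math


def beta_to_m(beta):
    """Convert a Beta-value (0 < beta < 1) to an M-value: log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Convert an M-value back to a Beta-value: 2^m / (2^m + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated site corresponds to an M-value of zero.
half_methylated_m = beta_to_m(0.5)
```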
To correct for genomic inflation in the discovery cohort, we performed principal component analysis on methylation measurements and identified principal components (PCs), which were associated with known confounders (e.g. technical such as analysis date and biological confounders such as medication) at FDR≤0.05, see Tables 21 and 22.
Deregulated methylation sites were identified by linear modelling and moderated t-tests including age and gender as well as all identified PCs as covariates using the limma package (Ritchie M E, Phipson B, Wu D, Hu Y, Law C W, Shi W and Smyth G K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47). Methylation sites were subsequently directionally verified in verification cohorts including gender (as age was not available for all samples) as covariates. Statistical analyses were carried out in R-3.2.2 (R: A Language and Environment for Statistical Computing [computer program]. 2008). FDR correction of significance levels was performed using the Benjamini-Hochberg procedure (Benjamini Y and Hochberg Y. Controlling the False Discovery Rate—a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met. 1995; 57:289-300). Significance levels from discovery and verification cohorts were combined using Fisher's method to combine results from independent tests.
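Fisher's method for combining significance levels from independent tests, as named above, can be sketched as follows. The statistic is minus two times the sum of the log p-values, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; for even degrees of freedom the tail probability has a closed form, which the sketch uses. The function name and the example values are hypothetical, and all p-values are assumed to be greater than zero.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(pvalues)
    # Half of the test statistic X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in pvalues)
    # Chi-square survival function for 2k degrees of freedom:
    # exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Invented example: p = 0.05 in both the discovery and the verification cohort.
combined = fisher_combine([0.05, 0.05])
```

For this invented example the combined significance level is approximately 0.0175.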
RNA sequencing libraries were generated using TruSeq RNA Sample Prep Kit (Illumina) and sequencing was performed 2×75 bp on a HiSeq2000 (Illumina) sequencer. Samples were sequenced to a median paired-end read count of 29.85 million. Unstranded paired-end raw read files were mapped with STAR v2.4.1c (Dobin A and Gingeras T R. Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics. 2015; 51:11 14 1-11 14 19) using GRCh37/hg19 and the Gencode 19 gene model (http://www.gencodegenes.org/). Only uniquely mapped reads were counted into genes using subread's featureCounts program (Liao Y, Smyth G K and Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014; 30:923-30) (subread version 1.4.6.p1) and the median mapping percentage was 88.08. Prior to statistical analyses, genes with very low expression levels (average reads <=1, detected reads in less than 50% of the samples) were removed. Count data was normalized by rlog normalization (Love M I, Huber W and Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550), which is an improved method of the variance stabilization transformation (Anders S and Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010; 11:R106) as recommended for eQTL by the original MatrixEQTL publication (Shabalin A A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28:1353-8).
An eQTL analysis between methylation sites and gene expressions was performed on the 34 DCM patients and 25 controls with high quality epigenome and transcriptome data from the same biopsy samples. MatrixEQTL (Shabalin A A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28:1353-8) and linear models were used to correlate the expression profiles of 19,418 genes with the 311,222 methylation sites in a range of 10,000 bp up- and downstream of the genes as well as in the gene body region. Epigenome-transcriptome associations were subsequently directionally verified in the cardiac tissue verification cohort.
To identify an epigenetic signature for DCM, we filtered for methylation loci which were associated with the disease and gene expression in the myocardial discovery and verification cohorts at an uncorrected significance level of p≤0.05. Conserved methylation differences in DCM across myocardial tissue and peripheral blood were identified by filtering for methylation loci that additionally showed conservation across tissues (Kendall rank test for direct correlation p≤0.05) and deregulated methylation status in identical directions (directional p≤0.05). To minimize the effect of blood cell heterogeneity, we excluded all sites which have been shown by Jaffe et al. to be associated with blood cell heterogeneity at a Holm-corrected F-statistic significance level of p≤0.05 (Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979; 6, 65-70; Jaffe A E and Irizarry R A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014; 15:R31). Finally, predictive DCM models were built for myocardial tissue and peripheral blood separately using the glm function of the R stats package based on logistic regression models and 5-fold cross-validation with 10 repeats in the discovery cohort and subsequently tested in the verification cohorts.
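The evaluation scheme described above, 5-fold cross-validation with 10 repeats and scoring by the area under the ROC curve, can be sketched as follows. The sketch only shows how the repeated splits and the AUC could be computed; the logistic regression fit itself (done with the glm function in R) is not reproduced. The function names are hypothetical, and the splits are not stratified by disease status, which is an assumption since the text does not specify it.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of nearly equal size.
        folds = [indices[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            yield train, held_out


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (case, control) pairs in
    which the case has the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Example: 72 samples give 5 folds x 10 repeats = 50 train/test splits.
splits = list(repeated_kfold(72))
```

A model would be fitted on each training split, scored on the held-out fold with `auc`, and the resulting values averaged over all splits.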
For aggregated analyses on gene or multi-gene level, the aggregate significance level was then obtained using the Simes procedure for all methylation loci (Rødland E A. Simes' procedure is ‘valid on average’. Biometrika. 93:742-746).
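The Simes aggregation can be written compactly: for n sorted p-values p(1) ≤ ... ≤ p(n), the aggregate p-value is the minimum of n·p(i)/i over all i. The sketch below uses hypothetical per-site p-values for a single gene.

```python
def simes_p(p_values):
    """Aggregate several p-values into one using the Simes procedure."""
    n = len(p_values)
    ranked = sorted(p_values)
    # For the i-th smallest p-value, compute n * p / i and take the minimum.
    return min(n * p / rank for rank, p in enumerate(ranked, start=1))


# Hypothetical p-values of three methylation sites mapped to one gene.
gene_level_p = simes_p([0.04, 0.01, 0.20])
print(gene_level_p)
```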
Overrepresentation analyses for deregulated methylation sites in chromosome bands, discovery and verification cohorts as well as for methylation sites associated with disease state and gene expression were based on Fisher's exact test on 2×2 contingency tables using a threshold of p≤0.05.
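A two-sided Fisher's exact test on a 2×2 table can be sketched with the standard library alone. The version below sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided p-value; whether the analysis above used a one-sided or two-sided test is not stated, so this choice is an assumption. The example table is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical 2x2 table: rows = inside/outside a region,
# columns = differentially methylated / not differentially methylated.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))
```

For genome-scale margins the binomial coefficients become very large integers; the arithmetic stays exact but slows down, so a log-space implementation would be preferable there.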
Identification of overrepresented GO terms was performed using the gometh function of the missMethyl package (Phipson B, Maksimovic J and Oshlack A. missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform. Bioinformatics. 2016; 32:286-8), taking into account the probability of differential methylation based on the number of probes on the 450k array per gene. This is particularly important, since severe bias has been reported when performing gene set analysis for genome-wide methylation data, owing to the differing numbers of methylation sites profiled for each gene (Geeleher P, Hartnett L, Egan L J, Golden A, Raja Ali R A and Seoighe C. Gene-set analysis is severely biased when applied to genome-wide methylation data. Bioinformatics. 2013; 29:1851-7). The applied approach models and compensates for the effect of selection bias using the methodological framework originally developed by Young et al. (Young M D, Wakefield M J, Smyth G K and Oshlack A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 2010; 11:R14).
Further data regarding the analysis carried out in Example 2 and results obtained therein are found in the following Tables 23 to 34.
For inclusion in this study, it was required that patients with systolic dysfunction and suspicion for DCM underwent extensive clinical phenotyping. All patients with hints for secondary causes of DCM from the detailed clinical work-up were excluded (see Materials and Methods section). A total of n=135 patients were included in the study. Since we were only interested in complete datasets and sufficient cardiac biomaterial as left-over, we excluded 94 individuals. In the final core cohort, n=41 patients for whom we were able to generate high quality DNA methylation profiles from heart tissue and peripheral blood were used in the screening stage of this study. None of these patients or controls overlapped with previous studies on DNA methylation (Haas J, et al., Alterations in cardiac DNA methylation in human dilated cardiomyopathy. EMBO Mol Med. 2013; 5:413-29). The mean age of patients was 54.1±12.3 and 63% were in early NYHA stages. As such, the median NT-proBNP was 812 ng/l, see Table 25. As control samples, we used left-ventricular biopsies from 31 patients free of heart failure with regular systolic and diastolic heart function who underwent routine left-heart myocardial biopsy after receiving heart transplantation, see Table 26. For an overview on patients, controls and molecular phenotyping, please see
After performing data quality control and normalization, we calculated genome-wide associations for each CpG site. Genomes were prima vista excluded from the analysis. To adjust for potential epigenomic inflation, we performed principal component (PC) analysis on methylation measurements and identified PCs that were associated with confounders (methodological confounders such as batch effects and biological confounders such as medication; FDR 0.05), see Tables 21 and 22. Dysregulated methylation sites were identified by linear modelling and moderated t-tests including age, gender as well as the identified principal components as covariates (Meder B, et al., Influence of the confounding factors age and sex on microRNA profiles from peripheral blood. Clin Chem. 2014; 60:1200-8).
From 485,000 methylation sites, 394,247 passed QC in myocardial tissue and blood. Genotype-associated methylation changes were excluded. 42,745 CpG sites (9.5%) were found differentially methylated (raw-p≤0.05) in left-ventricle myocardium when comparing DCM vs. controls (33,396 of them being in 10 kb windows around annotated genes with expression in the cardiac tissue). The ratio of hypomethylated vs. hypermethylated CpG sites was 0.92. In blood samples, 35,566 (9%) were associated with DCM (raw-p≤0.05; 28,153 being in a 10 kb window of annotated genes).
As summarized in the Manhattan plot in
To replicate these findings, we epigenotyped DNA from n=18 independent DCM patients and n=8 previously healthy control individuals that were casualties of roadside accidents. To the best of our knowledge, these control individuals were free of any heart condition and did not take regular medication. As shown in Table 35, we could successfully replicate 27 of the 59 loci (46%) in the independent cohorts. The most significant hit from the screening stage (cg16318181) could also be validated (replication p=0.004), resulting in a combined Fisher's p=2.23×10⁻⁹. In total, 5 hits superseded stringent genome-wide significance in the combined analysis.
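Assuming that "combined Fisher's p" refers to Fisher's combined probability test, the combination of a screening and a replication p-value can be sketched as follows. The test statistic −2·Σ ln(p) follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom, so no statistics library is needed. The screening p-value of cg16318181 is not given in this passage; the input values below are hypothetical.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of
    freedom: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, with x = -2 * sum(ln p).
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals x / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return exp(-half_stat) * series


# Hypothetical screening and replication p-values for one CpG site.
combined = fisher_combined_p([0.05, 0.05])
print(combined)
```

For two p-values the formula reduces to p1·p2·(1 − ln(p1·p2)), which makes it easy to check by hand.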
In previous studies, mainly low-resolution approaches or very small cohorts were used to identify DNA methylation patterns for DCM and/or heart failure. Hence, to see if these findings can be reproduced in the current study, we compared methylation changes from the available previous studies (34 loci) and the current dataset. Since the methods varied largely and CpGs were not uniformly measured in the former studies, we used Simes p-value aggregation of our dataset for the loci described previously. Using a cutoff of p≤0.05, we could replicate DNA methylation changes in the same direction in the genes LY75, PTGES, CTNNAL1, TNFSF14, MRPL16, KIF17, see Table 36 (Haas J, et al., Alterations in cardiac DNA methylation in human dilated cardiomyopathy. EMBO Mol Med. 2013; 5:413-29; Koczor C A, et al., Thymidine kinase and mtDNA depletion in human cardiomyopathy: epigenetic and translational evidence for energy starvation. Physiol Genomics. 2013; 45:590-6; Movassagh M, et al., Differential DNA methylation correlates with differential expression of angiogenic factors in human heart failure. PLoS One. 2010; 5:e8564; Garagnani P, et al., Methylation of ELOVL2 gene as a new epigenetic marker of age. Aging Cell. 2012; 11:1132-4), which supports the notion that heart failure is associated with certain defined, robust DNA methylation patterns. Of all replicated loci, the LY75 methylation pattern showed the highest significance (Simes p=0.002).
Besides confirming hypermethylation of the LY75 gene locus, we also replicated the associated downregulation of LY75 expression levels in DCM, as seen in
In line with the successful replication of previous findings in tissue, we also replicated known age-dependent patterns in CpG islands within ELOVL2, FHL2 and PENK (Garagnani et al., 2012) in the DNA derived from whole peripheral blood samples of our cohort (Simes significance level <10⁻¹⁴).
In unsupervised cluster analysis, showing DNA methylation in cardiac tissue—as seen in
To test for possible functional methylation patterns, we first performed overrepresentation analysis for genome-wide transcription factor and enhancer factor binding sites (Mathelier A, et al., JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2016; 44:D110-5) and how they may be affected by DNA methylation. From 158,979 CpGs within annotated sequence motifs, we detected 4 motifs significantly associated with methylation alterations in DCM (FDR-p≤0.05), as shown in Table 23. Of interest, three of the motif-binding factors (Smad2, Smad4 and Bmal1) are known to be involved in cardiac remodeling during DCM and heart failure (Lefta M, Campbell K S, Feng H Z, Jin J P and Esser K A. Development of dilated cardiomyopathy in Bmal1-deficient mice. Am J Physiol Heart Circ Physiol. 2012; 303:H475-85).
There is ample evidence that larger stretches of DNA methylation cluster together and exhibit repression of cis-regulatory elements. Hence, we carried out an overrepresentation analysis for clustering of differentially methylated sites at raw-p≤0.05 in specific chromosomal bands and found 6 regions to be significantly differentially methylated in DCM (Bonferroni level p≤0.05), as seen in
These regions host noticeable numbers of genes associated with cardiac development, heart function and cardiomyopathy. As an example, we found the gene locus 12q24.21 to be differentially methylated in DCM (78 out of 425 methylation sites show association with DCM at raw-p≤0.05, Fisher's exact p=2×10⁻⁶). The 12q24.21 locus harbours several genes that have previously been linked to cardiomyopathies or cardiac development. One of the genes is TBX5, coding for a protein that is part of the T-Box family, known to be implicated in embryonic development and cardiogenesis (Papaioannou V E. The T-box gene family: emerging roles in development, stem cells and cancer. Development. 2014; 141:3819-33). Mutations in TBX5 have recently been shown in patients suffering from familial as well as sporadic dilated cardiomyopathy (Zhou W, Zhao L, Jiang J Q, Jiang W F, Yang Y Q and Qiu X B. A novel TBX5 loss-of-function mutation associated with sporadic dilated cardiomyopathy. Int J Mol Med. 2015; 36:282-8). Another gene within this locus is MED13L, which is part of the Mediator complex family, which is also known to be involved in cardiovascular disease (Schiano C, Casamassimi A, Vietri M T, Rienzo M and Napoli C. The roles of mediator complex in cardiovascular diseases. Biochim Biophys Acta. 2014; 1839:444-51) and early heart development, leading to a variety of inborn cardiac abnormalities when disturbed (Samanek M. Congenital heart malformations: prevalence, severity, survival, and quality of life. Cardiol Young. 2000; 10:179-85). Additionally, we find the MYL2 gene in close vicinity to the 12q24.21 locus, which codes for the ventricular regulatory Myosin Light Chain. It has an essential role during early embryonic cardiac development and represents one of the earliest markers of ventricular specification. Mutations in MYL2 are furthermore associated with Dilated and Hypertrophic Cardiomyopathy (O'Brien T X, Lee K J and Chien K R. Positional specification of ventricular myosin light chain 2 expression in the primitive murine heart tube. Proc Natl Acad Sci USA. 1993; 90:5157-61). Together, we found evidence for coordinated DNA methylation patterning in key cardiac developmental genomic regions.
To test if the observed alterations in the degree of DNA methylation also act on global gene expression, we performed poly-A enriched mRNA sequencing in isolated RNA from the same biopsies that were taken for the methylation analysis in our discovery cohort. To link expression and DNA methylation, we performed meteQTL-analysis and identified a wide range of DNA methylation sites acting on cardiac transcription across the entire genome, as shown in
DNA hypermethylation within the promoter region and the vicinity of transcription start sites was found to be strongly associated with transcriptional downregulation, and hypomethylation with upregulation. For 3′ downstream regions as well as towards the end of the gene body we find an equal ratio of positive and negative correlation between methylation status and gene expression levels, as seen in
From the 33,396 CpG sites found to be differentially methylated (raw-p≤0.05) in DCM and within 10 kb of genes expressed in the cardiac tissue, 8,420 CpGs were also significantly associated with gene expression in the discovery cohort (raw-p≤0.05). The observed overlap between DNA methylation and mRNA abundance is far higher than expected by chance (Fisher exact p=7×10⁻⁶⁷), which indicates that DNA methylation has a considerably strong functional impact on gene transcription in the heart.
To dissect the role of these changes during DCM and also take into account the most valid candidates, we performed an independent validation study. The controls of the validation cohort, which were casualties of road accidents, were to the best of our knowledge free of any heart condition and did not take medication. To eliminate technical confounders in addition to potential biological ones, we chose a different mRNA sequencing protocol using random primers instead of poly-A enrichment. Samples were sequenced to a median paired-end read count of 37.17 million and the median mapping percentage was 88.09. By combining these two independent study cohorts, we could generate a set of high confidence DNA methylation and expression sites for DCM. In detail, 517 different CpGs were directionally replicated on two levels (Fisher exact p=1.2×10⁻¹³⁴), (i) to be associated with DCM and (ii) to act on mRNA transcription, as can be seen from
As shown by gene ontology overrepresentation analysis, the host genes of the methylation sites are mostly related to pathways linked to cardiac development and muscle function, as also shown in Table 24, further indicating that coordination of the expression of important functional genes in the course of (early) heart failure is driven by DNA methylation.
Two of the genome-wide significantly replicated methylation sites (see Table 35) were found to also be associated with expression of neighboring genes in the discovery and verification cohorts. Methylation status of cg25838968 was associated with PLXNA2 expression level (combined p=0.02), which is also differentially expressed in DCM (combined p=3×10⁻⁵). Methylation status of cg14523204 is associated with RGS3 (Regulator Of G-Protein Signaling 3) expression (combined p=0.0004), which we found to be differentially expressed in DCM as well (combined p=0.02).
The methylation and expression analyses resolved interesting new loci potentially involved in the pathogenesis of heart failure. As shown above, we for instance could replicate the strong association of myocardial LY75 methylation and expression with DCM. However, LY75 methylation is different in peripheral blood, hampering the immediate use as peripheral blood marker.
Hence, to search for potential peripheral biomarkers, we investigated if DNA methylation changes are conserved across different tissues. As shown by an exploratory analysis there is indeed a set of conserved directionally-dysmethylated regions in heart tissue and blood, as seen in
When using 5% dysmethylation in tissue as a cut-off, we find as many as 3,798 conserved methylation sites that are changed in the same direction in tissue and blood (raw-p≤0.05 in both groups). Very interestingly, the overlapping genes are highly enriched for myofilament components, as seen in the table insets in
Following this interesting hypothesis, we next explored the epigenetic regulation of the NPPA and NPPB locus. This locus encodes atrial natriuretic factor (ANF) and brain natriuretic peptide (BNP), the latter represents the gold-standard biomarker for heart failure. Astoundingly, we find the same direction of dysmethylation in DNA from heart tissue (
To harness the power of connected biological layers captured by the present multistage, multi-omics study design, we then compared the methylation patterns from myocardial tissue and peripheral blood of the screening and replication cohorts after we removed CpG sites that are directly hit by genetic variation (SNP or INDEL within the 50 bp probe region) or are associated with genetic variation within a 10 kb region (α≤0.05). We also removed all CpG sites that have been shown to be associated with blood cell heterogeneity (Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979; 6, 65-70). From 90,935 remaining DNA methylation sites, 17,709 were conserved between cardiac tissue and blood, of which 6 (OR=1.38, Fisher's exact p=NS) are associated with DCM in heart tissue and 612 (OR=0.89, Fisher's exact p=0.01) had disease association in blood. Three epigenetic loci highly significantly overlapped between tissue and blood (OR=28, Fisher's exact p<0.001) on all investigated levels, showing disease association and concordant dysmethylation across tissues.
The resolved genes were “B9 Protein Domain 1” (B9D1, hypomethylated in DCM in heart tissue and blood), “Doublecortin like kinase 2” (DCLK2, hypomethylated) and “Neurotrimin” (NTM, hypermethylated). For Neurotrimin (NTM), which belongs to the so-called IgLONs, there is a reported association of its protein blood levels with heart failure and prognosis of affected patients undergoing pharmacotherapy (Cao T H, et al., Identification of novel biomarkers in plasma for prediction of treatment response in patients with heart failure. Lancet. 2015; 385 Suppl 1:S26). B9D1 (cross-validation median p=4.55×10⁻⁶), which is also one of the 517 CpGs, as seen in
Mutations in B9D1 result in disturbed heart development due to disrupted ciliogenesis and the protein is highly expressed in myocardium and cardiomyocytes (Dowdle W E, et al., Disruption of a ciliary B9 protein complex causes Meckel syndrome. Am J Hum Genet. 2011; 89:94-110). We now show that the methylation state of B9D1 could serve as a diagnostic biomarker for DCM, as exemplified in
Finally, we investigated the DNA dysmethylation sites with highest significance in blood alone and replicated them in the validation cohorts, as seen in
By using mass-spectrometry-based DNA methylation quantification as an alternative method in another independent set of 82 DCM cases and 109 controls, as seen in Tables 32 and 33, we were able to fine-map and fully replicate the directional, significant dysmethylation of our Top-2 markers (cg06688621 and cg01642653) and their neighbouring CpGs within the same CpG island.
The present study on the epigenetics of heart failure due to DCM identified a significant role of DNA methylation patterns on cardiac gene transcription in myocardial disease. The reproducible DNA methylation patterns identified in this study as well as the successful replication of previous epigenetic loci from other studies, underline the robustness of the findings and support a role in diagnosis and potentially prognostication of heart failure.
The cardiac epigenome is far from being understood. Only very few studies have been able to reliably map DNA methylation changes in human tissue. While in oncology the surgical resection of the tumour is an integral part of the therapy and hence explanted tissue is readily available for research, the therapy of heart failure mostly does not require surgical intervention and only in rare conditions (e.g. obstructive hypertrophic cardiomyopathy) involves the resection of myocardium (Kim L K, et al., Hospital Volume Outcomes After Septal Myectomy and Alcohol Septal Ablation for Treatment of Obstructive Hypertrophic Cardiomyopathy: US Nationwide Inpatient Database, 2003-2011. JAMA Cardiol. 2016; 1:324-32). In this study, we were able to refine existing methods for high-quality DNA/RNA extraction and consecutive state-of-the-art sequencing and methylation mapping to assess left-over myocardial tissue from biopsies taken during diagnostics of patients suffering from heart failure due to DCM. By including the largest sample set yet, we were able to detect disease-associated methylation marks at epigenome-wide significance level, replicate them in independent cohorts and show their effect on global cardiac gene expression.
Heart failure is an epidemic threat in industrialized nations. The prevalence is already 37.7 million individuals globally, with total medical costs of more than $20.9 billion annually in the US alone (Ziaeian B and Fonarow G C. Epidemiology and aetiology of heart failure. Nat Rev Cardiol. 2016; 13:368-78). To better stratify affected patients or individuals at risk, new molecular biomarkers are desired. Using a systematic approach, we found an intriguing overlap of DNA methylation changes in myocardial tissue and blood. Such an overlap is not expected by chance, and the replication of diagnostic statistical performance, along with the stringent filtering procedure to avoid confounding from blood cell heterogeneity and genomic variation, points to robust epigenetic biomarker patterns. In this early-stage systolic dysfunction cohort, we find methylation markers that outperform NT-proBNP. However, the value of the methylation markers in prognostication, therapy monitoring and decision-making must be rigorously evaluated before concluding any superiority to existing biomarkers.
Applying a very stringent cut-off (5×10⁻⁸), five epigenome-wide significant hits were found in this study located on Chr. 1, 3, 14, and 17. When using a lower cut-off for genome-wide significance used in other epigenome-wide association (EWA) studies (10⁻⁶) (Tsai P C and Bell J T. Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation. Int J Epidemiol. 2015), as many as 15 loci could be reliably linked to DCM and heart failure. Genes up- or downstream of the five most-stringent methylation marks all show expression in myocardial tissue. While the top hit from the discovery cohort, cg16318181, was replicated in the verification cohort, there is no significant interaction between methylation status and expression of the genes within 10,000 bp distance. However, two of the epigenome-wide significant hits showed direct association with mRNA expression levels, namely cg25838968 (gene body region of PLXNA2) and cg16254946 (within the gene body region of GLIS1). PLXNA2 is a member of the Plexin-A family and a receptor for the guiding molecule Semaphorin 3C and has been described in the context of neural crest and cardiac outflow tract development as part of GATA6- (Kodo K, et al., GATA6 mutations cause human cardiac outflow tract defects by disrupting semaphorin-plexin signaling. Proc Natl Acad Sci USA. 2009; 106:13933-8) and HAND2-related signalling pathways (Morikawa Y and Cserjesi P. Cardiac neural crest expression of Hand2 regulates outflow and second heart field development. Circ Res. 2008; 103:1422-9).
During heart failure pathogenesis, the re-expression of the fetal gene programme is thought to be a central element of initial adaptation to stressors, but ultimately leads to maladaptation and disease progression. The exact mechanisms by which this concerted switch is realized are unclear. It is known that non-coding RNAs and several promoter elements and transcription factors are involved. In our analysis, we found and replicated DNA methylation changes in the vicinity of several key regulators of cardiac development. The transcription factor HAND2, for instance, is implicated in cardiomyocyte differentiation and proliferation in the second heart field (McFadden D G, et al., The Hand1 and Hand2 transcription factors regulate expansion of the embryonic cardiac ventricles in a gene dosage-dependent manner. Development. 2005; 132:189-201). During heart failure, Calcineurin/Nfat signalling as well as certain miRNAs (e.g. miR-25) are thought to control HAND2 activation (Dirkx E, et al., Nfat and miR-25 cooperate to reactivate the transcription factor Hand2 in heart failure. Nat Cell Biol. 2013; 15:1282-93).
In our study, we found a change in DNA methylation of the HAND2 locus significantly associated to the regulation of its transcript. IRX5, TBX5, TBX3 and TBX15 and several of their downstream effectors are also altered in the setting of DCM. Altogether 517 CpGs were directionally replicated to be associated with DCM and mRNA transcription. 307 of the 517 were hypomethylated in DCM and 210 were hypermethylated in DCM. The hypomethylated sites correlated with an upregulation of 374 genes and a downregulation of 173 genes corresponding to an upregulation ratio of 2.16. The hypermethylated sites correlated with an upregulation of 204 genes and a downregulation of 171 genes (upregulation ratio of 1.19). Hence, DNA methylation may be involved in the functional reorganisation of important genes during heart failure and these numbers illustrate that the effect of hypomethylation in DCM seems to result mainly in gene (re)activation, while the effect of hypermethylation is balanced (Movassagh M, et al., Distinct epigenomic features in endstage failing human hearts. Circulation. 2011; 124:2411-22).
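The upregulation ratios quoted above follow directly from the gene counts; the short check below recomputes them and confirms that the hypo- and hypermethylated sites add up to the 517 replicated CpGs. The function name is illustrative only.

```python
def upregulation_ratio(n_up, n_down):
    """Ratio of upregulated to downregulated genes."""
    return n_up / n_down


# Counts as stated in the text.
hypo_sites, hyper_sites = 307, 210
hypo_ratio = upregulation_ratio(374, 173)   # genes linked to hypomethylated sites
hyper_ratio = upregulation_ratio(204, 171)  # genes linked to hypermethylated sites

print(hypo_sites + hyper_sites)             # total replicated CpGs
print(f"{hypo_ratio:.2f} {hyper_ratio:.2f}")
```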
Only a few regulatory principles have been identified that drive gene expression during development and under pathological conditions in vivo (Sergeeva I A, et al., Identification of a regulatory domain controlling the Nppa-Nppb gene cluster during heart development and stress. Development. 2016; 143:2135-46). Our data indicate that DNA methylation may act alone or in concert with other mechanisms in this context. The NPPA-NPPB gene cluster may serve as an example. NPPA and NPPB descend from a common ancestral gene by duplication and hence share common chromatin-regulatory mechanisms (Hohl M, et al., HDAC4 controls histone methylation in response to elevated cardiac load. J Clin Invest. 2013; 123:1359-70). Similarly, we found orchestrated hypomethylation of 5′-flanking CpGs of NPPA and NPPB, which is associated with the upregulation of the transcripts atrial natriuretic factor (ANF) and brain natriuretic peptide (BNP). Strikingly, we find the same direction of hypomethylation in peripheral blood, supporting the intriguing finding of conserved heart failure associated DNA methylation patterning across different tissues.
The bimodality of DNA methylation (two copies of homologous DNA) implies a binary on-off control over gene expression, yet a significant number of intermediate methylated loci throughout the genome do not fit within this model (Elliott G, et al., Intermediate DNA methylation is a conserved signature of genome regulation. Nature communications. 2015; 6:6363). To our knowledge, this is the first study that identified a cross-tissue conservation of such epigenetic patterns occurring during heart failure. Due to our cohort and study design, we can exclude that the observed regulation is only due to medication or other confounders. As shown by the example of NPPA/-B, we postulate that heart failure as a syndrome can impose DNA methylation changes due to mechanisms that are sensitive in different cell types representing an epigenomic signature of context-dependent function (Pai A A, et al., A genome-wide study of DNA methylation patterns and gene expression levels in multiple human and chimpanzee tissues. PLoS Genet. 2011; 7:e1001316).
Potential limitations of this study are confounders that influence the epigenetic pattern and DNA methylation. From a technical perspective, we found that genomic variants within the probe region and batch effects are important aspects that need to be considered. To best address this issue, we conducted whole-genome sequencing of patients to identify those sites and measured a random sample of patients multiple times on different arrays on the Infinium platform to define the strata introduced by batches. On the biological level, pharmacotherapy of cases and controls and heterogeneity of tissue are known to be potential confounders, for which we corrected by Principal Component analysis. Using completely independent replication cohorts, we eliminated confounders such as medication of controls, RNA-seq library generation protocols and methylation measurement batch effects. Using mass-spectrometry based DNA methylation measurement, we further substantiated the reliability of our approach for a selection of markers.
The present study provides to our knowledge the most comprehensive mapping of DNA methylation in the human heart and identifies novel loci associated with heart failure and DCM using a comprehensive approach covering genetic variation, DNA methylation and whole transcriptome analyses. To propel epigenetic studies in cardiovascular diseases, it is necessary to develop novel concepts for statistics (power calculation (Tsai P C and Bell J T. Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation. Int J Epidemiol. 2015), epigenome-wide significance levels, differential methylation models (Wang S. Method to detect differentially methylated loci with case-control designs using Illumina arrays. Genet Epidemiol. 2011; 35:686-94)), appropriate study designs incorporating different biological levels (multi-omics) and definition of adequate controls and confounders. Especially for myocardial tissue, lack of healthy controls constrains the elucidation of cardiac epigenetics. In the present study, we compared failing myocardium against non-failing tissue derived from transplanted hearts showing regular function and a smaller control group of donors that suffered road accidents. Importantly, we show that it is worth studying DNA methylation in peripheral blood, for which adequate controls are often available.
It will be interesting to systematically evaluate DNA methylation markers in longitudinal cohorts of heart failure due to different etiologies including ischemic heart disease. The potential indications of the methylation markers detected here point towards earlier detection of systolic dysfunction and heart failure, but they could also be evaluated for therapy selection and monitoring.
The presently described method provides an efficient and improved tool for finding markers in patients, particularly for non-infectious diseases, like HF and DCM.
With the presently found markers, an improved, early detection and prognosis of HF/DCM, patient stratification for therapy decision support, and optimized, personalized treatment is possible.
This invention reports molecular markers which are indicative of HF/DCM, of the risk of developing HF/DCM, or of therapy effects or therapy outcome.
The present study provides to the knowledge of the inventors the first epigenome-wide association study in living patients with heart failure using a multi-omics approach.
Number | Date | Country | Kind |
---|---|---|---|
16178413.7 | Jul 2016 | EP | regional |
16189099.1 | Sep 2016 | EP | regional |
17171336.5 | May 2017 | EP | regional |
This application is the national phase under 35 U.S.C. § 371 of PCT International Application No. PCT/EP2017/066941, which has an International filing date of 6 Jul. 2017, which designated the United States of America and which claims priority to European Application No. EP 16178413.7 filed 6 Jul. 2016, European Application No. EP 16189099.1 filed 16 Sep. 2016 and European Application No. 17171336.5 filed 16 May 2017. The entire contents of each application recited above are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/066941 | 7/6/2017 | WO | 00 |