The present invention describes methods for establishing whole-tissue epigenetic clocks for avian species. The thus-obtained epigenetic clocks are particularly robust, generalizable and provide high specificity, accuracy and precision.
Avian species, and in particular Galliformes such as chicken (Gallus gallus), are a significant source of commercially produced meat and eggs. Factors that influence the growth, pathogen resistance and meat quality of chicken are thus of considerable scientific and economic interest. Extensive genome-wide association studies have been conducted to elucidate the underlying genetic framework. Epigenetic modifications provide an important complement and extension to genetic variants but have remained relatively underexplored in chicken.
Animal methylomes can be highly diverse, ranging from certain insect genomes with sparse methylation patterns and only tens of thousands of methylation marks to mammalian genomes with dense methylation patterns and tens of millions of methylation marks. To date, little is known about the genome-wide DNA methylation patterns of non-mammalian vertebrates, and particularly of birds.
DNA methylation correlates with ageing processes and represents an epigenetic modification with a high specificity for CpG dinucleotides (5'-C-phosphate-G-3'), i.e. regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. The set of genomic methylation modifications constitutes the methylome of a given cell.
Low-methylated regions (LMRs) represent a key feature of the dynamic methylome. LMRs are local reductions in the DNA methylation landscape and represent CpG-poor distal regulatory regions that often reflect the binding of transcription factors and other DNA-binding proteins. LMRs were originally described in the mouse (Stadler et al. Nature 480, 490-495 (2011)). Evolutionary conservation of LMRs beyond mammals has remained unexplored.
Age-correlated DNA methylation changes at discrete sets of CpGs in the human genome have been identified and used to predict age (Horvath, S. (2013). DNA methylation age of human tissues and cell types. Genome Biology 14:3156). These “epigenetic clocks” can estimate the DNA methylation age in specific tissues or tissue-independently and can predict mortality and time to death.
Epigenetic age is highly correlated with chronological age but can also respond to environmental factors that accelerate or decelerate ageing processes, resulting in substantial deviations from chronological age.
Epigenetic age acceleration (epigenetic age > chronological age) suggests that the underlying tissue ages faster than expected on the basis of chronological age, whereas a negative value (epigenetic age < chronological age, age deceleration) suggests that the tissue ages slower than would be expected. Epigenetic age acceleration is associated with a great number of age-related conditions and diseases, such as inflammatory processes.
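By way of illustration only, the sign convention described above can be sketched as follows. The function names and the optional `tolerance_days` dead band are hypothetical and are not part of the described method; the numerical values are made up.

```python
def age_acceleration(epigenetic_age_days, chronological_age_days):
    """Return epigenetic minus chronological age.

    A positive value corresponds to age acceleration,
    a negative value to age deceleration.
    """
    return epigenetic_age_days - chronological_age_days


def classify(delta_days, tolerance_days=0.0):
    """Label a difference as acceleration, deceleration, or none.

    tolerance_days is a hypothetical dead band, not taken from the text.
    """
    if delta_days > tolerance_days:
        return "acceleration"
    if delta_days < -tolerance_days:
        return "deceleration"
    return "none"


if __name__ == "__main__":
    # Illustrative values only: a 15-day-old animal predicted as 18.5 days.
    delta = age_acceleration(18.5, 15.0)
    print(delta, classify(delta))
```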
In view of those conditions accelerating the biological/epigenetic age, age-correlated performance biomarkers are particularly useful tools for animal farming, as they facilitate monitoring large groups of animals and provide objective quality assurance. Avian species present a unique challenge for performance biomarker development, as they combine considerable economic importance with a relatively short lifespan.
Accordingly, it was the objective of the present invention to provide a method of establishing a whole-tissue epigenetic clock for avian species that may be used as a performance biomarker for the respective avian species and that provides robustness and generalizability, and at the same time high specificity, accuracy and precision.
The present invention provides a computer-implemented method of establishing an epigenetic clock for an avian species, the method comprising
Further, it provides a computer program loaded into a memory of a computer, implementing the aforementioned method.
Finally, the present invention pertains to a tangible, computer-readable medium comprising a computer-readable code that, when executed by a computer, causes the computer to perform operations comprising:
The inventors have identified three previously unknown confounding factors for methylation clocks for avian species:
The inventors have found that robustness, specificity, accuracy and precision of epigenetic clocks for avian species can be significantly improved by (i) excluding from the initially identified clock CpG sites all CpG sites associated with single nucleotide polymorphisms (SNPs), (ii) excluding all CpG sites located on the sex chromosomes (Z and W), and (iii) normalizing the CpG methylation values.
Accordingly, the present invention provides a computer-implemented method of establishing an epigenetic clock for an avian species, the method comprising
The term “CpG site”, “clock CpG” or “CpG location” as used in the context of the present invention refers to a CpG position that is potentially methylated. Methylation typically occurs in a CpG containing nucleic acid. The CpG containing nucleic acid may be present in, e.g. a CpG island, a CpG doublet, a promoter, an intron, or an exon of a gene or in an intergenic region. For instance, the potential methylation sites may encompass the promoter/enhancer regions of the indicated genes.
According to method step (a), CpG sites within the genomic DNA of the avian species are identified and the methylation level of those CpG sites is determined.
Accordingly, method step (a) involves a DNA methylation profiling process, preferably bisulfite sequencing. Therein, cytosine residues in the genomic DNA are transformed to uracil, while 5-methylcytosine residues in the genomic DNA are not transformed to uracil.
Whole genome bisulfite sequencing is a genome-wide analysis of DNA methylation based on the sodium bisulfite conversion of genomic DNA, which is then sequenced on a next-generation sequencing platform. The sequences are then re-aligned to the reference genome to determine methylation states of the CpG dinucleotides based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
For example, methylation levels can be measured using the commercial Illumina™ platform.
To quantify the methylation level, various established protocols may be used to calculate the beta value of methylation, which equals the fraction of methylated cytosines in a specific location.
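As a minimal sketch of this quantification, assuming forward-strand reads after bisulfite conversion in which a retained `C` indicates a methylated cytosine and a `T` indicates a converted (unmethylated) cytosine, the beta value at one position could be computed as shown below. The function names are hypothetical and do not correspond to any particular protocol or software package.

```python
def beta_value(methylated_reads, total_reads):
    """Fraction of methylated cytosines at one genomic position."""
    if total_reads <= 0:
        raise ValueError("position is not covered by any read")
    if not 0 <= methylated_reads <= total_reads:
        raise ValueError("methylated read count must lie in [0, total_reads]")
    return methylated_reads / total_reads


def beta_from_bases(bases):
    """Compute the beta value from the bases observed at a reference cytosine.

    Assumption: forward-strand reads after bisulfite conversion, where
    'C' means protected (methylated) and 'T' means converted (unmethylated).
    Any other base is treated as uninformative and ignored.
    """
    methylated = sum(1 for base in bases if base == "C")
    unmethylated = sum(1 for base in bases if base == "T")
    return beta_value(methylated, methylated + unmethylated)


if __name__ == "__main__":
    # Five informative reads, three of them methylated (illustrative data).
    print(beta_from_bases("CCCTTN"))
```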
The specific CpG sites within the genomic DNA of the avian species may be distributed over the whole genome (“genome-wide clock”) or may be located within LMRs (“LMR clock”). Details for establishing the genome-wide clock and the LMR clock, respectively, are provided below.
The genomic DNA is obtained from a plurality of different sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species. As an example, the sample material may be stratified into four tissue (breast, ileum, spleen and jejunum) and three age (3 d, 15 d, 34 d) groups. Details regarding suitable sample materials will be provided below. Ideally, the sample material covers the entire life cycle of the avian species under investigation.
In method step (b), all CpG sites associated with single nucleotide polymorphisms (SNPs) are excluded from the CpG sites identified in step (a). SNPs can be determined using standard procedures known in the art, such as whole-genome sequencing. Alternatively, SNPs in the genome of selected species are publicly available in databases, such as dbSNP (https://www.ncbi.nlm.nih.gov/snp/).
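The exclusion of step (b) may, purely as an illustration, be sketched as a set lookup on genomic coordinates. The representation of a CpG site as a `(chromosome, position)` tuple with the 1-based position of the cytosine is an assumption, as is the choice to discard a CpG when a SNP overlaps either the cytosine or the following guanine.

```python
def exclude_snp_cpgs(cpg_sites, snp_positions):
    """Drop CpG sites that overlap a known SNP.

    cpg_sites: iterable of (chrom, pos), pos = 1-based position of the C.
    snp_positions: iterable of (chrom, pos) of known SNPs.
    Assumption: a SNP at the C (pos) or at the G (pos + 1) disqualifies
    the CpG site.
    """
    snps = set(snp_positions)
    return [
        (chrom, pos)
        for chrom, pos in cpg_sites
        if (chrom, pos) not in snps and (chrom, pos + 1) not in snps
    ]


if __name__ == "__main__":
    # Illustrative coordinates only.
    cpgs = [("1", 100), ("1", 250), ("2", 40)]
    snps = [("1", 101), ("2", 39)]
    print(exclude_snp_cpgs(cpgs, snps))
```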
In method step (c), all CpG sites located on the sex chromosomes (Z and W) are excluded from the CpG sites obtained in step (b). Birds have female heterogamety with Z and W sex chromosomes (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2567362/). The chromosome names are usually annotated in the assembly of a species. As an example for chicken (Gallus gallus), the chromosomal location of a CpG can be derived from the annotation of the Gallus gallus genome assembly version 5.0 (https://www.ebi.ac.uk/ena/data/view/GCA_000002315.3).
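A minimal sketch of step (c) is given below, assuming that each CpG site carries the chromosome name from the assembly annotation. The handling of an optional `chr` prefix is an assumption, since naming conventions differ between assemblies and annotation sources.

```python
SEX_CHROMOSOMES = frozenset({"Z", "W"})


def normalize_chrom(name):
    """Strip a leading 'chr' prefix (naming is assembly-dependent)."""
    return name[3:] if name.lower().startswith("chr") else name


def exclude_sex_chromosomes(cpg_sites):
    """Keep only CpG sites that are not located on chromosome Z or W.

    cpg_sites: iterable of (chrom, pos) tuples.
    """
    return [
        (chrom, pos)
        for chrom, pos in cpg_sites
        if normalize_chrom(chrom).upper() not in SEX_CHROMOSOMES
    ]


if __name__ == "__main__":
    # Illustrative coordinates only.
    print(exclude_sex_chromosomes([("1", 100), ("Z", 5), ("chrW", 7)]))
```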
In method step (d), a tissue-specific normalization step for the CpG sites obtained in step (c) is performed. Normalization is performed by computing for every CpG the average methylation value over all samples from the same tissue and subtracting the thus-obtained value from the value of this CpG (for the LMR clock: by computing for every LMR the average methylation value over all samples from the same tissue and subtracting the thus-obtained value from the value of this LMR). This normalization is necessitated by the different aging trajectories of individual tissues.
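The tissue-specific normalization of step (d) can be sketched as a per-tissue centring of the methylation values. The data layout used here (a list of sample dictionaries with the keys `tissue` and `meth`) is a hypothetical choice made for illustration, and the sketch assumes that every sample of a tissue reports a value for every feature.

```python
from collections import defaultdict


def normalize_by_tissue(samples):
    """Subtract, per feature, the mean over all samples of the same tissue.

    samples: list of dicts with at least the keys
        'tissue' (str) and 'meth' (dict: feature -> methylation value).
    A feature may be a single CpG or an LMR average.
    Returns new sample dicts; the input is left unchanged.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for sample in samples:
        counts[sample["tissue"]] += 1
        for feature, value in sample["meth"].items():
            sums[sample["tissue"]][feature] += value

    normalized = []
    for sample in samples:
        tissue = sample["tissue"]
        centred = {
            feature: value - sums[tissue][feature] / counts[tissue]
            for feature, value in sample["meth"].items()
        }
        normalized.append(dict(sample, meth=centred))
    return normalized


if __name__ == "__main__":
    # Illustrative values only.
    data = [
        {"tissue": "breast", "age": 3, "meth": {"cpg1": 0.2}},
        {"tissue": "breast", "age": 15, "meth": {"cpg1": 0.4}},
        {"tissue": "spleen", "age": 3, "meth": {"cpg1": 0.9}},
    ]
    for row in normalize_by_tissue(data):
        print(row)
```

After this step, every feature has a mean of zero within each tissue, so that tissue-level offsets do not enter the subsequent regression.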
The CpG sites obtained in step (d), i.e. the CpG sites remaining after correction of the above-mentioned confounding factors, are finally correlated with chronological age using a penalized regression model (method step (e)).
The plurality of different biological sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species may include material selected from the group consisting of body fluids, excremental material, tissue material and feather material. In one embodiment of the invention, the plurality of different biological sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species includes only one specific tissue, or maximally four different tissues.
Preferably, the plurality of different biological sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species includes at least four different tissues and preferably exactly four different tissues.
In one embodiment of the present invention, the plurality of different biological sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species comprises or consists of tissue material selected from muscle tissue; organ tissue, such as gut tissue; and skin tissue.
Preferably, the plurality of different biological sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species includes or consists of breast tissue, spleen tissue, ileum tissue and jejunum tissue. The aforementioned set of tissues is particularly preferred as it represents a biologically diverse and commercially relevant set of tissues.
The plurality of different biological sample materials deriving from the avian species and representing specific time points within the chronological lifespan of this avian species is preferably selected to represent ages ranging between day 3 and day 63, in particular between day 4 and day 42 and preferably between day 5 and day 35.
For example, the life cycle of chicken starts with eggs taken from parent birds in the hatchery, which are then incubated at a constant temperature for 21 days until the birds hatch. Although the precocial chicks may be up to 72 hours old at this stage, they are called one-day chicken. These chickens are separated by sex, and the female birds are kept for approximately one year for laying eggs. The lifespan of broiler chicken is significantly shorter and varies between 21 days and up to 170 days. An average US broiler is slaughtered after 47 days at a slaughter weight of 2.6 kg, while in Europe the average slaughter age is 42 days (at a weight of 2.5 kg).
As indicated above, the specific CpG sites within the genomic DNA of the avian species may be distributed over the whole genome of the avian species (“genome-wide clock”). In this case, the CpGs are preferably restricted to a strand-specific coverage of at least 10.
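A coverage restriction of this kind may be sketched as follows. The data layout (one dictionary of strand-specific read counts per sample) and the requirement that the threshold be met in every sample are assumptions made for illustration; `min_coverage` is a hypothetical parameter name.

```python
def filter_by_coverage(coverage_by_sample, min_coverage=10):
    """Return the CpG sites that are sufficiently covered in every sample.

    coverage_by_sample: dict sample_id -> dict site -> strand-specific
        read count. A site that is missing from a sample is treated as
        uncovered in that sample.
    min_coverage: hypothetical threshold parameter (inclusive).
    """
    per_sample = list(coverage_by_sample.values())
    if not per_sample:
        return set()
    shared = set(per_sample[0])
    for coverage in per_sample[1:]:
        shared &= set(coverage)
    return {
        site
        for site in shared
        if all(coverage[site] >= min_coverage for coverage in per_sample)
    }


if __name__ == "__main__":
    # Illustrative read counts only.
    coverage = {
        "s1": {("1", 100): 12, ("1", 250): 30, ("2", 40): 9},
        "s2": {("1", 100): 10, ("1", 250): 4, ("2", 40): 25},
    }
    print(filter_by_coverage(coverage))
```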
In an alternative embodiment, the specific CpG sites within the genomic DNA of the avian species are distributed within low methylated regions (LMRs) in the genome of the avian species. In this case, method step (a) includes a step of computing LMRs individually for the different tissues.
Suitable LMR computing programs are known in the art, for example MethylSeekR (Burger L, Gaidatzis D, Schubeler D, Stadler MB. Identification of active regulatory regions from DNA methylation data. Nucleic Acids Res 41, e155 (2013)).
For establishing the LMR clock, the specific CpG sites within the genomic DNA of the avian species are preferably restricted to a strand-specific coverage of greater than 5.
An LMR clock allows the conceptual interpretation of the selected features, as LMRs represent transcription factor binding sites. This represents an important advantage compared to all-CpG clocks. Furthermore, LMR clocks are more robust to noise, as the features represent averages over regions and noise cancels out.
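The region-level averaging that underlies the LMR features can be illustrated with the following sketch. It assumes that the LMR coordinates have already been computed by a suitable program and are given as inclusive `(chrom, start, end)` intervals; the function name and data layout are hypothetical.

```python
from statistics import mean


def lmr_average_methylation(cpg_meth, lmrs):
    """Average the methylation of all CpGs falling within each LMR.

    cpg_meth: dict (chrom, pos) -> methylation value of one sample.
    lmrs: iterable of (chrom, start, end), coordinates inclusive.
    LMRs that contain no measured CpG are omitted from the result.
    """
    averages = {}
    for chrom, start, end in lmrs:
        values = [
            value
            for (c, pos), value in cpg_meth.items()
            if c == chrom and start <= pos <= end
        ]
        if values:
            averages[(chrom, start, end)] = mean(values)
    return averages


if __name__ == "__main__":
    # Illustrative values only.
    meth = {("1", 100): 0.10, ("1", 120): 0.30, ("1", 500): 0.90}
    print(lmr_average_methylation(meth, [("1", 90, 150), ("1", 600, 700)]))
```

Because each feature is a mean over several CpGs, independent measurement noise at individual CpGs is partly averaged out, which is the robustness argument made above.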
In addition to the above, the present invention pertains to a computer program loaded into a memory of a computer, implementing any one of the above-described methods.
Finally, the present invention relates to a tangible, computer-readable medium comprising a computer-readable code that, when executed by a computer, causes the computer to perform operations comprising:
Applications of the methods according to the invention include, for example, the development of new epigenetic clocks as biomarkers for (i) aiding in the evaluation of the health status of avian species (individual or population), (ii) monitoring the progress or recurrence of clinical and sub-clinical disorders, or (iii) studying the effects of medication, feed compounds and/or special diets on the biological age, and thus on the health status, of the respective avian species.
Animals were stratified into four tissue groups (breast, ileum, spleen and jejunum) and three age groups (3 d, 15 d and 34 d; in the case of jejunum 14 d, 16 d and 35 d). From each of these 12 groups, DNA was prepared from three independent animals, resulting in 36 genomic DNA samples.
Whole-genome bisulfite sequencing libraries were prepared using the Accel-NGS Methyl-Seq DNA Library Kit from Swift Biosciences. Two sequencing libraries were barcoded onto one sequencing lane. Sequencing was performed on an Illumina HiSeq X platform using a standard paired-end sequencing protocol with 105 nucleotides read length.
Reads were trimmed and mapped with BSMAP 2.5 (Xi Y, Li W. 2009. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10:232. doi:10.1186/1471-2105-10-232.) using the Gallus gallus genome assembly version 5.0 (https://www.ebi.ac.uk/ena/data/view/GCA_000002315.3) as a reference sequence. Duplicates were removed using the Picard tool (http://broadinstitute.github.io/picard). Methylation ratios were determined using a Python script (methratio.py) distributed together with the BSMAP package by dividing the number of reads having a methylated CpG at a certain genomic position by the number of all reads covering this position.
All CpGs listed as SNPs in the database dbSNP (https://www.ncbi.nlm.nih.gov/snp/) for the Gallus gallus genome were filtered out. All CpGs and LMRs mapping to the Galliformes sex chromosomes W and Z were removed from the data sets. For the genome-wide clock, the analysis was restricted to CpGs that showed a strand-specific coverage of greater than 10 in every sequenced sample, resulting in a set of 257,913 CpGs. The data were then normalized by computing for every CpG the average methylation value over all samples from the same tissue and subtracting this value from the methylation value of this CpG. For the LMR clock, the analysis was restricted to CpGs within low-methylated regions that showed a strand-specific coverage of greater than 5 in every sequenced sample, resulting in a set of 67,651 LMRs. The average methylation values of these LMRs were computed and normalized by computing for every LMR the average value over all samples from the same tissue and subtracting this value from the value of this LMR.
Then a penalized regression model (implemented in the R package glmnet [https://cran.r-project.org/web/packages/glmnet/]) was applied to regress the chronological age of the animals on the normalized methylation values of the CpG probes. In the case of the LMR clock a penalized regression model was applied to regress the chronological age of the animals on the normalized average methylation values of the LMRs.
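To illustrate what an elastic net fit of this kind computes, a minimal pure-Python coordinate-descent sketch is given below. It is not the glmnet implementation: it assumes the objective (1/(2n))·RSS + lambda·(alpha·|beta|_1 + (1 − alpha)/2·|beta|_2^2) with an unpenalized intercept, fits a single lambda rather than a path, and does not standardize the features. All function and parameter names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used for the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha, lam, n_iter=1000, tol=1e-8):
    """Fit y ~ intercept + X @ beta by cyclic coordinate descent.

    X: list of rows (samples), each a list of feature values.
    y: list of chronological ages.
    alpha: mixing parameter (1 = lasso, 0 = ridge); lam: penalty strength.
    Returns (intercept, beta). Features are used as given (no scaling).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    residual = [y[i] - intercept for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature carries no information
            # Covariance of feature j with the residual excluding feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha)
            new /= col_sq[j] + lam * (1.0 - alpha)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        # The intercept is not penalized: re-centre the residuals.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        max_change = max(max_change, abs(shift))
        if max_change < tol:
            break
    return intercept, beta


if __name__ == "__main__":
    # Toy data: age depends on the first feature only (illustrative).
    X = [[-0.2, 0.1], [-0.1, -0.1], [0.1, 0.1], [0.2, -0.1]]
    y = [3.0, 9.0, 21.0, 27.0]
    print(elastic_net(X, y, alpha=0.7, lam=0.05))
```

With a sufficiently large penalty, the L1 term drives coefficients exactly to zero, which is how a small set of clock features is selected from a much larger candidate set.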
The alpha parameter of glmnet was varied in a range between 0 and 1 and chosen as 0.7 (elastic net regression), because this value led to a fit that was close to the best fit and a manageable number of CpGs. The lambda value was chosen as 0.4016 using cross-validation on the training data. This identified a set of 45 CpGs together with corresponding beta values, which define the weights for these CpGs used in the chicken methylation clock. The mean squared error of 6-fold cross-validation using the values of 0.7 for alpha and 0.4016 for lambda was 11.538. Taking the square root, this indicates that the age of a new sample can be predicted with an error of about 3.4 days. In order to apply the clock to a new sample, the methylation ratios of this sample at the 45 clock CpGs have to be provided, and the prediction command of the package glmnet for cross-validated fits (predict.cv.glmnet) has to be run with the trained clock.
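Conceptually, applying a trained linear clock amounts to a weighted sum of the normalized methylation values plus an intercept, and the error estimate in days is the square root of the cross-validated mean squared error. The sketch below illustrates both steps in plain Python; it does not reproduce the glmnet prediction call, and the intercept, feature names and weights shown are made-up placeholders rather than the actual clock coefficients.

```python
import math


def predict_age(intercept, weights, normalized_meth):
    """Predict age as intercept plus weighted sum over the clock features.

    weights: dict feature -> regression coefficient of the clock.
    normalized_meth: dict feature -> tissue-normalized methylation value
        of the new sample; must contain every clock feature.
    """
    missing = [f for f in weights if f not in normalized_meth]
    if missing:
        raise KeyError(f"missing methylation values for: {missing}")
    return intercept + sum(w * normalized_meth[f] for f, w in weights.items())


def cv_error_days(mean_squared_error):
    """Convert a cross-validated mean squared error (days^2) into days."""
    return math.sqrt(mean_squared_error)


if __name__ == "__main__":
    # Placeholder clock with two features (not the real coefficients).
    clock_weights = {"cpg_a": 10.0, "cpg_b": -4.0}
    sample = {"cpg_a": 0.10, "cpg_b": -0.05}
    print(predict_age(20.0, clock_weights, sample))
    # Mean squared error reported for the genome-wide clock.
    print(round(cv_error_days(11.538), 1))
```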
The alpha parameter of glmnet was varied in a range between 0 and 1 and chosen as 0.84 (elastic net regression), because this value led to a fit that was close to the best fit and a manageable number of LMRs. The lambda value was chosen as 0.3194 using cross-validation on the training data. This identified a set of 39 LMRs together with corresponding beta values, which define the weights for these LMRs used in the chicken methylation clock. The mean squared error of 6-fold cross-validation using the values of 0.84 for alpha and 0.3194 for lambda was 13.4831. Taking the square root, this indicates that the age of a new sample can be predicted with an error of about 3.7 days. In order to apply the clock to a new sample, the methylation ratios of this sample at the 39 clock LMRs have to be provided, and the prediction command of the package glmnet for cross-validated fits (predict.cv.glmnet) has to be run with the trained clock.
The alpha value was varied in a range between 0 and 1 and chosen as 0.9 (elastic net regression). This identified a set of 32 LMRs together with corresponding beta values, which define the weights for these LMRs used in the chicken methylation clock (Tab. 3).
The average methylation values of these LMRs were computed and normalized by computing for every LMR the average value over all samples from the same tissue and subtracting this value from the value of this LMR (in case of the CpG clock by computing for every CpG the average value over all samples from the same tissue and subtracting this value from the value of this CpG), see above. The rationale for this approach is illustrated in
In order to validate the LMR clock, whole-genome bisulfite sequencing of 6 samples (breast) in two age groups (14 and 28 days) from a completely independent animal trial was performed. Age prediction showed a root mean square error of 2.7 days and 3.8 days, respectively, which is consistent with the prediction error obtained after cross-validation. Results are visualized in
Analysis of jejunum samples showed a pronounced and highly consistent age acceleration, in particular at days 14 and 16 (
Number | Date | Country | Kind |
---|---|---|---|
20153518.4 | Jan 2020 | EP | regional |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/051434 | 1/22/2021 | WO | |