METHYLATION DATA SIGNATURES OF AGING AND METHODS OF DETERMINING A METHYLATION AGING CLOCK

Information

  • Patent Application
  • Publication Number
    20220005552
  • Date Filed
    September 20, 2021
  • Date Published
    January 06, 2022
  • CPC
    • G16B40/00
    • G16H15/00
    • G16H50/30
    • G16B5/20
  • International Classifications
    • G16B40/00
    • G16B5/20
    • G16H50/30
    • G16H15/00
Abstract
A method of creating a biological aging clock for a subject can include: (a) receiving a biological data signature derived from a tissue or organ of the subject; (b) creating input vectors based on the biological data signature; (c) inputting the input vectors into a machine learning platform; (d) generating a predicted biological aging clock of the tissue or organ based on the input vectors by the machine learning platform, wherein the biological aging clock is specific to the tissue or organ; and (e) preparing a report that includes the biological aging clock that identifies a predicted biological age of the tissue or organ. The biological data signature can be based on biological pathway activation signatures for DNA methylomics.
Description
BACKGROUND

While aging may be a complex multifactorial process with no single cause or treatment, the question of whether aging can be classified as a disease is widely debated. Many strategies for extending organismal life spans have been proposed, including replacing cells and organs, comprehensive strategies for repairing the accumulated damage, using hormetins to activate endogenous repair processes, modulating the aging processes through specific mutations, gene therapy, and small molecule drugs. An animal's survival strongly depends on its ability to maintain homeostasis, achieved partly through intracellular and intercellular communication within and among different tissues.


The lifespan of different cells and tissues varies substantially. Although aging affects gene expression and protein production in multiple tissues, the affected genes are highly tissue-specific and depend on their functions in the tissue, such as through the proteins produced as the final product of gene expression. Because regeneration rates and the associated gene expression and protein production patterns vary, external effectors, such as small molecules, have different effects on different tissues. As a result, gene expression and protein production can provide tissue-specific signatures that can be studied to find information for interventions that could bring the tissue, organ, or person back to a younger state without additional adverse effects on other tissues.


Until recently, treatments and therapies for senescence reversal (aging reversal) have been rare, largely because of the complexity of the underlying mechanisms of senescence and the lack of tools for understanding and treating senescence. One example of drug development for senescence protection (rather than senescence reversal) can be seen in US 2017/0073735. Recent bioinformatics developments such as deep neural networks have opened up the possibility of developing highly-personalized senescence reversal treatments, based on gene expression and/or protein production of senescent tissues versus non-senescent tissues, as will be disclosed in the present invention.


Presently, none of the proposed strategies for senescence treatment provides a roadmap for rapid screening, validation, and clinical deployment. No methods currently exist to predict the effects of currently available drugs on human longevity and health span in a timely manner.


Many biomarkers of aging have been proposed, including telomere length, intracellular and extracellular aggregates, racemization of amino acids, and genetic instability. Gene expression and DNA methylation profiles change during aging and may also be used as biomarkers of aging. As a result, profiles of the proteins translated from the expressed mRNA may correspondingly be used as biomarkers of aging. Many studies analyzing transcriptomes or proteomes of biopsies in a variety of diseases indicated that the age and sex of the patient have significant effects on gene expression and subsequent protein production, and that there are noticeable changes in gene expression with age both in mice, which resulted in the development of mouse aging gene expression databases, and in humans.


Combined analyses of protein-protein interactions among the produced proteins and of gene expression in both flies and humans demonstrate that aging is mainly associated with a small number of biological processes, which might preferentially attack key regulatory nodes that are important for network stability.


Work of the inventors, among others, with gene expression and epigenetics of various solid tumors provided clues that transcription profiles of cells mapped onto the signaling pathways may be used to screen for and rate the targeted drugs that regulate pathways directly and indirectly related to aging and longevity. Prior studies suggest that a combination of pathways, termed a pathway cloud, rather than one element of a pathway or the whole pathway, might be responsible for pathological changes in the cell.


The senescence response, including aging/senescence in humans, causes striking changes in cellular phenotype. According to Campisi and d'Adda di Fagagna (2007), the senescent phenotype is induced by multiple stimuli. Mitotically competent cells respond to various stressors by undergoing cellular senescence. These stressors include dysfunctional telomeres, non-telomeric DNA damage, excessive mitogenic signals including those produced by oncogenes (which also cause DNA damage), non-genotoxic stress such as perturbations to chromatin organization and, probably, stresses with an as-yet unknown etiology. The phenotypic changes include an essentially permanent arrest of cell proliferation, development of resistance to apoptosis (the death of some cells that occurs as a normal and controlled part of an organism's growth or development), and an altered pattern of gene expression and protein production. Also, the expression or appearance of senescence-associated markers such as senescence-associated β-galactosidase, p16, senescence-associated DNA-damage foci (SDFs), and senescence-associated heterochromatin foci (SAHFs) is neither universal nor exclusive to the senescent state.


Cellular senescence is thought to contribute to age-related tissue and organ dysfunction and various chronic age-related diseases through various mechanisms. Senescence is characterized by a persistent proliferative arrest in which cells display a distinct pro-inflammatory senescence-associated secretory phenotype (SASP) (Krimpenfort and Berns 2017). Whereas SASP exerts a supportive paracrine function during early development and wound healing (Demaria et al. 2014), the continuous secretion of these SASP factors has detrimental effects on normal tissue homeostasis and is considered to significantly contribute to aging (DiLoreto and Murphy 2015).


In a cell-autonomous manner, senescence acts to deplete the various pools of cycling cells in an organism, including stem and progenitor cells. In this way, senescence interferes with tissue homeostasis and regeneration, and lays the groundwork for its cell-non-autonomous detrimental actions involving the SASP. There are at least five distinct paracrine mechanisms by which senescent cells are thought to promote tissue dysfunction, including perturbation of the stem cell niche (causing stem cell dysfunction), disruption of extracellular matrix, induction of aberrant cell differentiation (both creating abnormal tissue architecture), stimulation of sterile tissue inflammation, and induction of senescence in neighboring cells (paracrine senescence). An emerging yet untested concept is that post-mitotic, terminally differentiated cells that develop key properties of senescent cells might contribute to ageing and age-related disease through the same set of paracrine mechanisms (van Deursen 2014).


Several recent observations support the hypothesis that senescence is a highly dynamic, multi-step process, during which the properties of senescent cells continuously evolve and diversify, much like tumorigenesis but without cell proliferation as a driver (De Cecco et al. 2013; Wang et al. 2011; Ivanov et al. 2013). This applies not only to senescent cells but also to cells in the pre-senescent stage. It also means that there is an opportunity to return the cell to normal, non-senescent behavior.


There has always been a need to reverse senescence, but only recently have the necessary tools become available, particularly developments in informatics and machine learning, to develop and apply such senescence therapies and treatments. Further, even commonly accepted biomarkers, and metrics for such biomarkers, to assess aging have been lacking.


At least two general concepts of age exist in the art. One, “chronological age”, is simply the actual calendar time an organism or human has been alive. The other, called “biological age” or “physiological age”, which is a particular focus of the present invention, is related to the physiological health of the individual, and biomarkers thereof, whether transcriptomic or proteomic. Biological age is associated with how well organs and regulatory systems of the body are performing and to what extent the general homeostasis at all levels of the organism is being maintained, as such functions generally decline with time and age.


The measurement of any physiological process of an organism is typically done with a set of predefined biomarkers. A biomarker can be defined as a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. Biomarkers are chosen by scientists in order to measure a very well-defined process within the body.


Given that aging in a multi-cellular organism is a systemic process that cannot be readily captured by a single uni-dimensional metric, or even by several metrics, the development of an accurate and useful measure of biological age (which can be thought of as a biological clock) is subject to specific challenges. Again, such biomarkers must not only be objective, quantifiable, and easily measurable characteristics of the biological aging process, but must also take into account that aging is not a single specific process, but rather a suite of changes across multiple physiological systems.


In other words, no single biomarker can provide an accurate overall biological clock age of a multi-cellular organism, nor can the biological age of a single cell, tissue, or organ, even when composed of many biomarkers, provide an accurate overall biological age of an organism. In fact, it is often useful to have several biological clocks assigned to an organism or human; that is, a different biological age can be assigned to different cells, tissues, or organs of that organism, and different clocks can be based on a different biomarker or on different biomarkers. Thus, there may be one clock for the skin, one for the liver, one clock based on telomere length of a cell(s), tissue(s), or organ(s), and another based on a different biomarker.


In the past, several attempts have been made to develop adapted biomarkers for measuring biological aging. However, the biomarkers used so far focus on monitoring a restricted number of processes known to be directly involved in the onset and propagation of aging-related damage through the body. Examples of such biomarkers are telomere length (Lehmann, 2013), intracellular and extracellular aggregates, racemization of amino acids, and genetic instability. Both gene expression (Wolters, 2013) and DNA methylation profiles (Horvath, 2012, Horvath, 2013, Mendelsohn, 2013) change during aging and may be used as biomarkers of aging, as demonstrated previously with the epigenetic clock (Horvath, 2012, Horvath, 2013). Many studies analyzing transcriptomes of biopsies in a variety of diseases indicated that the age and sex of the patient had significant effects on gene expression (Chowers, 2003) and that there are noticeable changes in gene expression with age both in mice (Weindruch, 2002, Park, 2009), which resulted in the development of mouse aging gene expression databases (Zahn, 2007), and in humans (Blalock, 2003; Welle, 2003; Park, 2005; Hong, 2008; de Magalhaes, 2009).


The first aging clocks based on omics data date back to 2013. That year, two seminal articles dedicated to DNAm aging clocks were published: [Horvath, 2013] by Horvath and [Hannum G, et al. (2013). Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell, 49:359-367] by Hannum et al. Each study describes an algorithm that estimates human chronological age based on data obtained from Illumina DNAm microarrays. Their implementations are different, yet they share a common nature. Both solutions rely on the elastic net regularized regression method, a type of linear model in which the methylation levels at specific dinucleotide CpG loci are assigned weights and then summed to obtain a final prediction. Horvath's model includes 353 CpG sites on Illumina 450 k and 27 k DNAm array platforms, while the model published by Hannum et al. is based on 71 sites on Illumina 450 k platforms. Interestingly, the CpG sites used by the two models have little overlap, as only six sites are shared between them. Despite the significant differences in data preprocessing, training samples, and final features, these aging clocks show similar performance when validated in a variety of experimental settings. The error margins reported by their authors are similar as well: a median absolute error (MedAE) of 3.6 years for the 353 CpG clock and a root mean square error (RMSE) of 3.9 years for the 71 CpG clock.
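
As an illustration of the linear form described above, the following is a minimal sketch of a weighted-sum clock together with the two error measures mentioned (MedAE and RMSE). The CpG identifiers, weights, intercept, and sample values are made up for illustration and are not the coefficients of either published clock; any preprocessing or age transformation used by the published models is omitted.

```python
from math import sqrt
from statistics import median


def linear_clock(beta_values, weights, intercept):
    """Predict age as intercept + sum(weight * methylation level) over CpG sites."""
    return intercept + sum(w * beta_values[cpg] for cpg, w in weights.items())


def median_absolute_error(actual, predicted):
    """Median of the absolute differences between actual and predicted ages."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


def root_mean_square_error(actual, predicted):
    """Square root of the mean squared difference between actual and predicted ages."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical toy model: three made-up CpG sites with made-up weights.
weights = {"cpg_A": 40.0, "cpg_B": -25.0, "cpg_C": 10.0}
intercept = 30.0

# Hypothetical methylation levels (beta values between 0 and 1) for three samples.
samples = [
    {"cpg_A": 0.50, "cpg_B": 0.20, "cpg_C": 0.10},
    {"cpg_A": 0.80, "cpg_B": 0.40, "cpg_C": 0.50},
    {"cpg_A": 0.25, "cpg_B": 0.60, "cpg_C": 0.00},
]
actual_ages = [44.0, 60.0, 27.0]

predicted_ages = [linear_clock(s, weights, intercept) for s in samples]
medae = median_absolute_error(actual_ages, predicted_ages)
rmse = root_mean_square_error(actual_ages, predicted_ages)
```

In an elastic net model the weights would be fitted with a combined L1 and L2 penalty, which drives most of them to zero and leaves a small set of CpG sites in the final sum.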


Additional background related to methylation can be found in the following references, which are incorporated herein by specific reference in their entirety: US2020190568A1; WO2020074533A1; WO2019046725A1; WO2018139826A1; CN104966106A; WO2014146793A1; US2016222448A1; US2019185938A1; US2020056234A1; WO2019143845A1; WO2019232320A1; WO2020037222A1; WO2020076983A1; US2015259742A1; US2014228231A1; EP2711431B1; and WO2020163490A1.


SUMMARY

In some embodiments, a method of creating a DNA methylation biological aging clock for a subject can include: (a) receiving a DNA methylation data signature derived from a biological sample of the subject; (b) creating input vectors based on the DNA methylation data signature; (c) inputting the input vectors into a machine learning platform; (d) generating a predicted biological aging clock of the cell, fluid, tissue or organ of the biological sample based on the input vectors by the machine learning platform, wherein the biological aging clock is specific to the subject (e.g., to biological sample of fluid, tissue or organ); and (e) preparing a report that includes the biological aging clock that identifies a predicted biological age of the subject. In some aspects, the method can include: creating at least a second biological aging clock by repeating any one or more of steps (a), (b), (c), and/or (d), wherein the second biological aging clock is based on a second DNA methylation data signature from the biological sample of the subject, a different cell, fluid, tissue or organ or other sample of the subject, or a biological sample of a second subject; and optionally, preparing a report that includes the second biological aging clock that identifies a second predicted biological age of the subject, a different cell, fluid, tissue or organ of the subject, or a cell, fluid, tissue or organ of a second subject. In some aspects, the method can include: combining the biological aging clock with the second biological aging clock to create a synthetic biological aging clock, wherein the synthetic biological aging clock provides a synthetic biological age of the fluid, tissue, organ, or of the subject; and optionally, preparing a report that includes the synthetic biological aging clock that identifies the synthetic biological age of the fluid, tissue, organ, or of the subject. 
In some aspects, the method can include one or more of: comparing the predicted biological age of the cell, fluid, tissue or organ or the subject with the actual age of the subject; comparing the second predicted biological age of the cell, fluid, tissue or organ or the subject with the actual age of the subject; or comparing the synthetic biological age of the cell, fluid, tissue or organ or the subject with the actual age of the subject, wherein the method further comprises: preparing a report with the comparing and with a difference from the actual age of the subject.
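
The combination and comparison steps above can be sketched as follows. The text does not state how two clocks are combined into a synthetic clock; the weighted mean used here is only an assumption for illustration, and the function names, sample names, and ages are hypothetical.

```python
def synthetic_age(clock_ages, clock_weights=None):
    """Combine per-clock predicted ages into one synthetic age.

    A weighted mean is assumed here; the combination rule is not
    specified in the text.
    """
    if clock_weights is None:
        clock_weights = {name: 1.0 for name in clock_ages}
    total_weight = sum(clock_weights[name] for name in clock_ages)
    weighted_sum = sum(clock_ages[name] * clock_weights[name] for name in clock_ages)
    return weighted_sum / total_weight


def age_delta(predicted_age, actual_age):
    """Difference between a predicted biological age and the actual age."""
    return predicted_age - actual_age


# Hypothetical predicted ages from two clocks for one subject.
clock_ages = {"blood": 52.0, "skin": 48.0}
actual_age = 47.0

# Report entries: the difference from actual age for each clock and for the synthetic clock.
report = {name: age_delta(age, actual_age) for name, age in clock_ages.items()}
report["synthetic"] = age_delta(synthetic_age(clock_ages), actual_age)
```

A positive difference indicates a predicted biological age above the actual age, and a negative difference indicates one below it.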


In some embodiments, the report includes one or more of: a therapeutic regimen based on the predicted biological age in view of an actual age of the subject; a diet regimen based on the predicted biological age in view of an actual age of the subject; a questionnaire about lifestyle habits; a prognosis of the life expectancy with and/or without the therapeutic regimen; a prognosis of the life expectancy with and/or without the diet regimen; a prognosis of the probability of survival of patient during the therapeutic regimen; a prognosis of the probability of survival of patient during the diet regimen; a prognosis of developing disease complications or therapy side effects; a prognosis of the severity degree of diseases; an identification of disease stages; or a prognosis of physical fitness of the patient.


In some embodiments, the cell, fluid, tissue or organ is: diseased; healthy; determined as susceptible to disease; undergoing senescence; in pre-senescence; or non-senescent. The tissue or organ can be substituted with any biological sample, such as urine, saliva, blood, plasma, spinal fluid, or the like. Also, it is recognized that the tissue or organ can be represented by one or more cells thereof, or cell types thereof.


In some embodiments, a therapeutic regimen includes one or more of: applying a senoremediation drug treatment protocol to the subject in order to rescue one or more first cells in the subject; applying a senolytic drug treatment protocol to the subject in order to remove one or more second cells in the subject; introducing stem cells into a tissue and/or organ of the subject in order to rejuvenate one or more tissue cells in the tissue and/or one or more organ cells in the organ; carrying out a reinforcement step that includes one or more actions that prevent further senescence or degradation of the tissue or organ; or carrying out one or more actions that prevent further senescence or degradation of the tissue or organ, the actions being derived from the computational proteome analysis of the cell, fluid, tissue or organ of the subject.


In some embodiments, the method can include: performing feature importance analysis for ranking genes or gene sets (or DNA methylation) by their importance in age prediction by using the biological data; correlating a genomics or DNA methylation profile with the predicted biological age of the subject; correlating a proteomics profile with the predicted biological age of the subject; correlating a transcriptomics profile with the predicted biological age of the subject; correlating a metabolomics profile with the predicted biological age of the subject; correlating a lipidomics profile with the predicted biological age of the subject; correlating a glycomics profile with the predicted biological age of the subject; correlating a secretomics profile with the predicted biological age of the subject; identifying a subset of genes or gene sets or biological pathways thereof that are selected as targets of the therapeutic regimen; or correlating a biological signaling pathway signature with the predicted biological age of the subject.
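
One way to merge importance rankings from several models into a single ranking is a Borda count, which the description of FIG. 9 names for selecting the most important genes. The following is a minimal sketch under the assumption that every model ranks the same items from most to least important; the gene names and rankings are hypothetical.

```python
def borda_count(rankings):
    """Aggregate several best-first rankings of the same items.

    Each item earns (n - 1 - position) points per ranking, where n is the
    number of items; items are returned by descending total score, with
    ties broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical importance rankings produced by three different models.
rankings = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_B", "GENE_A", "GENE_D", "GENE_C"],
    ["GENE_A", "GENE_C", "GENE_B", "GENE_D"],
]
consensus = borda_count(rankings)
```

Here GENE_A collects 8 points, GENE_B 6, GENE_C 3, and GENE_D 1, so the consensus order follows that sequence.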


In some embodiments, the biological data signature is based on biological pathway activation signatures for genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics, or secretomics. In some aspects, the method includes obtaining a biological sample of the cell, fluid, tissue or organ of the subject; and obtaining the biological data by performing a measurement of the genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics or secretomics. In some aspects, the biological data signature is based on a simulation by a computer program for biological pathway activation signatures for genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics or secretomics. In some aspects, the biological data is an omics signature of biological data. In some aspects, the omics signature is genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics or secretomics.


In some embodiments, the method can include after a defined time period: performing steps (a), (b), (c), (d), and (e) in a second iteration; comparing the initial report with the report of the second iteration; and determining a change in the predicted biological age over the defined time period. In some aspects, the method can include: performing a therapeutic regimen over a defined time period; performing steps (a), (b), (c), (d), and (e) in a second iteration; comparing the initial report with the report of the second iteration; determining a change in the predicted biological age over the defined time period; and determining whether the therapeutic regimen changed the predicted biological age, wherein, if the therapeutic regimen changed the predicted biological age, the method includes determining whether to continue, change, or stop the therapeutic regimen, and, if the therapeutic regimen did not change the predicted biological age, the method likewise includes determining whether to continue, change, or stop the therapeutic regimen.
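
The comparison of an initial report with a second-iteration report can be sketched as below. The report structure (a date and a predicted age) and the derived quantity `change_minus_elapsed` are hypothetical choices made for illustration; the text only requires determining the change in predicted biological age over the defined time period.

```python
from datetime import date


def biological_age_change(first_report, second_report):
    """Compare two reports taken at different times for the same subject."""
    elapsed_days = (second_report["date"] - first_report["date"]).days
    elapsed_years = elapsed_days / 365.25
    age_change = second_report["predicted_age"] - first_report["predicted_age"]
    return {
        "elapsed_years": elapsed_years,
        "age_change": age_change,
        # Assumed convenience value: negative when the predicted biological
        # age advanced less than the calendar time that passed.
        "change_minus_elapsed": age_change - elapsed_years,
    }


# Hypothetical reports before and after a one-year regimen.
first = {"date": date(2021, 1, 1), "predicted_age": 50.0}
second = {"date": date(2022, 1, 1), "predicted_age": 50.5}
result = biological_age_change(first, second)
```

Whether to continue, change, or stop a regimen remains a separate determination; this sketch only supplies the numbers that such a determination could draw on.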


In some embodiments, the method can include performing one or more of: a therapeutic regimen based on the predicted biological age in view of an actual age of the subject; or a diet regimen based on the predicted biological age in view of an actual age of the subject.


In some embodiments, the method includes performing one or more of: an actuarial assessment of the subject based on the predicted biological age; a risk assessment based on the predicted biological age; or an insurance assessment based on the predicted biological age.


In some embodiments, the method can include: (f) receiving a second biological data signature derived from a baseline, the second biological data signature being from a second organ or tissue of the subject or a second subject, the organ or tissue being the same as or different from the second organ or tissue; and computing a difference between the signature of (a) and the signature of (f) to provide input vectors to the machine learning platform, wherein the machine learning platform outputs classification vectors that comprise components of the biological aging clock. The biological data signature can be the DNA methylation profile.
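
A minimal sketch of the difference computation between the signature of (a) and the baseline signature of (f) follows. Representing each signature as a mapping from a site identifier to a methylation level, and restricting the difference to sites present in both signatures, are assumptions made for illustration; the identifiers and values are made up.

```python
def difference_vector(signature, baseline, sites=None):
    """Return per-site differences (signature minus baseline) as an input vector.

    If no site order is given, the sites shared by both signatures are used
    in sorted order so that the vector layout is reproducible.
    """
    if sites is None:
        sites = sorted(set(signature) & set(baseline))
    return [signature[site] - baseline[site] for site in sites]


# Hypothetical methylation signatures for a subject sample and a baseline.
subject_signature = {"cpg_A": 0.75, "cpg_B": 0.25, "cpg_C": 0.5}
baseline_signature = {"cpg_A": 0.5, "cpg_B": 0.5, "cpg_D": 0.125}

input_vector = difference_vector(subject_signature, baseline_signature)
```

The resulting list can then be passed to whatever machine learning platform is used, which is outside the scope of this sketch.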


In some embodiments, at least one of the biological data signatures is based on an in silico biological pathway activation network decomposition.


In some embodiments, the method includes creating at least a second biological aging clock by: (a2) receiving at least two omics signatures derived from a biological sample (e.g., cell, fluid, tissue or organ) of the subject, wherein the at least two omics signatures are selected from genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics or secretomics, wherein the first input vectors are based on a first omics signature; (b2) creating second input vectors based on a second omics signature; (c2) inputting the first and second input vectors based on the at least two omics signatures into a machine learning platform; (d2) generating a second predicted biological aging clock of the cell, fluid, tissue or organ based on the second input vectors by the machine learning platform, wherein the second predicted biological aging clock is specific to the cell, fluid, tissue or organ, and thereby of the subject; and (e2) preparing the report or a second report that includes the second biological aging clock that identifies a predicted biological age of the cell, fluid, tissue or organ. In some aspects, the method can include: combining the biological aging clock with the second biological aging clock to create a synthetic biological aging clock, wherein the synthetic biological aging clock provides a synthetic biological age of the fluid, tissue, organ, or thereby of the subject; and optionally, preparing a report that includes the synthetic biological aging clock that identifies the synthetic biological age of the fluid, tissue, organ, or of the subject.


In some embodiments, a computer program product can include a tangible, non-transitory computer readable medium having a computer readable program code stored thereon, the code being executable by a processor to perform a method of creating a biological aging clock for a subject, wherein the method can include: (a) receiving a biological data signature (e.g., DNA methylation profile) derived from a biological sample (e.g., cell, fluid, tissue or organ) of the subject; (b) creating input vectors based on the biological data signature; (c) inputting the input vectors into a machine learning platform; (d) generating a predicted biological aging clock of the subject (e.g., from cell, fluid, tissue or organ) based on the input vectors by the machine learning platform, wherein the biological aging clock is specific to the subject, such as to the cell, fluid, tissue or organ; and (e) preparing a report that includes the biological aging clock that identifies a predicted biological age of the sample origin, such as the cell, fluid, tissue or organ that represents the predicted biological age of the subject. In some aspects, the computer performed method can include: creating at least a second biological aging clock by repeating any one or more of steps (a), (b), (c), and/or (d), wherein the second biological aging clock is based on a second biological data signature from the cell, fluid, tissue or organ of the subject, a different cell, fluid, tissue or organ of the subject, or a cell, fluid, tissue or organ of a second subject; and optionally, preparing a report that includes the second biological aging clock that identifies a second predicted biological age of the cell, fluid, tissue or organ of the subject, a different cell, fluid, tissue or organ of the subject or a cell, fluid, tissue or organ of a second subject.
In some aspects, the computing method can include: combining the biological aging clock with the second biological aging clock to create a synthetic biological aging clock, wherein the synthetic biological aging clock provides a synthetic biological age of the fluid, tissue, organ, or of the subject; and optionally, preparing a report that includes the synthetic biological aging clock that identifies the synthetic biological age of the fluid, tissue, organ, or of the subject.


In some embodiments, the computing method can include: comparing the predicted biological age of the cell, fluid, tissue or organ with the actual age of the subject; comparing the second predicted biological age of the cell, fluid, tissue or organ with the actual age of the subject; or comparing the synthetic biological age of the subject (e.g., by analysis of the cell, fluid, tissue or organ) with the actual age of the subject, wherein the method further comprises: preparing a report with the comparing and with a difference from the actual age of the subject.


In some aspects, the computing method can include: performing feature importance analysis for ranking genes or gene sets (or DNA methylation profile) by their importance in age prediction by using the biological data; correlating a genomics profile with the predicted biological age of the subject; correlating a proteomics profile with the predicted biological age of the subject; correlating a transcriptomics profile with the predicted biological age of the subject; correlating a metabolomics profile with the predicted biological age of the subject; correlating a lipidomics profile with the predicted biological age of the subject; correlating a glycomics profile with the predicted biological age of the subject; correlating a DNA methylation profile with the predicted biological age of the subject; correlating a secretomics profile with the predicted biological age of the subject; identifying a subset of genes or gene sets or biological pathways thereof that are selected as targets of the therapeutic regimen; or correlating a biological signaling pathway signature with the predicted biological age of the subject.


In some embodiments, the computing method further includes: after a defined time period, performing steps (a), (b), (c), (d), and (e) in a second iteration; comparing the initial report with the report of the second iteration; and determining a change in the predicted biological age over the defined time period.


In some embodiments, the biological data signature used in the computing method is based on biological pathway activation signatures for genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics, or secretomics. In some aspects, the biological data signature is based on a simulation by a computer program for biological pathway activation signatures for genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics, or secretomics. In some aspects, the biological data is an omics signature of biological data. In some aspects, the omics signature is genomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, DNA methylomics, or secretomics.


In some embodiments, the model is acquired by machine learning, the machine learning being based on machine learning training data comprising DNA methylation signatures.


In some embodiments, the biological clock methods can include: deriving training data from a DNA methylation profile representing the real-world DNA methylation of the subject and comprising information of the real-world DNA methylation of the subject that it represents; and training an object detector/classifier by machine learning on said training data.





BRIEF DESCRIPTION OF THE FIGURES

The foregoing and following information as well as other features of this disclosure will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings.



FIG. 1 shows an embodiment of an age prediction pipeline which is applied to patients with pre-senescent, senescent, fibrotic conditions or age-related diseases.



FIG. 2 shows an embodiment of an age prediction pipeline combined with iPANDA analysis used to select the personalized treatment.



FIG. 3 illustrates the age predicted by a deep transcriptomic clock method for biological aging assessment based on blood transcriptomic profiles, compatible with the current invention, vs the actual chronological age of healthy individuals in the validation set.



FIG. 4 illustrates the age predicted by a transcriptomic clock method for biological aging assessment based on muscle transcriptomic profiles, compatible with the current invention, vs the actual chronological age of healthy individuals in the validation and testing set.



FIG. 5 illustrates the predicted age by deep transcriptomic clock method for biological aging assessment based on muscle transcriptomic profiles, compatible with the current invention, versus the actual chronological age groups of healthy individuals in the external validation set.



FIG. 6 illustrates the distribution of the number of samples by age for healthy individuals in the validation set.



FIG. 7 illustrates an example epsilon-prediction accuracy for healthy individuals.



FIG. 8 illustrates clustering using t-SNE clustering algorithm by age for healthy individuals.



FIG. 9 shows a list of the most important genes selected by the Borda count algorithm applied over ranks assigned by deep transcriptomic clocks, compatible with the current invention, and other machine learning models as described.



FIG. 10 illustrates a Venn diagram showing organs, cells, and body fluids, and number of specific targets thereof.



FIG. 11 illustrates the delta (difference between assigned (predicted) biological age and actual chronological age) bar plots grouped by age ranges for healthy people based on an exemplary validation set as described.



FIG. 12 shows an example of a biological age clock, or a report thereof with a hazard ratio for different subgroups.



FIG. 13 shows an example of a biological age clock, or a report thereof to compare various subgroups with actual age and predicted ages, and shows the delta (difference between assigned (predicted) biological age and actual chronological age) bar plots grouped by age ranges for healthy people based on an exemplary validation set as described.



FIG. 14 shows an example computing device 600 (e.g., a computer) that may be arranged in some embodiments to perform the methods (or portions thereof) described herein.



FIG. 15 includes graphs that show the log 2 aging ratio (log 2 transformed ratio of predicted biological age to actual age) in diabetic patients taking both insulin and hypoglycemic agents (e.g., first group), taking only insulin (e.g., second group), only hypoglycemic agents (e.g., third group) and taking nothing (e.g., fourth group) as predicted by DNN.



FIG. 16 includes a graph showing an aging ratio (e.g., Predicted/Actual chronological age) in healthy individuals from South Korea, Canada, and Eastern Europe for predicted biological age by the DNNs trained on an Eastern European population.



FIG. 17 includes an example of a Kaplan-Meier plot for individuals predicted younger (<−5) and older (>5) than they chronologically are and individuals within the error (−5:5).



FIG. 18 shows the predicted age versus actual age for training, verification, a training case, and a verification case.



FIG. 18A shows the real age versus the Blood Age (BloodAge).



FIG. 18B shows the error and absolute error for prediction in age for males and females.



FIG. 19 shows the DeepMAge model prediction distribution.



FIGS. 20A-20D show the prediction age versus actual age for training and verification protocols.



FIGS. 21A-21B show predicted age versus the actual age for a study, with training and verification.



FIG. 22 shows the aging clock prediction errors for DeepMAge model compared to 353 CpG.



FIG. 23 shows the BMI effect on predicted age for DeepMAge model compared to 353 CpG.



FIG. 24 shows the Venn diagram of overlapping DNA methylation sites for DeepMAge, 353 CpG and 71 CpGs.



FIG. 25 shows the absolute prediction error for the aging clocks.



FIG. 26 illustrates a method for obtaining, training, verifying, and using a DNA methylation biological clock.



FIG. 27 illustrates another method for obtaining, training, verifying, and using a DNA methylation biological clock.



FIG. 28 illustrates a method for obtaining, training, and verifying a DNA methylation biological clock.





The elements in the figures are arranged in accordance with at least one of the embodiments described herein, and which arrangement may be modified in accordance with the disclosure provided herein by one of ordinary skill in the art.


DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.


Generally, the present invention relates to biomarkers of human biological aging. In some aspects, the invention relates to biomarkers based on gene expression, also called transcriptomic data, as well as DNA methylation profiles, which provide metrics and estimates of the biological age of organisms, including humans. In some aspects, the present invention relates to the biomarkers based on the proteins that are produced as the final products of the gene expression (e.g., proteomic data). Thus, transcriptome or proteome aging clocks are provided based on such biomarkers and use thereof. Additionally, machine learning and deep learning techniques are utilized to assess the transcriptomic data and/or proteomic data and the biomarkers of human biological aging. The invention provides methods that can be utilized to assess biological aging (e.g., computer methods performed on transcriptomic data and/or proteomic data of a subject), and then treat biological aging (e.g., therapeutic methods performed on subject). The invention includes methods, system, apparatus, computer program product, among others, to carry out the following.


In some embodiments, a method of creating a biological aging clock for a patient is provided. The method can include receiving a transcriptome signature derived from a patient cell, fluid, tissue or organ, which can be obtained by processing a biological sample to determine the transcriptome signature or DNA methylation profile, such as biomarkers thereof. Based on the transcriptome signature, the method can include providing input vectors to a machine learning platform. The machine learning platform processes the input vectors in order to generate output that includes a predicted or determined biological age of a sample, whereby the biological age of the subject can be predicted or determined. In some aspects, the biological clock is specific to the cell, fluid, tissue or organ, or specific to a characteristic of the cell, fluid, tissue or organ, and thereby to the subject. In some aspects, the method can include repeating one or more of the steps (e.g., receiving a transcriptome signature and/or inputting the input vectors and/or generating output) for determining or creating a second biological aging clock, such as for the same subject, cell, fluid, organ or tissue, or a different subject, cell, fluid, organ or tissue. In some aspects, the two biological aging clocks are combined to create a synthetic biological aging clock that addresses biological aging at the fluid, tissue, organ, or organism level for the subject or more than one subject. In some aspects, the method can include repeating one or more of the steps a plurality of times to create a plurality of biological aging clocks, such as for two or more organs in a subject, or for two or more subjects. In some aspects, the biological data (e.g., transcriptome, DNA methylation) signature and/or input vectors and/or generated output is derived from a non-senescent tissue or organ of the patient or another organism.
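By way of non-limiting illustration only, the flow from a biological data signature to input vectors to a predicted biological age may be sketched as follows. The feature names, weights, and intercept are hypothetical placeholders, and the plain linear model merely stands in for the machine learning platform; it is not the trained model of any embodiment.

```python
from typing import Dict, List, Sequence

# Hypothetical feature order. A real clock would use many more features
# (e.g., methylation sites or genes); these names are placeholders.
FEATURES: List[str] = ["site_a", "site_b", "site_c"]


def to_input_vector(signature: Dict[str, float],
                    features: Sequence[str] = FEATURES,
                    fill: float = 0.0) -> List[float]:
    """Order a signature into a fixed-length vector; missing features get `fill`."""
    return [float(signature.get(name, fill)) for name in features]


def predict_age(vector: Sequence[float],
                weights: Sequence[float],
                intercept: float) -> float:
    """Stand-in for the machine learning platform: a plain linear model."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, vector))


# Illustrative values only; they are not measured data.
signature = {"site_a": 0.5, "site_c": 0.25}
vector = to_input_vector(signature)
age = predict_age(vector, weights=[40.0, 10.0, 20.0], intercept=20.0)
print(vector, age)
```

Fixing the feature order before prediction is one simple way to keep the input vectors comparable across samples in which some features were not measured.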


In some embodiments, a method of creating a biological aging clock for a patient is provided. The method can include receiving a proteome signature derived from a patient cell, fluid, tissue or organ, which can be obtained by processing a biological sample to determine the proteome signature, such as concentration of a set of proteins. Based on the proteome signature, the method can include providing input vectors to a machine learning platform. The machine learning platform processes the input vectors in order to generate output that includes a predicted or determined biological age of a sample, whereby the biological age of the subject can be predicted or determined. In some aspects, the biological clock is specific to the cell, fluid, tissue or organ, or specific to a characteristic of the cell, fluid, tissue or organ. In some aspects, the method can include repeating one or more of the steps (e.g., receiving a transcriptome and/or proteome signature and/or inputting the input vectors and/or generating output) for determining or creating a second biological aging clock, such as for the same subject, cell, fluid, organ or tissue, or a different subject, cell, fluid, organ or tissue. In some aspects, the two biological aging clocks are combined to create a synthetic biological aging clock that addresses biological aging at the cell, fluid, tissue, organ, or organism level for the subject or more than one subject. In some aspects, the method can include repeating one or more of the steps a plurality of times to create a plurality of biological aging clocks, such as for two or more organs in a subject, or for two or more subjects. In some aspects, the transcriptome signature and/or proteome signature and/or input vectors and/or generated output is derived from a non-senescent tissue or organ of the patient or another organism.


In some aspects, the machine learning platform comprises one or more deep neural networks. In some aspects, the machine learning platform comprises one or more generative adversarial networks. In some aspects, the machine learning platform comprises an adversarial autoencoder architecture. In some aspects, the machine learning platform comprises a feature importance analysis for ranking genes or gene sets by their importance in age prediction.
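By way of non-limiting illustration only, gene rankings produced by several models can be aggregated with a Borda count, the algorithm named in the description of FIG. 9. The sketch below assumes each model supplies a complete ranking of the same genes, most important first; the gene names and the alphabetical tie-break are hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_rank(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate rankings with a Borda count.

    In each ranking of n genes the top gene earns n - 1 points and the
    last gene earns 0. Genes are returned by descending total score,
    with ties broken alphabetically (an assumption of this sketch).
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] += n - 1 - position
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Hypothetical rankings from three models, most important gene first.
model_rankings = [
    ["GENE_1", "GENE_2", "GENE_3"],
    ["GENE_2", "GENE_1", "GENE_3"],
    ["GENE_2", "GENE_3", "GENE_1"],
]
consensus = borda_rank(model_rankings)
print(consensus)
```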


In some aspects, a subset of the genes or gene sets are selected as targets for anti-aging therapies. This can be based on the transcriptome signature and/or proteome signature and/or input vectors and/or generated output. In some aspects, a subset of the genes or gene sets are selected as targets for aging rejuvenating therapies, where subsets of the proteins or protein sets correspond with the selected subset of the genes or gene sets.


In some aspects, the transcriptome and/or proteome signatures are based on signaling pathway activation signatures. In some aspects, the input transcriptome signature profiles are derived from a microarray platform. In some aspects, the input transcriptome signature profiles are derived from an RNA sequencing platform. In some aspects, the biological clock is specific to a cell, fluid, tissue or organ, or specific to a characteristic of the cell, fluid, tissue or organ. In some aspects, the input proteome signature profiles are derived from antibody-based methods, ELISA, LC separation and MS data acquisition, SOMAscan protein assays, bicinchoninic acid based assays, Lowry protein assays and other biochemical assays, UV spectroscopic protein assays, the Bradford protein assay, colorimetric assays (including albumin colorimetric bromocresol assay), chemiluminescent protein with western blotting, amino acid analysis, gel electrophoresis, fluidity one method and any other protein concentration/expression measuring technique.


In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual. In some aspects, the method can include correlating a gene expression level and/or protein level (e.g., protein expression, protein concentration) or other biological data profile (e.g., DNA methylation) with a predicted biological age of the individual. In some aspects, the method can include correlating a signaling pathway signature with a predicted biological age of the individual. In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual, wherein the comparison further comprises a prognosis of the life expectancy. In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual, wherein the comparison further comprises a prognosis of the life expectancy and probability of survival of patient during treatment. In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual, wherein the comparison comprises an outcome measure of the efficacy of the therapies.
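By way of non-limiting illustration only, the comparison of predicted biological age with chronological age can be expressed as a delta (predicted minus actual, as described for FIG. 11), as a log 2 aging ratio (as described for FIG. 15), and as a grouping by a tolerance of 5 years (as described for FIG. 17). The function names and group labels below are hypothetical.

```python
import math


def age_delta(predicted: float, chronological: float) -> float:
    """Difference between predicted biological age and chronological age."""
    return predicted - chronological


def log2_aging_ratio(predicted: float, chronological: float) -> float:
    """Log 2 transformed ratio of predicted biological age to actual age."""
    if predicted <= 0 or chronological <= 0:
        raise ValueError("ages must be positive")
    return math.log2(predicted / chronological)


def classify_delta(delta: float, tolerance: float = 5.0) -> str:
    """Label an individual as predicted older, younger, or within the error."""
    if delta > tolerance:
        return "older"
    if delta < -tolerance:
        return "younger"
    return "within_error"


# Illustrative values only.
delta = age_delta(predicted=58.0, chronological=50.0)
print(delta, classify_delta(delta), log2_aging_ratio(64.0, 32.0))
```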


In some embodiments, a method can include developing a drug therapy based on the output. In some aspects, a method can include developing a senolytic therapy based on the generated output. In some aspects, a method can include developing a senoremediation therapy based on the generated output.


In part, because the method includes one or more biomarkers of aging, it could be used to track the efficacy of the anti-aging therapies, such as senolytic therapy and senoremediation therapies. The method can predict survival or life expectancy. Anti-aging drugs should increase life expectancy, and the methods can be used to track whether the administered drugs are increasing life expectancy (e.g., decreasing predicted age/making people younger, etc.).


In some aspects, a method can include developing an actuarial risk assessment of mortality, survival or morbidity of an individual based on the generated output. In some aspects, a method can include developing an insurance assessment of an individual using mortality and survival analysis, existing health conditions and whether the applicant smokes, based on the generated output.


The invention also includes methods for creating a biological aging clock for a patient, the method comprising: (a) receiving a first transcriptome signature derived from a patient cell, fluid, tissue or organ; (b) receiving a second transcriptome signature derived from a baseline; and (c) computing a difference between predicted ages for the signature of (a) and the signature of (b).


The invention also includes methods for creating a biological aging clock for a patient, the method comprising: (a) receiving a first proteome signature derived from a patient cell, fluid, tissue or organ; (b) receiving a second proteome signature derived from a baseline; and (c) computing a difference between predicted ages for the signature of (a) and the signature of (b).


The invention also includes methods for creating a biological aging clock for a patient, the method comprising: (a) receiving a first DNA methylation signature derived from a patient cell, fluid, tissue or organ; (b) receiving a second DNA methylation signature derived from a baseline; and (c) computing a difference between predicted ages for the signature of (a) and the signature of (b).
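By way of non-limiting illustration only, step (c) of the foregoing methods, computing a difference between the predicted ages for a patient signature and a baseline signature, can be sketched as follows. The `toy_clock` function is a hypothetical stand-in for a trained clock, and the signature values are illustrative rather than measured.

```python
from typing import Callable, Sequence

Signature = Sequence[float]


def predicted_age_difference(predict: Callable[[Signature], float],
                             patient_signature: Signature,
                             baseline_signature: Signature) -> float:
    """Predicted age of the patient signature minus that of the baseline."""
    return predict(patient_signature) - predict(baseline_signature)


def toy_clock(signature: Signature) -> float:
    """Hypothetical clock: maps the mean signature value to an age in years."""
    return 20.0 + 60.0 * (sum(signature) / len(signature))


# The same clock is applied to both signatures so the ages are comparable.
difference = predicted_age_difference(
    toy_clock,
    patient_signature=[0.5, 0.75],
    baseline_signature=[0.25, 0.5],
)
print(difference)
```

The same pattern applies whether the signatures are transcriptome, proteome, or DNA methylation signatures, provided both are scored by the same clock.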


In some aspects, the method can provide input vectors to a machine learning platform, wherein the machine learning platform outputs classification vectors that comprise components of a biological aging clock.


In some embodiments, a computer program product is provided on a tangible non-transitory computer readable medium that has a computer readable program code embodied therein, the program code being executable by a processor of a computer or computing system to perform a method for generating or determining a biological aging clock for a patient. Such a method can include receiving a transcriptome and/or proteome and/or DNA methylation signature derived from a patient cell, fluid, tissue or organ (Step (a)). The method can include creating input vectors based on the transcriptome and/or proteome and/or DNA methylation signature. The method can include providing input vectors to a machine learning platform (Step (b)). The method can include the machine learning platform generating output that includes a predicted biological age of a sample from the patient cell, fluid, tissue or organ (Step (c)). In some aspects, the biological aging clock is specific to the cell, fluid, tissue or organ or entire subject, or specific to a characteristic of the cell, fluid, tissue or organ or entire subject. In some aspects, the machine learning platform includes the examples and embodiments thereof described herein or known in the art. The biological aging clock can be considered a method that can be operated to predict the biological age of a tissue, organ, or subject, and then compare the predicted biological age with the actual age of the subject.


In some embodiments, the method performed by the computer program product can include repeating any of Steps (a), (b), and (c) to create a second biological aging clock. In some aspects, the two or more biological aging clocks are combined to create a synthetic biological aging clock that addresses biological aging at the cell, fluid, tissue, organ, or organism level. In some aspects, the method can include repeating Steps (a) and (b) a plurality of times to create a plurality of biological aging clocks. In some aspects, the transcriptomic and/or proteomic and/or DNA methylation signature of Step (a) and/or the profile of Step (b) is derived from a non-senescent tissue or organ of the patient or another organism. In some aspects, a subset of the genes or gene sets are selected as targets for anti-aging therapies. In some aspects, a subset of the genes or gene sets are selected as targets for aging rejuvenating therapies. In some aspects, the transcriptome and/or proteome and/or DNA methylation signatures are based on signaling pathway activation signatures. In some aspects, the input transcriptome signature profiles are derived from a microarray platform. In some aspects, the input transcriptome signature profiles are derived from an RNA sequencing platform. In some aspects, the biological clock is specific to a cell, fluid, tissue or organ, or specific to a characteristic of the cell, fluid, tissue or organ.


The biological aging clocks have been developed using different methods/different tissues. In some instances, a biological aging clock can be developed using DNA methylation data or transcriptomic data extracted from blood profiles combined with a clock developed using biological data (e.g., proteomic data or DNA methylation data, etc.) from blood profiles, or a clock that was built for the skin tissues and blood. In the case of a ‘synthetic’ clock, the biological age is predicted by multiple biological aging clocks that are combined.


In some instances, a biological aging clock can be developed using biological data (e.g., proteomic data, DNA methylation data, etc.) extracted from blood profiles combined with a clock developed using proteomic data from blood profiles, or a clock that was built for the skin tissues and blood. In the case of a ‘synthetic’ clock, the biological age is predicted by multiple biological aging clocks that are combined.
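By way of non-limiting illustration only, a ‘synthetic’ clock can be sketched as a weighted average of the ages predicted by several individual clocks. The disclosure does not fix how the clocks are combined, so the weighted mean, the clock names, and the weights below are assumptions made for the example.

```python
from typing import Dict, Optional


def synthetic_age(predictions: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-clock predicted ages into a single age.

    With no weights given, every clock contributes equally.
    """
    if not predictions:
        raise ValueError("at least one clock prediction is required")
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(predictions[name] * weights[name] for name in predictions) / total


# Hypothetical predictions from two clocks built on different data types.
clock_ages = {"blood_methylation": 50.0, "blood_proteome": 60.0}
equal = synthetic_age(clock_ages)
weighted = synthetic_age(clock_ages, {"blood_methylation": 3.0, "blood_proteome": 1.0})
print(equal, weighted)
```

Weights could, for instance, reflect the validation error of each clock, although any such weighting scheme would be a design choice beyond what is stated here.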


In some embodiments, the method performed by the computer program product can include comparing a predicted biological age of an individual with an actual chronological age of the individual. In some aspects, the method can include correlating a biomarker profile (e.g., DNA methylation, gene expression and/or protein production level) with a predicted biological age of the individual. In some aspects, the method can include correlating a signaling pathway signature with a predicted biological age of the individual. In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual, wherein the comparison further comprises a prognosis of the life expectancy. In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual, wherein the comparison further comprises a prognosis of the life expectancy and probability of survival of patient during treatment. In some aspects, the method can include comparing a predicted biological age of an individual with an actual chronological age of the individual, wherein the comparison comprises an outcome measure of the efficacy of the therapies.


In some embodiments, the method performed by the computer program product can include developing a drug therapy based on the output. In some aspects, the method can include developing a senolytic therapy based on the output. In some aspects, the method can include developing a senoremediation therapy based on the output. In some aspects, the method can include developing an actuarial assessment of an individual based on the output. In some aspects, the method can include developing a risk assessment of an individual based on the output. In some aspects, the method can include developing an insurance assessment of an individual based on the output.


In some embodiments, a method of creating a biological aging clock for a patient is provided. Such a method can include: Step (a) receiving a first transcriptome signature and/or first proteome signature derived from a patient cell, fluid, tissue or organ; Step (b) receiving a second transcriptome signature and/or second proteome signature derived from a baseline; and Step (c) computing a difference between the signature of (a) and the signature of (b) (e.g., comparing transcriptome signatures and comparing proteome signatures) in order to determine input vectors. Step (d) can include inputting the input vectors into a machine learning platform. Step (e) can include prediction of age using the first transcriptome signature and/or first proteome signature of (a) and the signature of (b) in order to compare estimated age values. In some aspects, at least one of the transcriptome signatures and/or proteome signatures is based on an in silico signaling pathway activation network decomposition, which is a decomposition performed with a machine learning platform, such as one described herein or otherwise known or created. In some aspects, the biological clock is specific to the cell, fluid, tissue or organ, or specific to a characteristic of the cell, fluid, tissue or organ. In some aspects, the method can include repeating any one or more of Step (a), Step (b), Step (c), Step (d), and/or Step (e) to create a second biological aging clock. In some aspects, the two biological aging clocks are combined to create a synthetic biological aging clock that addresses biological aging at the tissue, organ, or organism level. In some aspects, the method can include repeating any one or more of Step (a), Step (b), Step (c), Step (d), and/or Step (e) a plurality of times to create a plurality of biological aging clocks. In some aspects, Step (a) and/or Step (b) is derived from a non-senescent tissue or organ of the patient or another organism, preferably Step (b).
In some instances, a transcriptome biological aging clock is combined with a proteome biological aging clock. In some aspects, one type of biological data of a biomarker (e.g., transcriptome, proteome, DNA methylation, etc.) is substituted for the transcriptome or proteome biomarker data.
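By way of non-limiting illustration only, Step (c), computing a difference between the patient signature and the baseline signature in order to determine input vectors, can be sketched as an element-wise difference over the features measured in both signatures. Restricting the difference to shared features is an assumption of this sketch, and the gene names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def signature_difference(patient: Dict[str, float],
                         baseline: Dict[str, float]) -> Dict[str, float]:
    """Per-feature difference (patient minus baseline) over shared features."""
    shared = sorted(set(patient) & set(baseline))
    return {name: patient[name] - baseline[name] for name in shared}


def as_input_vector(difference: Dict[str, float],
                    feature_order: Sequence[str]) -> List[float]:
    """Arrange the differences in a fixed feature order for the model input."""
    return [difference[name] for name in feature_order]


# Illustrative expression values only; GENE_C is absent from the baseline
# and is therefore dropped from the difference.
patient = {"GENE_A": 2.5, "GENE_B": 1.0, "GENE_C": 0.5}
baseline = {"GENE_A": 2.0, "GENE_B": 1.5}
diff = signature_difference(patient, baseline)
input_vector = as_input_vector(diff, ["GENE_A", "GENE_B"])
print(diff, input_vector)
```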


In some embodiments, a computer program product can include a tangible non-transitory computer readable medium having a computer readable program code stored therein, the program code being executable by a processor of a computer or computing system to perform a method for creating a biological aging clock for a patient. The method can be a computational method as described herein. The computational method can include: (a) receiving data of a first transcriptome signature and/or first proteome signature derived from a patient cell, fluid, tissue or organ; (b) receiving data of a second transcriptome signature and/or proteome signature derived from a baseline; and (c) computing a difference between the signature of Step (a) and the signature of Step (b) (e.g., comparing transcriptome to transcriptome or proteome to proteome). Step (c) can include computing a difference between the signature of (a) and the signature of (b) in order to determine input vectors. Step (d) can include inputting the input vectors into a machine learning platform. Step (e) can include causing the machine learning platform to generate output classification vectors that include components of a biological aging clock. In some aspects, at least one of the transcriptome signatures and/or proteome signatures is based on an in silico signaling pathway activation network decomposition, which is a decomposition performed with a machine learning platform, such as one described herein or otherwise known or created. The computational method can include any other computing steps described herein. The biological clock can be specific to the cell, fluid, tissue or organ, or specific to a characteristic of the cell, fluid, tissue or organ. In some aspects, one type of biological data of a biomarker (e.g., transcriptome, proteome, DNA methylation, etc.) is substituted for the transcriptome or proteome biomarker data.


In some aspects, the computational method can include repeating any one or more of Step (a), Step (b), Step (c), Step (d), and/or Step (e) to create a second biological aging clock. In some aspects, the two biological aging clocks (e.g., DNA methylation, transcriptome, and/or proteome) are combined to create a synthetic biological aging clock that addresses biological aging at the tissue, organ, or organism level. In some aspects, the computational method can include repeating any one or more of Step (a), Step (b), Step (c), Step (d), and/or Step (e) a plurality of times to create a plurality of biological aging clocks. In some aspects, Step (a) and/or Step (b) is derived from a non-senescent tissue or organ of the patient or another organism, preferably Step (b).


The present invention also relates to a multi-stage therapeutic for treating senescence (aging) of whole organisms (in particular, human individuals), as well as the organism's underlying cellular, tissue, and organ senescence. The present invention also relates to evaluation of the efficacy of such a therapeutic. Methods and systems for applying such therapeutic treatment, as well as informatics and other tools for developing the therapeutic treatments, are disclosed. Since disease and senescence are often associated, the invention is also applicable to treating disease. The therapeutic can be determined based on the biological clock that is determined in the methods described herein. The method for creating a biological aging clock for a patient can also include using the output thereof to determine a therapeutic.


The therapeutic can be the 5R strategy described herein.


The present disclosure provides compositions and methods for a 5R (Rescue, Remove, Replenish, Reinforce, Repeat) strategy for selectively rescuing pre-senescent cells, removing senescent cells, replenishing and reinforcing with new healthy cells and repeating the procedure, wherein the composition comprises a group of senolytics and derivatives thereof. The 5R strategy may delay aging and/or treat age-related disorders, especially fibrotic and senofibrotic disorders primarily in lungs and liver.


This 5R method may delay aging and/or treat age-related disorders especially fibrotic and senofibrotic disorders primarily in lungs, liver and skin. The 5R strategy as described is applied to patients with pre-senescent, senescent, and fibrotic conditions, among others. Drugs to be used include senoremediators, antifibrotic agents, and senolytics. The 5R approach will result in induction of regeneration. Drug repurposing strategy can be part of the therapy development process once the therapy protocols have been designed.



FIG. 1 shows an embodiment of an age predicting strategy, which is applied to patients with pre-senescent, senescent or age-related disease conditions. The following steps can be performed in any method described herein: 1. Single biopsy procedure; 2. Sample preparation and Microarray, RNA-seq profiles extraction; 3. Gene and gene sets annotations and expression values extraction; 4. Aging clock analysis; 5. Age prediction; 6. Repeat single biopsy procedure of tissues of individuals after a course of aging therapy; 7. Sample preparation and Microarray, RNA-seq profiles extraction; 8. Gene and gene sets annotations and expression values extraction; 9. Repeat aging clock analysis; 10. Age prediction; and 11. Comparison of predicted age values before and after treatment. Any one of these steps may be performed alone or in combination with other steps as recited herein. In some instances, the methods can include obtaining data and processing the data to obtain a recommendation for a treatment protocol. The recommended treatment protocol can then be implemented on the patient in accordance with parameters of the treatment protocol. That is, the aspects of the treatment protocol cannot be performed without the computationally generated instructions to do so. As such, obtaining the instructions, such as the type of drug and/or natural product or specific drug and/or natural product or combination of drugs and/or natural product, can be vital for performing the treatment protocol. A similar age predicting strategy can use proteomic data.


In some instances, the treatment protocol can be obtained by steps 1, 2, 3, 4, and/or 5. Some of these steps may be omitted, such as steps 1, 2 when the sample is obtained already prepared. In some instances, the data from 2 may be obtained and provided into a computing system for step 3 and/or 4.


In some instances, there is a step 3a, wherein a determined treatment protocol is provided by step 3 and/or step 4, respectively. The determined treatment protocol can include a list of one or more drugs and natural product or treatment actions for each treatment step subsequent to steps 3 and/or 4.


The invention includes developing a personalized drug treatment.


FIG. 2 illustrates the strategy of age prediction in the case of personalized drug and/or natural product treatment. The following steps can be performed in any method described herein: 1. Single biopsy procedure; 2. Sample preparation and Microarray, RNA-seq profiles extraction; 3. Gene and gene sets annotations and expression values extraction; 4. Aging clock analysis; 5. Age prediction; 6. iPANDA analysis for personalized treatment protocol prediction; 7. Repeat single biopsy procedure of tissues of individuals after a course of aging therapy; 8. Sample preparation and Microarray, RNA-seq profiles extraction; 9. Gene and gene sets annotations and expression values extraction; 10. Repeat aging clock analysis; 11. Age prediction; and 12. Comparison of predicted age values before and after treatment. A similar age predicting strategy can use proteomic data.


The method of personalized treatment protocol prediction may include: (a) receiving a first transcriptome signature and/or first proteome signature derived from a patient cell, fluid, tissue or organ; (b) receiving a second transcriptome signature and/or second proteome signature derived from a baseline; (c) creating a difference matrix, such as in a computer with a model or neural network or machine learning, using the profile of (a) and the profile of (b); (d) receiving a cellular signature library; (e) receiving a drug therapeutic use library; (f) using the matrix of (c), the library of (d), and the library of (e) to provide input vectors to a machine learning platform, wherein the machine learning platform outputs classification vectors on one or more drugs, wherein the personalized drug treatment is comprised of the classification vectors.


The transcriptome signature and/or proteome signature and/or DNA methylation signature may be based on a signaling pathway activation network analysis performed on a computer. One of the transcriptome, proteome, and/or DNA methylation signatures may be based on in silico signaling pathway activation network decomposition. One of the profiles may comprise a Pearson correlation matrix. The personalized drug treatment may comprise a senescence treatment for the patient. The profile of (b), that is, the second transcriptome signature derived from a baseline, may be derived from a non-senescent tissue or organ of the patient or another subject. The machine learning platform may comprise one or more deep neural networks. The machine learning platform may comprise at least two generative adversarial networks and may comprise an adversarial autoencoder architecture. The personalized drug treatment may be created by prescribing drugs identified by the classification vectors at their lowest effective dose.
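As an illustration of a profile that comprises a Pearson correlation matrix, the following sketch computes pairwise Pearson correlations between expression profiles. The data and the function names are hypothetical, and the sketch assumes every profile has non-zero variance.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes non-zero variance; a constant profile would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(profiles: List[List[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Pearson correlations between profiles."""
    return [[pearson(a, b) for b in profiles] for a in profiles]


# Each row is one hypothetical gene measured across four samples.
gene_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]

corr = correlation_matrix(gene_profiles)
for row in corr:
    print([round(value, 3) for value in row])
```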


The invention includes a method of computationally, with a computer, designing a treatment protocol for a patient comprising one or more drugs, the method comprising: (a) identifying a gene expression signature of the patient; (b) defining a patient score for signatures taken from one or more patient tissues or organs; (c) selecting drugs based upon (a) and/or (b); and (d) defining a lowest effective combination for each drug. The method may include the gene expression signature being based on a signaling pathway activation network analysis, wherein the gene expression signature is based on an in silico signaling pathway activation network decomposition, and wherein the gene expression signature comprises a transcriptome Pearson correlation matrix. The method can then include one or more treatment steps with one or more treatment drugs or treatment steps of any of the treatment methods described herein. In another aspect, protein expression signatures can be used instead of the gene expression signature or in addition thereto.


The protocol may be a senescence treatment for the patient. The method may include wherein: the gene expression signature and/or protein expression signature of the patient is derived, using a computer with appropriate algorithms or models (e.g., neural network) from a non-senescent tissue or organ of the patient or another subject, wherein (b) and (c) are carried out on a machine learning platform, wherein the machine learning platform comprises at least two generative adversarial networks, wherein the machine learning platform comprises an adversarial autoencoder architecture, wherein the machine learning platform comprises one or more deep neural networks. DNA methylation biomarker data can also be used.


In some embodiments, a computer program product can include a non-transitory computer readable medium having a computer readable program code embodied therein, the product being executable by a processor to perform a method comprising developing a personalized drug treatment, comprising: (a) receiving a first transcriptome signature and/or first proteome signature derived from a patient cell, fluid, tissue or organ; (b) receiving a second transcriptome signature and/or second proteome signature derived from a baseline; (c) creating a difference matrix using the profile of (a) and the profile of (b); (d) receiving a cellular signature library; (e) receiving a drug therapeutic use library; (f) using the matrix of (c), the library of (d) and/or (e), to provide input vectors to a machine learning platform, wherein the machine learning platform outputs classification vectors on one or more drugs, wherein the personalized drug treatment is comprised of the classification vectors. In some aspects, one type of biological data of a biomarker (e.g., transcriptome, proteome, DNA methylation, etc.) is substituted for the transcriptome or proteome biomarker data.


A transcriptome signature and/or proteome signature representing tissue or organ senescence may be used to develop the biological aging clock, and then used to develop or identify at least one of the drugs used in the therapeutics described herein. The transcriptome signature and/or proteome signature may be a signaling pathway activation network analysis, which is performed on a computer with models as described herein. The transcriptome signature may be used in the following manner: as a signaling pathway activation network analysis, the transcriptome signature is used as input to a machine learning platform that outputs drug classifications. The transcriptome signature is compared to a baseline transcriptome signature that represents a less senescent version of the patient's cell, fluid, tissue or organ, and the transcriptome signature is compared to a baseline transcriptome signature that is constructed from more than one cell, fluid, tissue or organ transcriptome signature. A similar procedure can use the proteome instead of or in addition to the transcriptome. In some aspects, one type of biological data of a biomarker (e.g., transcriptome, proteome, DNA methylation, etc.) is substituted for the transcriptome or proteome biomarker data.


The computer processing can include input and/or processing of a complete or partial schematic overview of the biochemistry of senescence. Additional information can be obtained in the incorporated provisional application regarding the biological pathways that can be used as input and processing for determining a treatment, such as specific drugs for the treatment. Accordingly, the biological pathways can be used in the methods described herein. Such biological pathways are described herein with some examples of computer processing thereof for implementing the design of treatment protocols as recited herein.


A variety of cell-intrinsic and -extrinsic stresses that can activate the cellular senescence program can be used as input for a simulation or other computer processing. The biological pathways that are known, such as in the literature, can be analyzed for specific biological steps that are performed. Modulation of the biological step either to increase the activity or decrease the activity results in a cascading series of events in response to the modulated activity. The modulations can be with drugs, substances, or other affirmative actions that effect a modulation of the biological pathway. This modulation can be measured for a defined biological step. The biological step and the change in response to the modulation activity can be used as inputs into computer models, and such computer models can be trained on the data. Now, with the increase in artificial intelligence and deep learning algorithms, such biological steps, the modulation activity, and the changed response can be used with such computer models for modeling biological pathways. This can allow for determining a modulation activity for one or more biological steps. Such modulation activities can be real and based on the simulations, such as being a real drug, substance, or medical action. The output of the computer models can be instructions or other information for causing the modulation activity in order to obtain a specific type of biological step modulation so that the end goal of a specifically modulated biological pathway can be obtained. Accordingly, the biological pathways described herein, or in the incorporated references and provisional applications, can be used as the biological pathways for the treatment protocols described herein.


In a specific example, the biological pathways can relate to senescence, and the modulation thereof.


The biological pathways related to senescence can be used for computer models. Stressors are known to cause biological pathway modulation that results in senescence. For example, some stressors engage various cellular signaling cascades and can ultimately activate p53, p16Ink4a, or both. Some stress types that activate p53 through DDR signaling can be analyzed and computed. This can include computationally processing how ROS elicit the DDR by perturbing gene transcription and DNA replication, as well as by shortening telomeres. The computer can also compute biological pathways in which activated p53 induces p21, which induces a temporal cell-cycle arrest by inhibiting cyclin E-Cdk2. The computer can also analyze how p16Ink4a inhibits cell-cycle progression by targeting cyclin D-Cdk4 and cyclin D-Cdk6 complexes. Both p21 and p16Ink4a act by preventing the inactivation of Rb, thus resulting in continued repression of E2F target genes required for S-phase onset. Upon severe stress, as modeled and computationally processed, temporally arrested cells transition into a senescent growth arrest through a mechanism that is currently incompletely understood, and this transition can be determined computationally. Cells exposed to mild damage that can be successfully repaired may resume normal cell-cycle progression. On the other hand, cells exposed to moderate stress that is chronic in nature or that leaves permanent damage may resume proliferation through reliance on stress support pathways, and such information may be included in the data processing. This phenomenon (termed assisted cycling) is enabled by p53-mediated activation of p21, which can be taken into account when computationally determining a treatment, such as a drug treatment. Thus, the p53-p21 pathway can either antagonize or synergize with p16Ink4a in senescence depending on the type and level of stress that is used in the computational processing.
BRAF(V600E) is unusual in that it establishes senescence through a metabolic effector pathway. BRAF(V600E) activates PDH by inducing PDP2 and inhibiting PDK1 expression, promoting a shift from glycolysis to oxidative phosphorylation that creates senescence-inducing redox stress, which can be taken into account in the computational processing. Cells undergoing senescence induce an inflammatory transcriptome regardless of the senescence inducing stress, and such inflammatory transcriptome can be considered in determining the treatment. Also, senescence-promoting and senescence-preventing activities may be computed, and may be weighted relative to their importance. A senescence-reversing mechanism may be input or modeled or otherwise computed as part of the process.
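The p53-p21 and p16Ink4a arms described above can be sketched as a small Boolean model. This is a deliberately simplified illustration, not the computational model of this disclosure: every node is on or off, and the rule that Rb is inactivated only when both cyclin-Cdk complexes are active is an assumption made for the sketch. The function name `cell_cycle_state` is hypothetical.

```python
from typing import Dict


def cell_cycle_state(ddr_stress: bool, p16_active: bool) -> Dict[str, bool]:
    """Evaluate a feed-forward Boolean sketch of the p53-p21 and p16Ink4a arms."""
    p53 = ddr_stress                    # DDR signaling activates p53
    p21 = p53                           # activated p53 induces p21
    cyclin_e_cdk2 = not p21             # p21 inhibits cyclin E-Cdk2
    cyclin_d_cdk46 = not p16_active     # p16Ink4a inhibits cyclin D-Cdk4/6
    # Assumption: Rb is inactivated only when both kinase complexes are active.
    rb_active = not (cyclin_e_cdk2 and cyclin_d_cdk46)
    e2f_targets_on = not rb_active      # active Rb represses E2F target genes
    return {
        "p53": p53,
        "p21": p21,
        "rb_active": rb_active,
        "e2f_targets_on": e2f_targets_on,
        "arrested": not e2f_targets_on,  # no S-phase onset without E2F targets
    }


for stress, p16 in [(False, False), (True, False), (False, True)]:
    state = cell_cycle_state(stress, p16)
    print(f"stress={stress} p16={p16} arrested={state['arrested']}")
```

A Boolean model cannot capture stress level, timing, or assisted cycling; it only shows how pathway steps can be encoded as inputs to a computation.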


A multi-step senescence model can also be input and computed. The model can be programmed to consider cellular senescence as a dynamic process driven by epigenetic and genetic changes. An initial step computes the progression from a transient to a stable cell-cycle arrest through analysis of a sustained activation of the p16Ink4a and/or p53-p21 pathways. The model can consider that the resulting early senescent cells progress to full senescence by downregulating lamin B1, thereby triggering extensive chromatin remodeling underlying the production of a SASP. The model can consider that certain components of the SASP are highly conserved, whereas others may vary depending on cell type, nature of the senescence-inducing stressor, or cell-to-cell variability in chromatin remodeling. The computation can consider progression to deep or late senescence that may be driven by additional genetic and epigenetic changes, which can be computed, including chromatin budding, histone proteolysis and retrotransposition, driving further transcriptional change and SASP heterogeneity. The computation can consider the efficiency with which immune cells dispose of senescent cells, which may be dependent on the composition of the SASP. The proinflammatory signature of the SASP can fade due to expression of particular microRNAs late into the senescence program, thereby perhaps allowing evasion of immuno-clearance, which can also be considered.


In some embodiments, a conceptual model can be computed in which senescent cells are subdivided into two main classes based on kinetics of senescence induction and functionality. The conceptual model can consider that acute senescence is induced through cell-extrinsic stimuli that target a specific population of cells in the tissue. Acute senescent cells self-organize their elimination through SASP components that attract various types of immune cells. The conceptual model can be programmed to consider that induction of chronic senescence occurs after periods of progressive cellular stress or macromolecular damage when tardy cycling transitions into a stable cell-cycle arrest. The conceptual model can consider that, owing to age-related immunodeficiency or production of less proinflammatory SASPs, immune cells may inefficiently eliminate chronic senescent cells, allowing continuation of multi-step senescence. For example, the conceptual model may consider that senescence induced during cancer therapy may initially be acute and later chronic in nature.


The computer models can be programmed and receive senescence input data for computing how senescence promotes age-related tissue dysfunction. Senescence contributes to the overall decline in tissue regenerative potential that occurs with ageing. The computer models can be programmed with the observation that progenitor cell populations in both skeletal muscle and fat tissue of BubR1 progeroid mice are highly prone to cellular senescence. Proteases chronically secreted by senescent cells may perturb tissue structure and organization by cleaving membrane-bound receptors, signaling ligands, extracellular matrix proteins or other components in the tissue microenvironment, which can affect the treatment protocols described herein. In addition, other SASP components, including IL-6 and IL-8, may stimulate tissue fibrosis in certain epithelial tissues by inducing EMT, which may be considered. Chronic tissue inflammation, which is characterized by infiltration of macrophages and lymphocytes, fibrosis and cell death, is associated with ageing and has a causal role in the development of various age-related diseases, which can be considered when identifying a treatment.


The matrix metalloproteinases and proinflammatory SASP components can be modeled and considered in determining a treatment because of their ability to create a tissue microenvironment that promotes survival, proliferation and dissemination of neoplastic cells. The model can be processed so that the SASP can be modeled as increasing age-related tissue deterioration through paracrine senescence, where senescent cells spread the senescence phenotype to healthy neighboring cells through secretion of IL-1β, TGFβ and certain chemokine ligands. With gene expression analysis or pathway analysis, it is possible to distinguish between pre-senescent and senescent cell signatures with the computations.


The models can be computed to consider that killing senescent cells can lead to rejuvenation of the tissue. For example, a modified FOXO4-p53 interfering peptide can be considered that causes p53 nuclear exclusion and induces targeted apoptosis of senescent cells (TASC), which neutralizes murine liver chemotoxicity from doxorubicin treatment. The TASC can be considered for restoring fitness, hair density, and renal function in fast-aging and naturally aged mice.


The model can be processed so that delaying senescence or even promoting death of accumulating apoptosis-resistant senescent cells can be a strategy to prevent age-related diseases. Tocotrienols (T3s) and quercetin (Q) can be input for modeling as senolytic agents (e.g., small molecules that can selectively induce death of senescent cells). Both drugs are able to kill pre-senescent and senescent cells and can be used as adjuvant therapy for cancer and in preventive anti-aging strategies, and thereby can be used in the treatments herein.


The computational models can also consider fibrosis and senofibrosis conditions. The term fibrosis describes the development of fibrous connective tissue as a reparative response to injury or damage, which can be considered during computing for treatment protocols. Fibrosis may refer to the connective tissue deposition that occurs as part of normal healing or to the excess tissue deposition that occurs as a pathological process. The term senofibrosis describes the development of fibrous connective tissue under the influence of senescent cells, which can be considered during computing for treatment protocols. Senescent activated cells lose their proliferative and collagen-producing capacity and have an increased capacity to produce inflammatory cytokines compared with replicating activated “normal” cells. The computational models can focus on two types of fibrosis and senofibrosis treatment: pulmonary (IPF) and liver.


The models can be processed to consider that fibrosis is a wound healing response that produces and deposits extracellular matrix (ECM) proteins including collagen fibers, causing tissue scarring. The liver usually regenerates after liver injury. However, when liver injury and inflammation are persistent and progressive, the liver cannot regenerate normally, and fibrosis results. Hepatic stellate cells (HSCs) are the primary source of activated myofibroblasts that produce extracellular matrix in the liver. Progressive liver fibrosis results in cirrhosis, where liver cells cannot function properly due to the formation of fibrous scar and regenerative nodules and the decreased blood supply to the liver. The model can perform such simulations. The model can consider three main causes of liver fibrosis: alcoholic fatty liver disease; non-alcoholic fatty liver disease; and viral hepatitis. In each case different mechanisms lead to fibrotic tissue formation, and these mechanisms can be processed to determine a suitable protocol.


The model can also consider that quiescent HSCs store Vitamin A-containing lipid droplets, and HSCs lose lipid droplets when they are activated. Transforming growth factor (TGF)-β and platelet-derived growth factor (PDGF) are two major cytokines that contribute to HSC activation and proliferation, resulting in activation into myofibroblasts. Many other cytokines, intracellular signaling, and transcription factors are involved in this process, and may be considered during computations.


The computational models can also consider activation and regression of hepatic stellate cells. Quiescent hepatic stellate cells (HSCs) store Vitamin A-containing lipid droplets and lose Vitamin A when the cells are activated. Hepatic epithelial injury, such as death of hepatocytes and biliary epithelial cells, induces activation of HSCs directly or through cytokines released from immune cells including Kupffer cells, bone marrow-derived monocytes, Th17 cells, and innate lymphoid cells (ILC). Transforming growth factor-β (TGF-β), platelet-derived growth factor (PDGF), interleukin-1β (IL-1β), IL-17, and intestine-derived lipopolysaccharide (LPS) promote HSC activation. IL-33 promotes HSC activation through ILC2. Autophagy in HSCs is associated with HSC activation. The activated myofibroblast pool is mainly constituted by activated HSCs, but biliary injury induces differentiation of portal fibroblasts to activated myofibroblasts. However, there is no evidence of epithelial-mesenchymal transition contributing to the myofibroblast pool. After the cessation of causative liver injury, fibrosis starts to regress, and activated HSCs undergo apoptosis or revert into a quiescent state. Peroxisome proliferator-activated receptor γ (PPARγ) expression in HSCs is associated with HSC reversal. Some activated HSCs become senescent, resulting in loss of profibrogenic property, in which p53 plays a role. Moreover, angiogenesis contributes to both fibrosis development and regression. As such, each may be considered when computing a therapeutic protocol.


The main pathways that are involved in modulation of hepatic inflammation can be categorized as (1) Upregulated and (2) Downregulated. The main pathways that are involved in formation of cellular senescence in HSCs can be categorized as (1) Upregulated and (2) Downregulated. Both upregulation and downregulation of any biological pathway, such as those described herein, may be considered during the computation of therapeutic protocols.


The main pathways involved in formation of the cellular senescence phenotype in primary human hepatocytes (PHH) can be identified. Data for the analysis is taken from the LINCS transcriptomic dataset and computed as described herein. Methanesulfonate is a DNA damage/senescence inducer, which may be used in obtaining data to train the models. Liver senescence and liver fibrosis signatures share common features at the pathway level (analysis is based on the gene expression data using iPANDA, as described further below).


The main pathways involved in formation of the cellular senescence phenotype in primary human hepatocytes (PHH) can be categorized as up-regulated or down-regulated. Data for the analysis and model computations for determining a therapeutic protocol can be taken from the LINCS transcriptomic dataset. The following are up-regulated: BRCA1 Pathway Homologous Recombination Repair; JNK Pathway Insulin Signaling; Caspase Cascade Pathway Activated Tissue Trans-glutaminase; JNK Pathway Gene Expression Apoptosis Inflammation Tumorigenesis Cell Migration via SMAD4, STAT4, HSF1, TP53, MAP2, DCX, ATF2, NFATC3, SPIRE1, MAP1B, TCF15, ELK1, BCL2, JUN, PXN, and NFATC2; Caspase Cascade Pathway DNA Fragmentation; TRAF Pathway Gene Expression via FOS and JUN; HIF1Alpha Pathway Gene Expression via JUN and CREB3; TNF Signaling Pathway Apoptosis; PTEN Pathway Genomic Stability; VEGF Pathway Gene Expression and Cell Proliferation via MAPK7; ErbB Family Pathway Gene Expression via JUN, FOS, and ELK1; PTEN Pathway Ca2+ Signaling; PTEN Pathway DNA Repair; VEGF Pathway Prostaglandin Production; MAPK Family Pathway Gene Expression via ATF2, JUN, ELK1, NFKB2, and CREB3; HIF1Alpha Pathway; WNT Pathway; ATM Pathway Cell Survival; and MAPK Family Pathway Translation. The following are down-regulated: Ras Pathway Increased T-cell Adhesion; HGF Pathway Cell Adhesion and Cell Migration; IGF1R Signaling Pathway Cell Migration; ILK Signaling Pathway Cell Migration Retraction; ILK Signaling Pathway Cell Cycle Proliferation; ILK Signaling Pathway G2 Phase Arrest; ILK Signaling Pathway Cytoskeletal Adhesion Complexes; ILK Signaling Pathway Loss of Occludin Barrier Dysfunction; ATM Pathway Cell Cycle Checkpoint Control; Akt Signaling Pathway AR mediated apoptosis; Akt Signaling Pathway Apoptosis; Akt Signaling Pathway Cell Cycle Progression; and Akt Signaling Pathway Elevation of Glucose Import.
The role of senescence of HSCs in liver fibrosis may be computed, and experimental data using cell-specific genetic modifications to HSCs from experimental models of liver fibrosis in vivo can be used in the computation of treatment protocols.


There is still no treatment for liver fibrosis. The only way to avoid it is to prevent massive inflammation by rescuing or killing pre-senescent and senescent cells, respectively. Liver senescence and liver fibrosis signatures share common features at the pathway level (analysis is based on the gene expression data using the iPANDA package). The common significant pathways involved in modulation of liver fibrosis (and cirrhosis) that can be considered in the computational models include the following upregulated and downregulated pathways. Those upregulated include: ILK Signaling Pathway Opsonization; ILK Signaling Pathway Cell Adhesion; ILK Signaling Pathway Wound Healing; Akt Signaling Pathway AR mediated apoptosis; TRAF Pathway; IL-10 Pathway Stability Determination; EGF Pathway Rab5 Regulation Pathway; TRAF Pathway Gene Expression via FOS and JUN; ILK Signaling Pathway Tumor Angiogenesis; Akt Signaling Pathway NF-kB dependent transcription; HIF1Alpha Pathway Gene Expression via JUN and CREB3; Chemokine Pathway; STAT3 Pathway Growth Arrest and Differentiation; TRAF Pathway Apoptosis; Erythropoietin Pathway GPI Hydrolysis and Ca2+ influx; IL-10 Pathway; IL-10 Pathway Inflammatory Cytokine Genes Expression via STAT3; ILK Signaling Pathway MMP2 MMP9 Gene Expression Tissue Invasion via FOS; ErbB Family Pathway Gene Expression via JUN, FOS, and ELK1; Akt Signaling Pathway Regulation of Na+ Transport; PAK Pathway Paxillin Disassembly; ILK Signaling Pathway Cytoskeletal Adhesion Complexes; cAMP Pathway Glycogen Synthesis; and ILK Signaling Pathway Cell Migration Retraction. Those downregulated include: STAT3 Pathway Anti-Apoptosis; Akt Signaling Pathway Cell Cycle Progression; Circadian Pathway; Growth Hormone Signaling Pathway Protein Synthesis; and PTEN Pathway Migration.


The common significant pathways involved in formation of cellular senescence and liver fibrosis that can be computed include those that are upregulated and downregulated. Those upregulated include: ErbB Family Pathway Gene Expression via JUN, FOS, and ELK1; HIF1Alpha Pathway Gene Expression via JUN and CREB3; and TRAF Pathway Gene Expression via FOS and JUN. Those downregulated include Akt Signaling Pathway Cell Cycle Progression. The common significant pathways involved in modulation of IPF include those upregulated or downregulated. Those upregulated include: Cellular Apoptosis Pathway; KEGG Choline metabolism in cancer Main Pathway; KEGG Prostate cancer Main Pathway; NCI CXCR4 mediated signaling events Main Pathway; NCI Syndecan 4 mediated signaling events Main Pathway; NCI TRAIL signaling Main Pathway; NCI Validated transcriptional targets of deltaNp63 isoforms Main Pathway; NCI Validated transcriptional targets of deltaNp63 isoforms Pathway (Pathway degradation of TP63); PTEN Pathway Adhesion or Migration; PTEN Pathway Angiogenesis and Tumorigenesis; PTEN Pathway Ca2+ Signaling; reactome Collagen biosynthesis and modifying enzymes Main Pathway; and reactome SMAD2, SMAD3, and SMAD4 heterotrimer regulates transcription Main Pathway. Those downregulated include: Growth Hormone Signaling Pathway Gene Expression via SRF, ELK1, STAT5B, CEBPD, STAT1, STAT3; and reactome Tie2 Signaling Main Pathway.
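The overlap between the liver senescence and liver fibrosis pathway lists can be sketched with set operations. The sets below are abbreviated subsets of the lists given herein, chosen only for illustration, and the variable names are hypothetical.

```python
# Abbreviated, illustrative subsets of the pathway lists described herein.
senescence_up = {
    "BRCA1 Pathway Homologous Recombination Repair",
    "TNF Signaling Pathway Apoptosis",
    "TRAF Pathway Gene Expression via FOS and JUN",
    "HIF1Alpha Pathway Gene Expression via JUN and CREB3",
    "ErbB Family Pathway Gene Expression via JUN, FOS, and ELK1",
}
senescence_down = {
    "ATM Pathway Cell Cycle Checkpoint Control",
    "ILK Signaling Pathway Cell Migration Retraction",
    "Akt Signaling Pathway Cell Cycle Progression",
}
fibrosis_up = {
    "Chemokine Pathway",
    "ILK Signaling Pathway Cell Migration Retraction",
    "TRAF Pathway Gene Expression via FOS and JUN",
    "HIF1Alpha Pathway Gene Expression via JUN and CREB3",
    "ErbB Family Pathway Gene Expression via JUN, FOS, and ELK1",
}
fibrosis_down = {
    "Circadian Pathway",
    "STAT3 Pathway Anti-Apoptosis",
    "Akt Signaling Pathway Cell Cycle Progression",
}

# Pathways regulated in the same direction in both signatures.
common_up = sorted(senescence_up & fibrosis_up)
common_down = sorted(senescence_down & fibrosis_down)

# Pathways present in both signatures but regulated in opposite directions.
discordant = sorted(
    (senescence_up & fibrosis_down) | (senescence_down & fibrosis_up)
)

print("common up:", common_up)
print("common down:", common_down)
print("discordant:", discordant)
```

Tracking discordant pathways separately may be useful, since a pathway that moves in opposite directions in the two signatures is not a shared feature even though it appears in both lists.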


The common significant pathways involved in formation of cellular senescence in lung tissue can include those upregulated and downregulated. Those upregulated include: Growth Hormone Signaling Pathway Gene Expression via SRF, ELK1, STAT5B, CEBPD, STAT1, STAT3; KEGG Choline metabolism in cancer Main Pathway; KEGG Prostate cancer Main Pathway; NCI CXCR4 mediated signaling events Main Pathway; NCI TRAIL signaling Main Pathway; PTEN Pathway Adhesion or Migration; PTEN Pathway Angiogenesis and Tumorigenesis; PTEN Pathway Ca2+ Signaling; reactome Collagen biosynthesis and modifying enzymes Main Pathway; reactome SMAD2, SMAD3, SMAD4 heterotrimer regulates transcription Main Pathway; and reactome Tie2 Signaling Main Pathway. Those downregulated include: Cellular Apoptosis Pathway; NCI Syndecan 4 mediated signaling events Main Pathway; NCI Validated transcriptional targets of deltaNp63 isoforms Main Pathway; and NCI Validated transcriptional targets of deltaNp63 isoforms Pathway (Pathway degradation of TP63).


Cellular senescence can contribute to accelerating organ aging, and, among the pulmonary diseases that can be related to pulmonary senescence, chronic obstructive pulmonary disease/emphysema (COPD) and idiopathic pulmonary fibrosis (IPF) are the most common and lethal. COPD and IPF are severe multifactorial pulmonary disorders characterized by distinct clinical and pathologic features (“Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease: GOLD Executive Summary Updated 2003” 2004; Noble et al. 2011). The data regarding clinical and pathological features can be used in the computational models that are processed for determining the therapeutic protocols.


In all known types of cellular senescence, including replicative cellular senescence, stress-induced senescence, and oncogene-induced senescence, a permanent state of cell cycle arrest occurs that is mediated by the expression of p16INK4a and p21WAF1, two cell cycle inhibitors that are also well-recognized markers used to investigate this mechanism in vivo (Kim and Sharpless 2006; Campisi 2005; Mallette and Ferbeyre 2007; Ohtani et al. 2004; Takeuchi et al. 2010). Altered expression of p16INK4a, p21WAF1, and β-galactosidase (a widely used histochemical marker of cellular senescence) has been demonstrated in IPF (Minagawa et al. 2010; Kuwano et al. 1996; Lomas et al. 2012). These markers are expressed strongly at sites of alveolar damage and hyperplasia, as well as in fibroblast foci localized in the discrete clusters of bronchiolar basal cells coexpressing the laminin-5-γ2 chain (LAM5γ2) and heat shock protein 27 (Hsp27) (Chilosi et al. 2006). According to a review (Chilosi et al. 2013), several factors lead to senescence in the lungs, and they differ between the two types of pathogenesis: idiopathic pulmonary fibrosis and chronic obstructive pulmonary disease/emphysema. This information may also be used in the computational models for determining therapeutic protocols.


It should be recognized that the methods described herein may be performed with DNA methylation and/or proteomic data in addition to or instead of transcriptomic data.


Methods for development of senescence drug treatments, that is, the selection of drugs, dosages, and cycles, are described herein. This section gives an overview of the drug treatments themselves, that is, application of the personalized treatments, once they have been designed, to the patient in a preferred embodiment. In that patient, a tissue or organ is identified to which the senescence treatment will be applied.


In a preferred embodiment, one phase of the treatment involves senoremediation, that is, a drug protocol of senoremediators, which are drugs that restore or increase the amount of presenescent cells (cells that are typical of a young, healthy tissue or organ). Another phase of the treatment involves senolytic treatment, that is, a drug protocol that involves elimination or destruction of senescent cells in the tissue or organ of interest.


In another preferred embodiment, there is also an antifibrotic phase, that is, a drug protocol that addresses fibrotic cells in the tissue or organ of interest. Antifibrotic treatment may involve restoring senescent cells to a pre-senescent, non-fibrotic state, elimination or destruction of fibrotic cells, or both.


Since such drug treatment protocols are highly specific, and based upon the classification vectors of the analyses described herein, they may take many forms. Methods in the art, such as Seim et al., “Gene expression signatures of human cell and tissue longevity”, npj Aging and Mechanisms of Disease, 2, 16014 (2016), address transcriptome changes/differences associated with senescence that are used to classify drug protocols.


To examine gene expression strategies that support the lifespan of different cell types within the human body, one can obtain available RNA-seq data sets and interrogate transcriptomes of various somatic cell types and tissues with reported cellular turnover, along with an estimate of lifespan, ranging from 2 days (monocytes) to effectively a lifetime (neurons). Across different cell lineages, one can obtain a gene expression signature of human cell and tissue turnover. In particular, turnover showed a negative correlation with the energetically costly cell cycle and factors supporting genome stability, concomitant risk factors for aging-associated pathologies. Similar protocols can be performed with proteomic data.


Comparative transcriptome studies of long-lived and short-lived mammals, and analyses that examined the longevity trait across a large group of mammals (tissue-by-tissue surveys, focusing on brain, liver and kidney), have revealed candidate longevity-associated processes. Publicly available transcriptome data sets (for example, RNA-seq) generated by consortia, such as the Human Protein Atlas (HPA), the Genotype-Tissue Expression (GTEx) project, or The Cancer Genome Atlas (TCGA) program, can be used. Protein expression and concentration datasets provided by The Cancer Genome Atlas (TCGA) program, or biobank datasets, such as blood protein tests from biobanks including the UK Biobank or the Framingham Heart Study, can also be used. They offer an opportunity to understand how gene expression and/or protein expression programs are related to cellular turnover, as a proxy for cellular lifespan. Gene expression and/or protein expression patterns are typically analyzed, in a preferred embodiment, using Principal Component Analysis (PCA), as a first step.
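A first-step PCA of expression patterns can be sketched as follows. To stay dependency-free, this illustration extracts only the first principal component by power iteration on the sample covariance matrix; a practical analysis would more likely use a numerical library. The data values and the function name `first_principal_component` are hypothetical.

```python
import math
from typing import List, Tuple


def first_principal_component(
    samples: List[List[float]], iterations: int = 200
) -> Tuple[List[float], List[float]]:
    """Return (unit loading vector, per-sample scores) for the first component."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix of the features (genes or proteins).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the answer.
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores


# Rows: hypothetical samples; columns: hypothetical expression features.
expression = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
]

loadings, scores = first_principal_component(expression)
print([round(v, 3) for v in loadings])
print([round(s, 3) for s in scores])
```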


The present invention involves examining an aging transcriptome and/or proteome in which the transcribed genes and/or translated proteins of old and young people are compared to define a first set of genes (activated), which are more strongly expressed in old people relative to young people, and a second set of genes (repressed), which are less strongly expressed in old people relative to young people. A preferred embodiment is herein described.


A rating approach can be used to rank the senescence-treating properties of treatments. The approach first involves collecting the transcriptome data sets from young and old patients and normalizing the data for each cell and tissue type, evaluating the pathway activation strength (PAS) for each individual pathway, constructing the pathway cloud, and screening for drugs or combinations that minimize the signaling pathway cloud disturbance by acting on one or multiple elements of the pathway cloud. Drugs and combinations may be rated by their ability to return the signaling pathway activation pattern closer to that of the younger tissue samples. The predictions may then be tested both in vitro and in vivo on human cells and on model organisms such as rodents, nematodes and flies to validate the screening and rating algorithms. Similar protocols can be performed with proteomic data.


In a preferred embodiment of the senescence treatment, a method for ranking drugs includes: a. collecting young subject transcriptome data and old subject transcriptome data for one species to evaluate pathway activation strength (PAS) and down-regulation strength for a plurality of biological pathways; b. mapping the plurality of biological pathways for the activation strength and down-regulation strength from old subject samples relative to young subject samples to form a pathway cloud map; and c. providing a rating for each of a plurality of drugs in accordance with a drug rating for minimizing signaling pathway cloud disturbance (SPCD) in the pathway cloud map of the one species to provide a ranking of the drugs. Similar protocols can be performed with proteomic data.


In silico Pathway Activation Network Decomposition Analysis (iPANDA) is a preferred method of network analysis for the methods described herein. While gene expression data is described, it is clear to one of skill in the art that proteomic data may also be used. Thus, the protocols may apply to transcriptomic and/or proteomic data.


Development of senescence treatments (in particular drug combinations and protocols), as contemplated by the authors, is particularly compatible with the signaling pathway activation network analysis as described, for example, in U.S. 62/401,789 (Ozerov, filed September 2016, now US 2018-0125865) and Ozerov et al., “In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development”, Nature Communications, 7: 13427, 2016, both incorporated by specific reference in their entirety. Such methods include large-scale transcriptomic data analysis that involves in silico Pathway Activation Network Decomposition Analysis (iPANDA). The capabilities of this method apply to multiple data sets obtained, for example, from Gene Expression Omnibus (GEO). Data sets in GEO are accessed by identifier, or accession number, such as GSE5350.


Additionally, according to an embodiment of the present invention, the pathway cloud map shows at least one upregulated/activated pathway and at least one down-regulated pathway of the old subject relative to the young subject. Furthermore, according to an embodiment of the present invention, the pathway cloud map is based on a plurality of young subjects and a plurality of old subjects. Importantly, according to an embodiment of the present invention, the method is performed for an individual to determine an optimized ranking of drugs for the individual.


Further, according to an embodiment of the present invention, the samples or biopsies are bodily samples selected from one or more of a blood sample, a urine sample, a biopsy, a hair sample, a nail sample, a breath sample, a saliva sample, or a skin sample.


Yet further, according to an embodiment of the present invention, the pathway activation strength is calculated by dividing the expression levels for a gene n in the old subject samples by the gene expression levels of the young subject samples.


Additionally, according to an embodiment of the present invention, the pathway activation strength is calculated in accordance with


$$SO = \frac{\prod_{i=1}^{N} [AGEL]_i}{\prod_{j=1}^{M} [RGEL]_j}$$


where $[AGEL]_i$ is the expression level of activator gene i and $[RGEL]_j$ is the expression level of repressor gene j.
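
A minimal Python sketch of this activator-to-repressor ratio is given here for illustration. It assumes that the aggregation over activator genes and over repressor genes is a product and that all expression levels are strictly positive; the function name `signal_outcome` and the sample values are hypothetical and are not part of the described method.

```python
import math


def signal_outcome(activator_levels, repressor_levels):
    """Ratio of the product of activator gene expression levels
    to the product of repressor gene expression levels.

    Assumption: all levels are strictly positive.
    """
    # Sum logarithms instead of multiplying raw values so that long
    # gene lists do not overflow or underflow.
    log_so = sum(math.log(x) for x in activator_levels) - sum(
        math.log(x) for x in repressor_levels
    )
    return math.exp(log_so)


# Hypothetical expression levels for two activators and two repressors.
so = signal_outcome([2.0, 4.0], [1.0, 2.0])
print(f"SO = {so:.3f}")
```

A value above 1 in this sketch indicates that activator expression outweighs repressor expression for the pathway; a value below 1 indicates the opposite.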


Yet further, according to an embodiment of the present invention, the method screens for drugs or combinations that minimize the signaling pathway cloud disturbance (SPCD). Additionally, according to an embodiment of the present invention, the SPCD is a ratio of [AGEL]i, which is the activator gene #i expression level, to [RGEL]j, which is the repressor gene #j expression level, and this ratio is calculated for activator and repressor proteins in the pathway.


Cellular Network Analysis and iPANDA


There are well-known methods in the art (see, for example, U.S. Pat. No. 8,623,592) for treating patients using methods for predicting responses of cells to treatment with therapeutic agents. These methods involve measuring, in a sample of the cells, levels of one or more components of a cellular network and then computing a Network Activation State (NAS) or a Network Inhibition State (NIS) for the cells using a computational model of the cellular network. The response of the cells to treatment is then predicted based on the NAS or NIS value that has been computed. The present invention also comprises predictive methods for cellular responsiveness in which computation of a NAS or NIS value for the cells (e.g., senescent cells) is combined with use of a statistical classification algorithm. A preferred method of iPANDA implementation is now described. The method of transcriptomic data analysis typically includes: (a) receiving cell transcriptomic data of a control group (C) and cell transcriptomic data of a group under study (S) for a gene; (b) calculating a fold change ratio (fc) for the gene; repeating steps (a) and (b) for a plurality of genes; grouping co-expressed genes into modules; and estimating gene importance factors based on a network topology, mapped from a plurality of the modules, in order to obtain an in silico Pathway Activation Network Decomposition Analysis (iPANDA) value, the iPANDA value having a Pearson coefficient greater than a Pearson coefficient associated with another platform for manipulating the control cell transcriptomic data and the cell transcriptomic data of the group under study for the plurality of genes.
Steps may also include determining an iPANDA value associated with at least one of the modules; providing a classifier for treatment response prediction of a drug to a disease, wherein the disease is selected from senescence and another disease or disorder; applying at least one statistical filtering test and a statistical threshold test to the fc values; obtaining proliferative bodily samples and healthy bodily samples from patients; applying the drug to the patients; and determining responder and non-responder patients to the drug. The method also often includes comparing gene expression in at least one of selected signaling pathways and metabolic pathways, often associated with a drug.


One of the most relevant challenges in transcriptomic data analysis is the inherent complexity of gene network interactions, which remains a significant obstacle in building comprehensive predictive models. Moreover, the high diversity of experimental platforms and the inconsistency of the data coming from the various types of equipment may also lead to incorrect interpretation of the underlying biological processes. Although a number of data normalization approaches have been proposed over recent years, it remains difficult to achieve robust results over a group of independent data sets even when they are obtained from the same profiling platform. This may be explained by a range of biological factors, such as wide heterogeneity among individuals on a population basis or variance in the cell cycle stage of the cells used, or by a set of technical factors, such as sample preparation or batch variations in reagents.


A preferred embodiment of the present invention is compatible with the large-scale transcriptomic data analysis called in silico Pathway Activation Network Decomposition Analysis (iPANDA) as described herein. iPANDA is an effective tool for biologically relevant dimension reduction in transcriptomic data.


Overview of a Preferred iPANDA Embodiment


Fold changes between the gene expression levels in the samples under investigation and an average expression level of samples within the normal set are used as input data for the iPANDA algorithm. Since some genes may have a stronger effect on the pathway activation than others, the gene importance factor has been introduced. Several approaches to gene importance hierarchy calculation have been proposed during the last few decades. The vast majority of these approaches aim to enrich pathway-based models with specific gene markers most relevant for a given study. While some of them use detailed kinetic models of several particular metabolic networks to derive importance factors, in others, gene importance is derived from the statistical analysis of the gene expression data obtained for disease cases and healthy samples.


The iPANDA approach integrates different analytical concepts described above into a single network model as it simultaneously exploits statistical and topological weights for gene importance estimation. The smooth threshold based on the P values from a t-test performed on groups of two contrasting tissue samples is applied to the gene expression values. The smooth threshold is defined as a continuous function of P value ranging from 0 to 1. The statistical weights for genes are also derived during this procedure. The topological weights for genes are obtained during the pathway map decomposition. The topological weight of each gene is proportional to the number of independent paths through the pathway gene network represented as a directed graph.


It is well known that multiple genes exhibit considerable correlations in their expression levels. Most algorithms for pathway analysis treat gene expression levels as independent variables, which, despite the common belief, is not suitable when the topology-based coefficients are applied. Indeed, due to exchangeability, there is no dependence of pathway activation values on how the topology weights are distributed over a set of coexpressed genes with correlated expression levels, and hence correlated fold changes. Thus, the computation of topological coefficients for a set of coexpressed genes is inefficient, unless a group of coexpressed genes is considered as a single unit. To circumvent this challenge, gene modules reflecting the coexpression of genes are introduced in the iPANDA algorithm. The wide database of gene coexpression in human samples, COEXPRESdb, and the database of the downstream genes controlled by various transcriptional factors are utilized for grouping genes into modules. In this way, the topological coefficients are estimated for each gene module as a whole rather than for individual genes inside the module.


The contribution of gene units (including gene modules and individual genes) to pathway activation is computed as a product of their fold changes in logarithmic scale, topological weights and statistical weights. The contributions are then multiplied by a discrete coefficient which equals +1 or −1 in the case of pathway activation or suppression by the particular unit, respectively. Finally, the activation scores, which we refer to as iPANDA values, are obtained as a linear combination of the scores calculated for gene units that contribute to the pathway activation/suppression. Therefore, the iPANDA values represent the signed scores showing the intensity and direction of pathway activation.


Pathway Quality Metrics and iPANDA


Although there are currently several publicly available pipelines for benchmarking transcriptomic data analysis algorithms, our aim is to generalize the approaches for pathway-based algorithm testing and reveal the common features of reliable pathway-based expression data analysis. We term these features “pathway analysis quality hallmarks”. Efficient methods for pathway-based transcriptomic data analysis should be capable of performing significant noise reduction in the input data and aggregating output data as a small number of highly informative features (pathway markers).


Scalability (the ability to process pathways with small or large numbers of genes similarly) is another critical aspect that should be considered when designing a reliable pathway analysis approach, since pathway activation values for pathways of different sizes should be equally credible. The list of pathway markers identified should be relevant to the specific phenotype or medical condition, and robust over multiple data sets related to the process or biological state under investigation. The calculation time should be reasonable to allow high-throughput screening of large transcriptomic data sets. To assess the iPANDA algorithm with respect to these hallmarks and to fully assess its true potential and limitations, we have directly compared the results obtained by iPANDA using the tissue and Microarray Analysis Quality Control (MAQC)-I data sets with five other widely used third-party viable alternatives (GSEA, SPIA, Pathway Level Analysis of Gene Expression (PLAGE), single sample Gene Set Enrichment Analysis (ssGSEA) and Denoising Algorithm based on Relevant network Topology (DART)).


iPANDA as a Tool for Noise Reduction in Transcriptomic Data


One of the major issues that should be addressed when developing a novel transcriptomic data analysis algorithm is the ability of the proposed method to reduce noise while retaining the biologically relevant information of the results. Since pathway-based analysis algorithms are considered dimension reduction techniques, the pathway activation scores should represent collective variables describing only biologically significant changes in the gene expression profile.


In order to estimate the ability of the iPANDA algorithm to perform noise reduction while preserving biologically relevant features, we performed an analysis of the well-known MAQC data set (GEO identifier GSE5350). It contains data for the same cell samples processed using various transcriptome profiling platforms. A satisfactory pathway or network analysis algorithm should reduce the noise level and demonstrate a higher degree of similarity between the samples in comparison to the similarity calculated using gene set data.


To estimate gene level similarity, only fold changes for differentially expressed genes (t-test P value<0.05) were utilized. Pearson correlation was chosen as the metric to measure the similarity between samples. Sample-wise correlation coefficients were obtained for the same samples profiled on Affymetrix and Agilent platforms. A similar procedure was performed using pathway activation values (iPANDA values).
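
The sample-wise comparison can be illustrated with a short Python sketch, offered only as an assumption-laden example. The function names `pearson` and `mean_samplewise_correlation`, the dictionary layout (sample identifier mapped to a list of feature values in a fixed order) and the toy values are hypothetical. The same routine would apply whether the features are gene fold changes or pathway activation values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_samplewise_correlation(platform_a, platform_b):
    """Average, over samples profiled on both platforms, of the
    correlation between the two feature vectors of the same sample."""
    shared = sorted(set(platform_a) & set(platform_b))
    values = [pearson(platform_a[s], platform_b[s]) for s in shared]
    return sum(values) / len(values)


# Hypothetical feature vectors for two samples on two platforms.
platform_a = {"s1": [1.0, 2.0, 3.0], "s2": [0.5, 1.5, 1.0]}
platform_b = {"s1": [1.1, 2.1, 2.9], "s2": [0.4, 1.6, 1.2]}
print(mean_samplewise_correlation(platform_a, platform_b))
```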


Notably, the similarity calculated using pathway activation values generated by the iPANDA algorithm significantly exceeds the one calculated using fold changes for the differentially expressed genes (mean sample-wise correlation is over 0.88 and 0.79, respectively). To further validate our algorithm, we directly compared its noise reduction efficacy with that of other routinely used methods for transcriptome-based pathway analysis, such as SPIA, GSEA, ssGSEA, PLAGE and DART.


The mean sample-wise correlation between platforms is 0.88 for iPANDA compared with 0.53 for GSEA, 0.84 for SPIA, 0.69 for ssGSEA, 0.67 for PLAGE and 0.41 for DART. Furthermore, the sample-wise correlation distribution obtained using iPANDA values is narrowed to a range of 0.79 to 0.94, compared with −0.08 to 0.80, 0.60 to 0.92, 0.61 to 0.74, 0.45 to 0.75 and −0.11 to 0.60 for GSEA, SPIA, ssGSEA, PLAGE and DART, respectively.


In a preferred embodiment, iPANDA does generally assign more weight to genes that tend to be reliably coexpressed using information from the COEXPRESdb database. The information from COEXPRESdb is utilized solely for grouping genes into modules, and hence cannot introduce any favorable bias towards iPANDA in this assessment. Even when the feature for grouping genes into modules is ‘switched off’, meaning that all genes are considered individually and no information from COEXPRESdb is utilized, iPANDA scores show higher sample-wise similarity between data obtained using various profiling platforms compared with the similarity calculated on the gene level.


Biomarker Identification and Relevance and iPANDA


As a next step we address the iPANDA ability to identify potential biomarkers (or pathway markers) of the phenotype under investigation. One of the commonly used methods to assess the capability of transcriptomic pathway markers to distinguish between two groups of samples (for example, resistance and sensitivity to treatment) is to measure their receiver operating characteristics area under curve (AUC) values. The capacity to generate a high number of biomarkers with high AUC values is a major requirement for any prospective transcriptomic data analysis algorithm to be used in prediction models.
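
As a hedged illustration of the AUC measure mentioned above, the following Python sketch computes the ROC area under curve for a single pathway marker from its scores in two groups. It uses the rank interpretation of AUC (the probability that a randomly chosen sample from one group scores higher than a randomly chosen sample from the other, with ties counted as one half). The function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores_group_a, scores_group_b):
    """AUC for separating group A from group B by a single score.

    Counts, over all cross-group pairs, how often the group A score
    exceeds the group B score; ties contribute one half.
    """
    wins = 0.0
    for a in scores_group_a:
        for b in scores_group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_group_a) * len(scores_group_b))


# Hypothetical pathway scores for responders and non-responders.
responders = [1.8, 2.4, 0.9]
non_responders = [0.2, 1.1, -0.5]
print(roc_auc(responders, non_responders))
```

This pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would be preferable for large cohorts.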


iPANDA Produces Highly Robust Set of Biomarkers


One of the most important shortcomings of modern pathway analysis approaches is their inability to produce consistent results for different data sets obtained independently for the same biological case. Here we show that the iPANDA algorithm applied to the tissue data overcomes this flaw and produces a highly consistent set of pathway markers across the data sets used in the study. The iPANDA algorithm is an advantageous method for biologically relevant pathway marker development compared with the other pathway-based approaches.


The common marker pathway (CMP) index is applied to drug treatment response data in order to estimate the robustness of the biomarker lists. Pathway marker lists obtained for four independent data sets were analyzed. The calculation of pathway activation scores is performed using the iPANDA algorithm and its versions with disabled gene grouping and/or topological weights. The ‘off’ state of topology coefficients means that they are equal to 1 for all genes during the calculation. Likewise, the ‘off’ state for the gene grouping means that all the genes are treated as individual genes. The application of the gene modules without topology-based coefficients reduces the robustness of the algorithm as well as the overall number of common pathway markers between data sets. Turning on the topology-based coefficients only slightly increases the robustness of the algorithm, whereas using topology and gene modules simultaneously dramatically improves this parameter for both tissue types. This result implies that the combined implementation of the gene modules along with the topology-based coefficients serves as an effective way of noise reduction in gene expression data and allows one to obtain stable pathway activation scores for a set of independent data.


iPANDA biomarkers as classifiers for prediction models. High AUC values for the pathway markers suggest that iPANDA scores may be efficiently used as classifiers for biological condition prediction challenges.


In order to classify the samples as responders or non-responders, random forest models were developed using iPANDA scores obtained for training sets of samples for each end point. Subsequently, the performance of these models was measured using validation sets. The Matthews correlation coefficient (MCC), specificity and sensitivity metrics were applied to evaluate the performance of the models. The MCC was chosen for its ease of calculation and its informativeness even when the distribution of the two classes is highly skewed. Similar random forest models were built using pathway activation (enrichment) scores obtained by other pathway analysis algorithms, including SPIA, GSEA, DART, ssGSEA and PLAGE. Moreover, to fully assess the performance of iPANDA-based paclitaxel sensitivity prediction models, similar random forest models were trained on four different gene expression subsets: expression levels of all genes (log GE), fold change for all genes between the training set and corresponding normals (log FC), expression levels of most differentially expressed genes (t-test P<0.05) (log DGE), and fold change in expression levels of most differentially expressed genes (t-test P<0.05) between the training and corresponding normal breast tissue data sets (log DFC). A logarithmic scale is used for training the gene level models. All pathway-level and gene-level data are Z-score normalized separately for each GEO data set used.
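
The three evaluation metrics named above can be sketched in Python as follows. This is an illustrative example only: the function name `classification_metrics`, the label encoding (1 for responder, 0 for non-responder) and the sample labels are assumptions, and the convention of returning 0 when a denominator vanishes is a choice made for this sketch.

```python
import math


def classification_metrics(y_true, y_pred):
    """MCC, sensitivity and specificity for binary labels (1 or 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Sketch convention: report 0 when a metric is undefined.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {"mcc": mcc, "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical validation-set labels and model predictions.
metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
print(metrics)
```

Because the MCC uses all four cells of the confusion matrix, it remains informative when one class is much larger than the other, which is the property the text relies on.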


Application of the pathway activation measurement implemented in iPANDA leads to significant noise reduction in the input data and hence enhances the ability to produce highly consistent sets of biologically relevant biomarkers acquired on multiple transcriptomic data sets. Another advantage of the approach presented is the high speed of the computation. The gene grouping and topological weights are the most demanding parts of the algorithm from the perspective of computational resources. Fortunately, these steps need to be precalculated only once, before the actual calculations using transcriptomic data. The calculation time for processing a single sample is approximately 1.4 s on the Intel® Core i3-3217U 1.8 GHz CPU (compared with 10 min for SPIA, 4 min for DART, and about 10 s for ssGSEA, GSEA and PLAGE). Thus, iPANDA can be an efficient tool for high-throughput biomarker screening of large transcriptomic data sets.


The use of merely microarray data for pathway activation analysis has well-known limitations, as it cannot address individual variations in the gene sequence and consequently in the activity of its product. For example, a gene can have a mutation that reduces activity of its product but elevates its expression level through a negative feedback loop. Thus, the elevated expression of the gene does not necessarily correspond with the increase in the activity of its product.


Although the iPANDA algorithm was initially designed for microarray data analysis, it can also be easily applied to data derived from genome-wide association studies (GWAS). In order to do so, GWAS data can be converted to a form amenable to the iPANDA algorithm. Single-point mutations are assigned to the genes based on their proximity to the reading frames. Then each single-point mutation is given a weight derived from a statistical analysis of the GWAS data. Simultaneous use of the GWAS data along with microarray data may improve the predictions made by the iPANDA method.


One of the rapidly emerging areas in biomedical data analysis is deep learning. Recently, several successful studies on microarray data analysis using various deep learning approaches on gene-level data have surfaced. Using pathway activation scores may be an efficient way to reduce the dimensionality of transcriptomic data for drug discovery applications while maintaining biologically relevant features. From an experimental point of view, gene regulatory networks are controlled via activation or inhibition of a specific set of signaling pathways. Thus, using the iPANDA signaling pathway activation scores as input for deep learning methods could bring results closer to experimental settings and make them more interpretable to bench biologists. One of the most difficult steps of multilayer perceptron training is the dimension reduction and feature selection procedures, which aim to generate the appropriate input for further learning. Signaling pathway activation scoring using iPANDA will likely help reduce the dimensionality of expression data without losing biological relevance and may be used as an input to deep learning methods, especially for drug discovery applications. Using iPANDA values as input data is particularly useful for obtaining reproducible results when analyzing transcriptomic data from multiple sources.


The gene expression data from different data sets is preprocessed using the GCRMA algorithm and summarized using updated chip definition files from the Brainarray repository (Version 18) for each data set independently.


Taken together, iPANDA demonstrates better performance in the noise reduction test than other pathway analysis approaches, suggesting its credibility as a powerful tool for noise reduction in transcriptomic data analysis. iPANDA also has a strong ability to identify potential biomarkers (or pathway markers) of the phenotype under investigation.


There are several widely used collections of signaling pathways, including Kyoto Encyclopedia of Genes and Genomes (KEGG), QIAGEN and NCI Pathway Interaction Database. In this study, the collection of signaling pathways most strongly associated with various types of malignant transformation in human cells, obtained from the SABiosciences collection (sabiosciences.com/pathwaycentral), was used. A senescence-specific pathway database can be used to ensure the presence of multiple pathway markers for the particular condition under investigation. Each pathway contains an explicitly defined topology represented as a directed graph. Each node corresponds to a gene or a set of genes, while edges describe biochemical interactions between genes in nodes and/or their products. All interactions are classified as activation or inhibition of downstream nodes. The pathway size ranges from about twenty to over six hundred genes in a single pathway.


The iPANDA approach for large-scale transcriptomic data analysis accounts for the gene grouping into modules based on the precalculated gene coexpression data. Each gene module represents a set of genes which experience significant coordination in their expression levels and/or are regulated by the same expression factors. Therefore the actual function for the calculation of the activation of pathway p according to the proposed iPANDA algorithm consists of two terms. The first one corresponds to the contribution of the individual genes, which are not members of any module, while the second one takes into account the contribution of the gene modules. The final function for obtaining an iPANDA value for the activation of pathway p, which consists of the individual genes i and gene modules j, has the following analytical form:


$$iPANDA_p = \sum_{i} G_{ip} + \sum_{j} M_{jp}$$








The contribution of the individual genes ($G_{ip}$) and the gene modules ($M_{jp}$) is computed as follows:


$$G_{ip} = w_i^S \cdot w_{ip}^T \cdot A_{ip} \cdot \lg(fc_i)$$

$$M_{jp} = \max(w_i^S) \cdot \frac{1}{N} \sum_{i}^{N} \left( w_{ip}^T \cdot A_{ip} \cdot \lg(fc_i) \right)$$







Here $fc_i$ is the fold change of the expression level of gene i in the sample under study relative to the normal level (average in a control group). As the expression levels are assumed to be logarithmically normally distributed, and in order to convert the product over fold change values to a sum, logarithmic fold changes are utilized in the final equation. The activation sign $A_{ip}$ is a discrete coefficient showing the direction in which the particular gene affects the given pathway. It equals +1 if the product of gene i has a positive contribution to the pathway activation and −1 if it has a negative contribution. The factors $w_i^S$ and $w_{ip}^T$ are the statistical and topological weights of gene i, each ranging from 0 to 1. The derivation procedure for these factors is described in detail in the subsequent sections. Since $\lg(fc_i)$ and $A_{ip}$ values can be positive or negative, the iPANDA values for the pathways can also have different signs. Thus positive or negative iPANDA values correspond to pathway activation or inhibition, respectively.
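
To make the two-term structure concrete, the following Python sketch evaluates an iPANDA-style score for one pathway from precomputed weights. It is a minimal illustration under stated assumptions, not the reference implementation: "lg" is taken to be the base-10 logarithm, each gene is passed as a tuple (statistical weight, topological weight, activation sign, fold change), and the function names and numeric values are hypothetical.

```python
import math


def gene_contribution(w_stat, w_topo, sign, fold_change):
    """Contribution of one individual gene that is not in any module."""
    return w_stat * w_topo * sign * math.log10(fold_change)


def module_contribution(genes):
    """Contribution of one gene module.

    genes: list of (w_stat, w_topo, sign, fold_change) tuples.
    The largest statistical weight in the module scales the mean of
    the topology-weighted, signed log fold changes of its genes.
    """
    max_w_stat = max(g[0] for g in genes)
    mean_term = sum(
        w_topo * sign * math.log10(fc) for _, w_topo, sign, fc in genes
    ) / len(genes)
    return max_w_stat * mean_term


def ipanda_value(individual_genes, modules):
    """Sum of individual-gene contributions and module contributions."""
    return sum(gene_contribution(*g) for g in individual_genes) + sum(
        module_contribution(m) for m in modules
    )


# Hypothetical pathway: one free activator gene and one repressor module.
individual = [(1.0, 0.5, +1, 100.0)]
modules = [[(0.8, 1.0, -1, 10.0), (0.4, 1.0, -1, 10.0)]]
print(ipanda_value(individual, modules))
```

In this toy case the individual gene pushes the score up and the module pulls it down, so the sign of the result reflects the net direction of the pathway change.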


Obtaining Gene Importance Factors


In order to estimate the topological weight ($w_{ip}^T$), all possible walks through the gene network are calculated on the directed graph associated with the pathway map. The nodes of the graph represent genes or gene modules, while the edges correspond to biochemical interactions. The nodes which have zero incoming edges are chosen as the starting points of the walks and those which have zero outgoing edges are chosen as the final points. Loops are forbidden during walk computation. The number of walks $N_{ip}$ through the pathway p which include gene i is calculated for each gene. Then $w_{ip}^T$ is obtained as the ratio of $N_{ip}$ to the maximum value of $N_{jp}$ over all genes in the pathway:


$$w_{ip}^T = \frac{N_{ip}}{\max_j (N_{jp})}$$
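
The walk-counting step can be sketched in Python as below. This is an illustrative example that assumes the pathway graph is acyclic (loops already removed), so that the number of source-to-sink walks through a node equals the number of walks reaching it from any source multiplied by the number of walks leaving it towards any sink. The function name `topological_weights` and the five-node example graph are hypothetical.

```python
from functools import lru_cache


def topological_weights(edges, nodes):
    """Fraction of source-to-sink walks passing through each node,
    normalized by the maximum over all nodes.

    Assumption: the directed graph has no cycles.
    """
    successors = {v: [] for v in nodes}
    predecessors = {v: [] for v in nodes}
    for a, b in edges:
        successors[a].append(b)
        predecessors[b].append(a)

    @lru_cache(maxsize=None)
    def walks_to_sinks(v):
        # A node with no outgoing edges is itself an end point.
        if not successors[v]:
            return 1
        return sum(walks_to_sinks(u) for u in successors[v])

    @lru_cache(maxsize=None)
    def walks_from_sources(v):
        # A node with no incoming edges is itself a starting point.
        if not predecessors[v]:
            return 1
        return sum(walks_from_sources(u) for u in predecessors[v])

    counts = {v: walks_from_sources(v) * walks_to_sinks(v) for v in nodes}
    largest = max(counts.values())
    return {v: counts[v] / largest for v in nodes}


# Hypothetical pathway: A branches to B and C, which converge on D, then E.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(topological_weights(edges, ["A", "B", "C", "D", "E"]))
```

Nodes that every walk must pass through (A, D and E in the example) receive the maximum weight, while nodes on only one branch receive a proportionally smaller weight.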







The statistical weight depends on the p-values which are calculated from a group t-test for case and normal sets of samples for each gene. The method called p-value thresholding is commonly used to filter out spurious genes which demonstrate no significant differences between sets. However, a major issue with the use of sharp threshold functions is that they can introduce instability in the filtered genes and, as a consequence, in pathway activation scores between the data sets. Additionally, the pathway activation values become sensitive to an arbitrary choice of the cutoff value. In order to address this issue, a smooth threshold function is suggested. In the present study, the cosine function on a logarithmic scale is utilized:


$$w_i^S = \begin{cases} 0, & p > p_{max} \\ \left( \cos\left( \pi \, \dfrac{\log p - \log p_{min}}{\log p_{max} - \log p_{min}} \right) + 1 \right) / 2, & p_{min} < p \le p_{max} \\ 1, & p \le p_{min} \end{cases}$$






where $p_{min}$ and $p_{max}$ are the high and low threshold values. In this study, the p-value thresholds are equal to $10^{-7}$ and $10^{-1}$, respectively. For the threshold values given, over 58% of all genes pass the high threshold and about 12% also pass the low threshold for the data under investigation. Hence over 45% of the genes in the data set receive intermediate $w_i^S$ values. Therefore, more stable results for pathway activation scores between data sets can be achieved using this approach.
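
A minimal Python sketch of such a smooth cosine threshold follows, using the two threshold values quoted above as defaults. It assumes that the weight is 1 for p-values at or below the smaller threshold and 0 for p-values above the larger threshold, which makes the function continuous at both ends; the function name `statistical_weight` is hypothetical.

```python
import math


def statistical_weight(p, p_min=1e-7, p_max=1e-1):
    """Smooth p-value threshold on a logarithmic scale.

    Returns 1 for p <= p_min, 0 for p > p_max, and a cosine-shaped
    transition between the two (assumed boundary handling).
    """
    if p > p_max:
        return 0.0
    if p <= p_min:
        return 1.0
    # Position of log(p) between log(p_min) and log(p_max), in [0, 1].
    fraction = (math.log(p) - math.log(p_min)) / (
        math.log(p_max) - math.log(p_min)
    )
    return (math.cos(math.pi * fraction) + 1.0) / 2.0


for p in (1e-9, 1e-6, 1e-4, 1e-2, 0.5):
    print(p, round(statistical_weight(p), 3))
```

Unlike a sharp cutoff, a small change in a gene's p-value near either threshold changes its weight only slightly, which is the stability property the text describes.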


Grouping Genes into Modules


To obtain the gene modules, two independent sources of data were utilized: the human database of coexpressed genes COEXPRESdb and the database of the downstream genes controlled by human sequence-specific transcription factors. The latter is simply intersected with the genes from the pathway database used, while correlation data from COEXPRESdb is clustered using a Euclidean distance matrix.


Distances were obtained according to the following equation:







$$
r_{ij} = 1 - \mathrm{corr}_{ij}
$$






where corr_ij is the correlation between the expression levels of genes i and j. DBSCAN and hierarchical clustering with an average linkage criterion were utilized to identify clusters. Only clusters with an average internal pairwise correlation higher than 0.3 were considered. Clusters obtained from the transcription factors database and the coexpression database were recursively merged to remove duplicates. A pair of clusters is combined into one during the merging procedure if the intersection level between the clusters is higher than 0.7. As a result, a set of 169 gene modules, which includes a total of 1021 unique genes, is constructed.
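The distance conversion and the recursive merging step can be sketched as below. The text does not define "intersection level", so the code assumes it is the number of shared genes divided by the size of the smaller cluster; this definition, along with the function names, is an assumption made for illustration only.

```python
def correlation_distance(corr):
    """Convert a square correlation matrix (list of lists) into r_ij = 1 - corr_ij."""
    return [[1.0 - c for c in row] for row in corr]

def overlap(a, b):
    # Assumed definition of "intersection level": shared genes over the smaller cluster.
    return len(a & b) / min(len(a), len(b))

def merge_clusters(clusters, threshold=0.7):
    """Repeatedly merge any pair of clusters whose overlap exceeds the threshold."""
    clusters = [set(c) for c in clusters if c]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if overlap(clusters[i], clusters[j]) > threshold:
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break   # restart the scan, since the cluster list changed
    return clusters

# Hypothetical clusters: the first two overlap fully and are merged.
modules = merge_clusters([{"A", "B", "C", "D"}, {"A", "B", "C"}, {"X", "Y"}])
```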


Statistical Credibility of the iPANDA Values


The p-values for the iPANDA pathway activation scores are obtained using weighted Fisher's combined probability test.


Algorithm Robustness Estimation


In order to quantitatively estimate the robustness of the algorithm between data sets, the Common Marker Pathway (CMP) index is introduced. The CMP index is a function of the number of pathways considered as markers that are common between data sets. It also depends on the quality of the treatment response prediction when these pathways are used as classifiers. The CMP index is defined as follows:






$$
\mathrm{CMP} = \frac{1}{n}\sum_{j=1}^{n}\sum_{i}\ln\left(N_i\right)\times\left(\mathrm{AUC}_{ij}-\mathrm{AUC}_{R}\right)
$$









where n is the number of data sets under study, N_i is the number of genes in pathway i, and AUC_ij is the value of the ROC area under the curve, which shows the quality of the separation between responders and non-responders to treatment when pathway i is used as a classifier for the j-th data set. AUC_R is the AUC value for a random classifier and is equal to 0.5. A pathway is considered a marker if its AUC value is higher than 0.8. The ln(N_i) term is included to increase the contribution of the larger pathways, because they have a smaller probability of randomly getting a high AUC value. Higher values of the CMP index correspond to more robust prediction of pathway markers across the data sets under investigation, while a zero value of the CMP index corresponds to an empty intersection of the pathway marker lists obtained for the different data sets.
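A minimal sketch of the CMP calculation follows. It assumes that the inner sum runs only over pathways that qualify as markers (AUC above 0.8) in every data set, which is one reading of "common between data sets"; the function name and input layout are hypothetical.

```python
import math

def cmp_index(auc, pathway_sizes, marker_cutoff=0.8, auc_random=0.5):
    """auc: {pathway: [AUC in data set 1, ..., AUC in data set n]}."""
    if not auc:
        return 0.0
    n = len(next(iter(auc.values())))   # number of data sets
    total = 0.0
    for pathway, values in auc.items():
        # Assumption: a pathway contributes only if it is a marker in every data set.
        if all(v > marker_cutoff for v in values):
            weight = math.log(pathway_sizes[pathway])   # ln(N_i)
            total += sum(weight * (v - auc_random) for v in values)
    return total / n

# Hypothetical example: P1 is a marker in both data sets, P2 in only one.
value = cmp_index({"P1": [0.9, 0.9], "P2": [0.9, 0.6]}, {"P1": 10, "P2": 50})
```

With no common markers the sum is empty and the index is zero, matching the behaviour described in the text.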


Clustering of Data Samples


In order to apply iPANDA to the Paclitaxel treatment response prediction over several independent data sets, the pathway activation values were normalized to Z-scores independently for each data set. The expected values used for the Z-scoring procedure were adjusted to the number of responders and non-responders in the data set under study. The pairwise distance matrix between samples utilized for further clustering is obtained as follows:







$$
D_{ij} = \frac{1}{N}\cdot\sum_{p}^{N}\left(\mathrm{iPANDA}_{ip}-\mathrm{iPANDA}_{jp}\right)^{2}
$$








Here D_ij is the distance between samples i and j, and N is the number of pathway markers used for the distance calculation. iPANDA_ip and iPANDA_jp are the normalized iPANDA values for pathway p for samples i and j respectively. Normalization of iPANDA values to Z-scores implies that all the considered pathway markers contribute equally to the distance obtained. All distances were converted into similarities (1−D_ij) before the clustering procedure. Hierarchical clustering using Ward linkage is performed on the distance matrix to divide the samples into groups.
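The pairwise distance can be sketched as below, taking D_ij as the mean squared difference of the Z-scored iPANDA values over the N pathway markers. Whether a square root is applied in the original formula cannot be determined from the text, so the exact form used here is an assumption, as are the function name and input layout.

```python
def pairwise_distance(scores):
    """scores: one list of Z-scored iPANDA values per sample, same pathway order."""
    n_pathways = len(scores[0])
    size = len(scores)
    dist = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = sum((a - b) ** 2 for a, b in zip(scores[i], scores[j])) / n_pathways
            dist[i][j] = dist[j][i] = d   # the matrix is symmetric
    return dist

# Two hypothetical samples scored on two pathway markers.
dist = pairwise_distance([[1.0, -1.0], [0.0, 1.0]])
```

The resulting matrix can be passed to any hierarchical clustering routine that accepts precomputed distances.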


Transcriptome (Gene Expression) Difference


In a preferred embodiment, two iPANDA transcriptome signatures, one from a senescent patient tissue or organ to be treated (or similar proxy profile) and another representing a target, nonsenescent tissue or organ, are compared to observe transcriptome (gene expression) differences. Principal component analysis is typically applied. Gene expression trees and difference matrices may also be used, for example using techniques known in the art. In a preferred embodiment, a difference matrix provides the vector inputs for a machine learning architecture as described below. While iPANDA has been described with transcriptomic data, proteomic data can be used in the same protocols.


In a preferred embodiment, gene expression patterns are subjected to Principal Component Analysis (PCA). In an embodiment wherein many different tissue samples are taken, rather than just two, several clusters are formed, suggesting related biological functions for these clusters. For example, the gastrointestinal tissues, esophagus, rectum and colon all grouped together, and hematopoietic tissues (bone marrow and spleen) and monocytes also clustered. Because transcriptomes of functionally related cell types often exhibit substantial hierarchical structure a neighbor-joining gene expression tree can be generated based on mean gene expression levels. Similar to the PCA results, bone marrow and spleen clustered with monocytes, while skeletal muscle and heart muscle grouped together and were distinct from smooth muscle. Thus, for any given cell type, e.g., a neuron, epigenetic marks reflect both the prior (e.g., state in the germ layer and derived cell lineages) and present regulatory landscapes.


Differential Gene Expression of Cells and Tissues


In heart and skeletal muscle, 455 out of 12,044 genes are differentially expressed (phylogenetic analysis of variance (ANOVA) P value≤0.01) compared with other cells and tissues. Approximately 44% of these genes were associated with the tricarboxylic acid (TCA) cycle and respiration, in agreement with the metabolic organization and energy sources of these tissues.


Neurons, which are critical for cognitive and motor functions, have cell lifespans that likely exceed the lifespan of the organism. Comparing neurons to shorter-lived cells and tissues is conceptually similar to comparing gene expression of long-lived mammals to related short-lived species, e.g., examining African mole rats against other rodents.15 Accordingly, neurons should possess a gene expression signature associated with low turnover/long lifespan, in addition to the patterns indicative of neuronal function. Out of 12,044 genes 1,438 were differentially expressed in neurons (P≤0.01) and gene set enrichment analysis showed enrichment for functions associated with lysosomes, proteasomes, ribosomal proteins and apoptosis. Neurons presented with reduced expression of 27 ribosomal proteins and multiple 20S proteasome subunit genes, consistent with distinct protein metabolism required to fine-tune self-renewal and synaptic plasticity. This group of genes was not correlated with cell and tissue turnover, suggesting that this expression pattern is unique to long-lived neurons. Reduced protein metabolism, which may be induced by dietary restriction and other interventions, is known to associate with extended lifespan in a number of model organisms. Furthermore, expression of the tumor suppressor p53 (TP53) was significantly reduced (P≤0.001) in neurons, where it was expressed at a level gene expression pattern of cell and tissue turnover.


Inputs to Machine Learning Platform and iPANDA


In a preferred embodiment, a general design of the computational procedures that output the drug classification of the invention has five sequential steps: 1) transcriptomic similarity search, 2) protein target based search, 3) structural similarity based search, 4) transcriptomic signature screening and 5) deep neural network based search.


Regarding 1), In silico Pathway Activation Network Decomposition Analysis (iPANDA) can be applied to transcriptomic tissue-specific aging datasets obtained from Gene Expression Omnibus (GEO), with a total number of samples not less than 250 for each tissue. Tissue-specific cellular senescence pathway marker sets are identified. Only pathways considerably perturbed in senescent cells (pathways with iPANDA-generated p-values less than 0.05) are considered as pathway markers. iPANDA scores are precalculated for Broad Institute LINCS Project data and are utilized for calculating transcriptomic compound similarity. Euclidean or other similarity between vectors of iPANDA scores for senolytics and other compounds of interest is calculated using data on cell lines for the corresponding tissue. Only previously identified tissue-specific pathway markers are used for the similarity calculation.


Regarding 2), using LINCS Project data on knockdown cell lines, the same procedure is performed to identify key target genes involved in the action of the previously identified senolytic compounds D (Dasatinib), N (Navitoclax) and Q (Quercetin). The list of target genes is enriched with proteins likely to interact with these compounds using the STITCH human drug-target interaction database. Pharmacophore-based search and publicly available docking algorithms are applied to identify the compounds which specifically bind the identified targets with the highest affinity.


3) Structural similarity search is performed for three compounds already known to have senolytic properties (D, N, Q). Using publicly available molecular docking algorithms, the importance weights for chemical groups are defined. This information is utilized for QSAR-based structure generation and filtering. Compounds from the PubChem database can also be screened by a similar procedure in order to find structural analogues of D, N and Q.


4) To investigate potential effects of natural compounds without known molecular targets, GEO and LINCS Project gene expression data are used. In both databases, datasets consisting of transcriptomes of cell lines before and after treatment with multiple different chemical compounds can be examined. For aging dataset scoring, exactly the same GEO datasets (GSE66236, GSE69391, GSE18876, GSE21779, GSE38718, GSE59980, GSE52699, GSE48662) are used. It can be assumed that an anti-aging compound would shift an aged transcriptome toward a “younger” state. Mechanistically, this reflects the fact that if the activity of a certain regulatory pathway increases (or decreases) with aging, the expression of its end targets would increase (or decrease) with aging. By searching for compounds which decrease (or increase) the expression of those end targets, drugs which target these aging-associated pathways (or some of their master regulators) could be discovered.


First, differentially expressed genes associated with aging are found, as well as differentially expressed genes after drug treatment. For microarray-based transcriptome data, a limma test of differential gene expression is used. Each set of differentially expressed genes is ordered according to the following measure, which takes into account both the magnitude and the statistical significance of the effect: FC · max(0, −log(pvalue)), where FC is the fold-change of gene expression between groups and pvalue represents the result of the limma test.
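The ordering measure can be sketched as follows. The text does not state the base of the logarithm or whether the fold-change is log-transformed; the example assumes a natural logarithm and takes the fold-change value as given. The function names and the gene values are hypothetical.

```python
import math

def effect_score(fold_change, p_value):
    """FC * max(0, -log(pvalue)); natural log assumed, p_value must be > 0."""
    return fold_change * max(0.0, -math.log(p_value))

def rank_genes(results):
    """results: {gene: (fold_change, p_value)} -> genes ordered by decreasing score."""
    return sorted(results, key=lambda g: effect_score(*results[g]), reverse=True)

# Hypothetical limma output for three genes.
ranking = rank_genes({"G1": (2.0, 0.01), "G2": (3.0, 0.5), "G3": (1.5, 1.0)})
```

Here G2 has the largest fold-change but a weak p-value, so the more significant G1 ranks first.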


A statistically motivated score estimating the anti-aging abilities of a compound is designed. Significantly up- or down-regulated genes are defined as those with FDR<0.01 (after multiple-testing correction). A Fisher exact test is performed which measures the association of two characteristics of each gene: being significantly downregulated after the drug treatment and being significantly upregulated during aging. Vice versa, the same test is performed for significantly upregulated genes after the drug treatment versus significantly downregulated genes during aging. The better of the p-values of those two tests is taken as the score for the given drug against aging. A multiple testing correction of the obtained p-values for the number of compounds under study can be performed. The same methodology is applied for screening natural compounds within the LINCS transcriptomic database whose effects are similar to those of other drugs, such as metformin.
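The scoring step can be sketched with a small standard-library Fisher exact test. The example assumes a one-sided test for over-representation and takes the smaller of the two p-values as the score; the text does not specify the sidedness, so this is an assumption, as are the function names. All gene sets are assumed to be subsets of `universe`.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided p-value for the 2x2 table [[a, b], [c, d]] (enrichment of a)."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(total, col1)

def anti_aging_score(drug_up, drug_down, age_up, age_down, universe):
    """Smaller p-value of the two opposite-direction association tests."""
    def test(drug_set, age_set):
        a = len(drug_set & age_set)       # changed by drug and by aging
        b = len(drug_set - age_set)       # changed by drug only
        c = len(age_set - drug_set)       # changed by aging only
        d = len(universe) - a - b - c     # changed by neither
        return fisher_exact_greater(a, b, c, d)
    return min(test(drug_down, age_up), test(drug_up, age_down))

# Hypothetical six-gene universe: the drug lowers exactly the genes that rise with age.
genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
score = anti_aging_score({"g4"}, {"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g5"}, genes)
```

A production analysis would more likely use an established statistics library, followed by a multiple-testing correction across compounds.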


5) The deep neural network-based classifier of compound pharmacological class can be trained on many compounds. Training data included structural data (QSAR, SMILES), transcriptomic response LINCS Project data on gene-level and pathway level (iPANDA) and drug-target interaction network from STITCH database. The specific class of prospective senolytic compounds is declared during training. This class included compounds identified on the steps 1,2,3 of the study.


Established classifier accuracy is recorded after the class-balancing of the test set. A list of senolytic compounds after scanning the database of 300000+ compounds is obtained for further analysis. Top ranking compounds are obtained on each of the steps and the intersection is found for each tissue independently. As a result, compounds are identified as having the best senolytic properties for the tissue. A set of structural analogues according to the procedure in step 3 is obtained, which possess similar molecular properties, and likely senolytic properties.


6) Finding structural analogs of desired molecules. An aim also is to find structural analogs of molecule of interest for protein-ligand interaction. This approach is highly efficient for increasing the specificity of binding with targets (proteins).


At the first step we provide an analysis of possible targets for the drug compounds. This can be done in two ways: 1) using specific programs for searching databases for different interactions of molecules of interest with proteins/genes (e.g. STITCH); 2) analysis of experimental data from published articles. In this case the second way was chosen, as it helps to select the best variants of experimentally confirmed protein-ligand interactions. From the literature analysis, n targets are chosen according to the following parameters: 1) specific binding of the target with the drug(s); 2) the lowest IC50; 3) the presence of the structure in the Protein Data Bank.


After that, docking was applied to all of the structures for all possible active sites and additional binding pockets. The best positions of the drugs in the target were chosen, and an additional docking was then done using a flexible-chain algorithm.


Then all the structures of the target were analyzed according to the following criteria: 1) number of hydrogen bonds; 2) hydrophobic/hydrophilic interactions; 3) number of n-n interactions. This information was used further to understand the key principles by which a molecule can bind to the specific site of the target. From such analysis one can derive rules for modifying a molecule to obtain better binding properties with a specific target. Using the software, analogs are found according to the rules for the molecule. After that, in silico toxicology tests are performed and non-toxic analogs are chosen. These new non-toxic analogs were again docked into the binding site of the target for interaction analysis, and those which showed the best scores are selected as the most promising and prospective ones. Other structural analogs and conformers can be extracted from the PubChem database.


In a preferred embodiment, a deep neural network, similar to that described in, for example, Aliper et al., “Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data”, Mol Pharm, 2016 July 5; 13(7): 2524-2530, and Mamoshina et al., “Applications of Deep Learning in Biomedicine”, Mol Pharm, 2016 Mar. 13(5), is used, in combination with a cellular signature database such as the LINCS database and a drug therapeutic use database such as MeSH, as inputs to the DNN in order to output drug classifications to develop a therapeutic protocol, in this case to categorize and choose drugs for a senescence or other treatment protocol. LINCS is the US Library of Network-Based Cellular Signatures Program, which aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents. MeSH (Medical Subject Headings) is the US National Library of Medicine controlled vocabulary thesaurus used for indexing articles for PubMed, the free search engine of references and abstracts on life sciences and biomedical topics, also from the US National Library of Medicine.


An adversarial autoencoder (AAE) works by matching the aggregated posterior to the prior, which ensures that generating from any part of the prior space results in meaningful samples. As a result, the decoder of the adversarial autoencoder learns a deep generative model that maps the imposed prior to the data distribution. An AAE can be used in applications such as semi-supervised classification, disentangling style and content of images, unsupervised clustering, dimensionality reduction and data visualization. AAEs are used, for example, in generative modeling and semi-supervised classification tasks. Thus an AAE turns an autoencoder into a generative model. The AAE is often trained with dual objectives: a traditional reconstruction error criterion, and an adversarial training criterion that matches the aggregated posterior distribution of the latent representation of the autoencoder to an arbitrary prior distribution.


In a preferred embodiment derived from Kadurin, the method uses a 7-layer AAE architecture with the latent middle layer serving as a discriminator. As input and output, the AAE uses a vector of binary fingerprints and the concentration of the molecule. In the latent layer we also introduced a neuron responsible for growth inhibition percentage, which when negative indicates a reduction in the number of tumor cells after the treatment. To train the AAE, one uses cell line assay data for compounds profiled in a cell line. The output of the AAE can then be used to screen drug compounds, such as the 72 million compounds in PubChem, and then select candidate molecules with potential anti-senescent properties.


The latest class of non-parametric approaches for deep generative models is known as generative adversarial network (GAN). In this new framework, initially proposed by Goodfellow, generative models are estimated via an adversarial process. In practice, two models are simultaneously trained: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making an error. Thus, this framework does not correspond to the standard optimization problem as it is based on a value function that one model seeks to maximize and the other seeks to minimize. The process terminates at a saddle point that is a minimum with respect to one model's strategy and a maximum with respect to the other model's strategy. Because GANs do not require an explicit representation of the likelihood, neither approximate inference nor Markov chains are necessary. Consequently, GANs provide an attractive alternative to maximum likelihood techniques.


Generative capabilities of deep adversarial network techniques open the doors to new perspectives as it could contribute to overcome several limitations of current data driven computational methods. For example, we can apply GANs on transcriptomics data for the generation of new samples for a desired phenotypic groups and in chemoinformatics for the prediction of the physical, chemical, or biological properties and structures of molecules. Quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR) are still considered as the modern standard for predicting properties of novel molecules. To that end, many ML-based approaches have been developed to tackle such problems, but recent results show that the DL-based methods match or outperform other state-of-the-art methods and demonstrate better predictive performance, parsimony and interpretability and web-based predictors are available on some cases. Furthermore, new methods based on convolutional neural networks are able to perform predictions by directly using graphs of arbitrary size and shape as inputs rather than fixed feature vectors and one can expect to see the development of more flexible deep generative architectures that can be applied directly to other structured data such as sequences, trees, graphs, and 3D structures. Thus, the deep adversarial network techniques could be used to improve accuracy, generative capabilities and predictive power and address several issues including computational cost, limited computation at each layer and limited information propagation across the graph.


Target prediction and mapping of bioactive small compounds and molecules by analyzing binding affinities and chemical properties is another area of research that makes extensive use of data-driven computational methods in order to optimize the use of data available in existing repositories. Despite promising results and the availability of web-platforms to computationally identify new targets for uncharacterized molecules or secondary targets for known molecules such as SwissTargetPrediction, in general, the available methods remain too inaccurate for systematic binding predictions and physical experiments remain the state of the art for binding determination. In this field, DL-based methods, such as the recently released methods AtomNet based on deep convolutional neural networks have allowed to circumvent several limitations and outperform more traditional computational methods including RFs, SVMs for QSAR and ligand-based virtual screening. One can expect that the development of DL-methods making use of the GAN framework will also lead to significant improvement with respect to prediction accuracy and power.


In a preferred embodiment, the adversarial network and the autoencoder are trained jointly with SGD in two phases—the reconstruction phase and the regularization phase—executed on each mini-batch. In the reconstruction phase, the autoencoder updates the encoder and the decoder to minimize the reconstruction error of the inputs. In the regularization phase, the adversarial network first updates its discriminative network to tell apart the true samples (generated using the prior) from the generated samples (the hidden codes computed by the autoencoder). The adversarial network then updates its generator (which is also the encoder of the autoencoder) to confuse the discriminative network. Once the training procedure is done, the decoder of the autoencoder will define a generative model that maps the imposed prior of p(z) to the data distribution.


In a preferred embodiment, the input layer is divided into a fingerprint part and a concentration input neuron. In a preferred embodiment, an AAE is trained to encode and reconstruct not only molecular fingerprints, but also experimental concentrations. The encoder consists of two consecutive layers L1 and L2 with 128 and 64 neurons, respectively. The decoder consists of two layers L′1 and L′2, comprising 64 and 128 neurons respectively. The latent layer consists of 5 neurons, one of which is the GI and the four others are discriminated with a normal distribution. Since we train an encoder net to predict ‘efficiency’ against ‘senescence’ in a single neuron of the latent layer, we divide the latent vector into two parts: ‘GI’ and ‘representation’. We therefore added a regression term to the encoder cost function. Furthermore, we restrict our encoder to map the same fingerprint to the same latent vector independently of the input concentration by an additional ‘manifold’ cost. Here we compute the mean and variance of the concentrations over the whole dataset and then use them to sample concentrations for the ‘manifold’ step. On each step we sample a fingerprint from the training set and a batch of concentrations from a normal distribution with the given mean and variance. Training the net with the ‘manifold’ loss is performed by maximizing the cosine similarity between ‘representations’ of similar fingerprints with different concentrations.


All these changes resulted in a 5-step train iteration instead of a 3-step in AAE basic model: (a) Discriminator trained to distinguish between given latent distribution and encoded ‘representation’; (b) Encoder trained to confuse Discriminator with generated ‘representations’; (c) Encoder and Decoder trained jointly as Autoencoder; (d) Encoder trained to fit ‘score’ part of latent vector; (e) Encoder trained with ‘manifold’ cost.


The two first steps (a,b) are trained as usual adversarial networks. The Autoencoder cost function is computed as a sum of logloss of fingerprint part and mean squared error (MSE) of concentration parts and MSE is also used as a regression cost function. Example code for a preferred AAE is available at github.com/spoilt333/onco-aae.
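The autoencoder cost described above can be illustrated for a single molecule as below. This is a sketch under stated assumptions, not the code from the referenced repository: the log loss is averaged over fingerprint bits (it could equally be summed), the two terms are added with equal weight, and the function name and `eps` clipping constant are hypothetical.

```python
import math

def autoencoder_cost(fp_true, fp_pred, conc_true, conc_pred, eps=1e-12):
    """Log loss on the binary fingerprint plus squared error on the concentration."""
    log_loss = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for t, p in zip(fp_true, fp_pred)
    ) / len(fp_true)
    mse = (conc_true - conc_pred) ** 2   # a single concentration neuron
    return log_loss + mse

# Hypothetical two-bit fingerprint reconstructed with probability 0.5 per bit.
cost = autoencoder_cost([1, 0], [0.5, 0.5], conc_true=1.0, conc_pred=0.5)
```

In a training framework the same two terms would be computed per mini-batch and backpropagated through the encoder and decoder jointly.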


Experimental/Simulations/Models


1. Single Biopsy (or Existing Individual Profile).


A single biopsy of liver or lung is taken from the patient according to standard procedures in a medical center, as described on the nhlbi.nih.gov website. For a lung biopsy, a few samples of lung tissue are taken from several places in the lungs. The samples are examined under a microscope, and transcriptome and gene expression profiles and/or proteome and protein production profiles are also analyzed. This procedure can help rule out other conditions, such as sarcoidosis, cancer, or infection. A lung biopsy can also show how far the disease has advanced.


There are several procedures to get lung tissue samples.


Video-assisted thoracoscopy. This is the most common procedure used to get lung tissue samples. An endoscope with an attached light and camera is inserted into the chest through small cuts between the ribs. The endoscope provides a video image of the lungs and allows tissue samples to be collected. This procedure must be done in a hospital.


Bronchoscopy. For a bronchoscopy, a thin, flexible tube is passed through the nose or mouth, down the throat, and into the airways. At the tube's tip are a light and a mini-camera, which allow the windpipe and airways to be seen. Then forceps are inserted through the tube to collect tissue samples.


Bronchoalveolar lavage. During bronchoscopy, a small amount of saltwater (saline) is injected through the tube into lungs. This fluid washes the lungs and helps bring up cells from the area around the air sacs. These cells are examined under a microscope.


Thoracotomy. For this procedure, a few small pieces of lung tissue are removed through a cut in the chest wall between ribs. Thoracotomy is done in a hospital.


For a liver biopsy, a few samples of liver tissue are taken from several places in the liver. The samples are examined under a microscope, and transcriptome and gene expression profiles are also analyzed.


There are several procedures to get liver tissue samples.


Percutaneous Liver Biopsy. The health care provider either taps on the abdomen to locate the liver or uses one of the following imaging techniques: ultrasound or computerized tomography (CT), and then takes samples with the needle.


Transvenous Liver Biopsy. When a person's blood clots slowly or the person has ascites (a buildup of fluid in the abdomen), the health care provider may perform a transvenous liver biopsy. The health care provider applies local anesthetic to one side of the neck and makes a small incision there, injects contrast medium into the sheath and takes an x ray. After this, the biopsy needle is inserted and removed several times if multiple samples are needed.


Laparoscopic Liver Biopsy. Health care providers use this type of biopsy to obtain a tissue sample from a specific area or from multiple areas of the liver, or when the risk of spreading cancer or infection exists. A health care provider may take a liver tissue sample during laparoscopic surgery performed for other reasons, including liver surgery.


2. Pathway Signature Measurement


Transcriptomic Data:


From the GEO database (ncbi.nlm.nih.gov/geo/) data sets containing gene expression data related to IPF patients and normal healthy lung tissue used as a reference were downloaded (21 data sets). IPF and normal data from different data sets was preprocessed using GCRMA algorithm and summarized using updated chip definition files from Brainarray repository for each data set independently.


Differentially expressed genes were calculated using the limma and DESeq2 algorithms for the following comparison groups: IPF (IPF vs reference healthy lung tissue); Senescence (old vs reference young healthy lung tissue); Smoking (current smoker vs reference non-smoker). Age status data was available for 2 data sets and smoking status data was available for 1 data set.


Differential expression genes data was used as an input for iPANDA algorithm in order to measure the pathway signature of each comparison group. Alternately, proteomic data may be used.


Pathway Database Overview:


There are several widely used collections of signaling pathways including Kyoto Encyclopedia of Genes and Genomes, QIAGEN and NCI Pathway Interaction Database. In this study, we use the collection of signaling pathways most strongly associated with various types of malignant transformation in human cells obtained from the SABiosciences collection (sabiosciences.com/pathwaycentral).


3. Compare Signature Profiles.


Signature profile for each comparison group can be constructed based on iPANDA p-values cut-off (p-value<=0.05) and common overlap among different data sets: intersection cut-off threshold equal to 15 was used for IPF data, 2 for senescence data and 1 for smoking data.
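The construction of a signature profile can be sketched as below. The example assumes that the "intersection cut-off threshold" is the minimum number of data sets in which a pathway must pass the p-value cut-off; the text does not define it further, so this reading, the function name and the input layout are assumptions.

```python
from collections import Counter

def signature_profile(pvalues_by_dataset, p_cutoff=0.05, min_datasets=2):
    """pvalues_by_dataset: list of {pathway: iPANDA p-value}, one dict per data set."""
    hits = Counter()
    for pvalues in pvalues_by_dataset:
        # Count each pathway once per data set in which it is significant.
        hits.update(p for p, value in pvalues.items() if value <= p_cutoff)
    return {pathway for pathway, count in hits.items() if count >= min_datasets}

# Three hypothetical data sets with two pathways each.
datasets = [
    {"A": 0.01, "B": 0.20},
    {"A": 0.05, "B": 0.03},
    {"A": 0.50, "B": 0.04},
]
profile = signature_profile(datasets, min_datasets=2)
```

Under this reading, the thresholds quoted in the text (15 for IPF, 2 for senescence, 1 for smoking) would be passed as `min_datasets` for the respective comparison group.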


4. Personalize the Treatment.


DNNs can be used as a tool to predict active compounds and to generate compounds with a desired efficacy. DNN-based models can be applied to the personalization of compounds for individual patients and to the evaluation of treatment efficacy and safety.


Machine learning approaches provide tools for the analysis of biomedical data without prior assumptions about the functional relations in the data. Deep Neural Network (DNN) based approaches, such as multi-layered feed forward neural networks, are able to fit complex and sparse biomedical data and learn highly non-linear dependencies of the raw data without modification of the features of interest. Deep learning is a state of the art method for many tasks, from machine vision to language translation. However, despite the fact that biomedicine has entered the era of “big data”, biomedical datasets are usually limited by sample size. Feature selection and dimensionality reduction of the feature space therefore usually increase the predictive power of DNNs applied in the biomedical domain (Aliper, Plis, et al. 2016).


A system can be provided that utilizes quantitative models with a deep architecture that is able to stratify compounds by their efficacy for the individual patient based on his or her personal profile. In part, the personal profile can include the biological pathways analyzed with the quantitative models. The following data could be used as input features to the system: gene expression profiles and signaling pathway profiles, blood tests (Putin et al. 2016), protein expression profiles, clinical history as well as a deep representation of the electronic health record (Miotto et al. 2016).


A system can be provided that utilizes the quantitative models with a deep architecture that is able to evaluate the efficacy of the proposed treatment through the quantitative assessment of the health status of the patient, such as biological age, life expectancy, and the probability of survival. The following data could be used as input features to the system: gene expression profiles and signaling pathway profiles, blood tests, protein expression profiles, clinical history as well as a deep representation of the electronic health record.


A system can be provided that utilizes the quantitative models with a deep architecture that is able to predict potential side effect of the treatment. The following data could be used as input feature to the system: gene expression profiles and signaling pathway profiles, blood tests, protein expression profiles, clinical history as well as a deep representation of the electronic health record.


A system can be provided based on a generative model with a deep architecture (Kadurin et al. 2017) that is able to generate molecules with desired properties, such as high efficacy, low toxicity, high bioavailability and the like. Generated molecules can be evaluated by the DNN-based systems through efficacy and safety prediction.


Accordingly, a 5R strategy as described herein can be applied to patients with pre-senescent, senescent and fibrotic conditions. The 5R strategy includes: Rescue; Remove; Replenish; Reinforce; and Repeat.


Stage 1. Rescue.


The first step of the 5R strategy is rescuing pre-senescent cells in a particular tissue (including liver and lungs). The pre-senescent phenotype is considered potentially reversible. In order to rescue the cells demonstrating a pre-senescent phenotype, a specific set of possible interventions shall be applied. These interventions include treatment with one senoremediator compound or a combination of the senoremediator compounds from the list herein. Senoremediator compounds should be administered orally, by injection, sublingually, buccally, rectally, vaginally, cutaneously, transdermally, ocularly, otically or nasally, or in any other way.


Stage 2. Remove.


This step is performed to eliminate the cells that have already entered the irreversible senescent state. Senescent cells lose their function and pose a constant danger to the surrounding cells as described above. Elimination of such cells may prevent surrounding cells from entering the senescent phenotype through a positive feedback loop and may restore normal tissue functioning. In order to eliminate the cells demonstrating a senescent phenotype, a specific set of possible interventions shall be applied. These interventions include treatment with one senolytic compound or a combination of the senolytic compounds from the list below. Senolytic compounds should be administered orally, by injection, sublingually, buccally, rectally, vaginally, cutaneously, transdermally, ocularly, otically or nasally, or in any other way.


Stage 3. Replenish.


The second step leads to the general rejuvenation of the cells in the population but, on the other hand, to a reduction in the total cell count. This allows the subsequent replenish step to be used for repopulation of the tissue with functional cells. Therefore, the pool of stem/progenitor cells in a particular tissue (including mesenchymal and epithelial stem cells in lungs and liver) should be activated in order to replenish the tissue. The possible interventions needed to achieve that goal include treatment with one specific compound or a combination of the compounds from the list below. Importantly, the compounds should stimulate the proliferation of the stem cells while preventing the unwanted effects related to possible uncontrolled proliferation and subsequent malignant transformation. The compounds should be administered orally, by injection, sublingually, buccally, rectally, vaginally, cutaneously, transdermally, ocularly, otically or nasally, or by another method.


Stage 4. Reinforce.


This step is used to prevent further potential degradation of the tissue (or organ). It may include treatment with one specific compound or a combination of the compounds from the list below. These compounds should demonstrate one of the following activities: immunomodulation, in order to prevent possible malignant transformation and the accumulation of senescent cells; cytoprotection, in order to retain the functional state of the tissue; or stimulation of the macrophages, in order to achieve the specific state of senophagy (the ability to specifically engulf and digest senescent cells). The compounds should be administered orally, by injection, sublingually, buccally, rectally, vaginally, cutaneously, transdermally, ocularly, otically or nasally, or by another method.


Stage 5. Repeat.


The whole multi-stage longevity therapeutics pipeline (stages 1-4) can be applied recurrently. The period between the therapies is defined individually on the tissue (organ)-specific basis and may vary from 1 month to 10 years.


In an embodiment, the first four steps Rescue; Remove; Replenish; Reinforce can be used as a multi-stage longevity therapeutics pipeline and can be applied more than once, and on an ongoing basis. The period between the therapies is defined individually on a tissue, organ, and patient specific basis and may vary from 1 month to 10 years between treatments, or may essentially be continually ongoing, for some or all of the steps.


EXAMPLES

The invention includes methods, system, drugs, apparatus, computer program product, among others, to carry out the following.



FIG. 3 illustrates a transcriptomic clock method for accuracy of biological aging assessment, compatible with the current invention. The correlation between actual chronological age (x-axis) with predicted age (y-axis) for healthy individuals using the validation set. The grey line represents the linear regression decision boundary line. Values for r, R2 and p-value are provided at the top of the figure. Note that the term Disease0 in this and other figures simply means healthy/control subjects were used for such biological aging assessment.



FIG. 4 illustrates the performance of age predicting models (A) Actual chronological age vs. predicted age for Deep Feature Selection Model (DFS) on validation and testing sets. The grey line represents the linear regression decision boundary line.


Values for R2 and MAE are provided at the bottom of the figure.



FIG. 5 illustrates the performance of age predicting model trained on the microarray data on the external validation set of RNAseq data. The correlation between actual chronological age group (x-axis) with predicted age (y-axis) for healthy individuals using the external validation set. Mean of the actual chronological age group vs. predicted age for the Deep Feature Selection Model (DFS).



FIG. 6 illustrates the distribution of the number of samples by age for healthy individuals in the validation set. Blue (darker) and green (lighter) values are actual chronological ages and assigned biological ages, respectively. For relatively healthy people, not surprisingly, the assigned biological age is close to the chronological age.



FIG. 7 illustrates an example epsilon-prediction accuracy for healthy individuals.


The epsilon-prediction accuracy is defined as follows:







$$\varepsilon\text{-prediction} = \frac{\sum_{i=1}^{N} 1_A(f_i)}{N}$$





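where f_i is the predicted value for sample i, y_i is the actual chronological age of sample i, N is the number of samples, and 1_A is the indicator function of the interval A = [y_i − ε; y_i + ε].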


For example, if epsilon=0 and yi=45, the DNN correctly recognizes this sample if the prediction of the sample belongs to the interval.
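The definition above can be illustrated with a minimal Python sketch. The function name `epsilon_accuracy` and the sample ages are hypothetical and chosen only for illustration; they are not data from the study.

```python
def epsilon_accuracy(actual, predicted, epsilon):
    """Fraction of samples whose prediction f_i lies in [y_i - epsilon, y_i + epsilon]."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    # The indicator function 1_A(f_i) contributes 1 for every hit and 0 otherwise.
    hits = sum(1 for y, f in zip(actual, predicted) if y - epsilon <= f <= y + epsilon)
    return hits / len(actual)


# Hypothetical chronological ages and model predictions.
actual_ages = [45, 30, 62, 51]
predicted_ages = [49.5, 41.0, 60.2, 51.0]

print(epsilon_accuracy(actual_ages, predicted_ages, epsilon=10))  # 3 of 4 within 10 years
print(epsilon_accuracy(actual_ages, predicted_ages, epsilon=0))   # only exact matches count
```

With epsilon set to 0 the interval collapses to the single point y_i, so only exact predictions are counted as correct.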



FIG. 8 is a plot illustrating clustering by age for healthy individuals using the t-SNE algorithm. The color bar indicates the age of the sample. For this particular example, there are no clearly defined clusters of healthy individuals by age.


Example 1

Age Prediction Models as Target Identification Tools



FIG. 9 illustrates the list of selected targets based on the importance ranking provided by the deep transcriptomic clocks and other machine learning methods. In the present study, we explore several methods to evaluate the importance of features (genes) for age prediction. Genes were ranked by four methods. The first three were differential expression analysis, linear regression with elastic net regularization (ElasticNet; genes ranked by the absolute values of their regression coefficients in the model), and Random Forest (genes ranked by the Gini importance value of each gene). The fourth used the relative importance values assigned to genes by the Deep Feature Selection model, averaged over the five-fold cross-validation process.


In addition to feature importance ranking, we also explored the wrapper method, which we have successfully applied previously in the context of identifying the most important blood markers for age prediction (Putin et al., 2016; Mamoshina et al., 2018). We applied the same technique in the present study, with some modifications. Here we explored random permutations of vectors of gene expression values, along with increased (log2 fold change of 3) and decreased (log2 fold change of −3) gene expression values.


In the case of random permutations, x′_i = rand(x_i), where x_i is the vector of expression values of gene i.


In the case of a direct increase or decrease, x′_i = x_i × 2^f, where x_i is the vector of expression values of gene i and f is a log2 fold change of 3 or −3, respectively.


Therefore, feature importance value for the gene i is calculated as,







$$FI_i = \frac{\sum_{m=1}^{k} \frac{R^2(Y, \hat{Y})}{R^2(Y, \hat{\dot{Y}})}}{k}$$





where $\hat{Y}$ is a vector of predicted values of age, $\hat{\dot{Y}}$ is a vector of predicted values of age after permutations, and k is the number of cross-validation folds, which in this case equals 5.


We used a Support Vector Machine algorithm as the age predicting model. Each model predicts age after a modification of gene expression values and assigns an importance coefficient to the gene based on the accuracy of age prediction. Afterwards, the scores obtained on the validation sets are summed, and each gene-associated importance factor is averaged to yield a final value.
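The wrapper procedure described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for a trained age predictor (the text uses a Support Vector Machine), the data are invented, and the same toy fold is reused five times in place of real cross-validation folds.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2(Y, Y_hat)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def perturb_gene(X, gene, mode, rng, log2_fc=3.0):
    """Return a copy of X (samples x genes) with one gene column modified."""
    column = [row[gene] for row in X]
    if mode == "permute":
        rng.shuffle(column)                      # x'_i = rand(x_i)
    elif mode == "increase":
        column = [v * 2 ** log2_fc for v in column]   # x'_i = x_i * 2^f, f = 3
    elif mode == "decrease":
        column = [v * 2 ** -log2_fc for v in column]  # x'_i = x_i * 2^f, f = -3
    else:
        raise ValueError("mode must be 'permute', 'increase' or 'decrease'")
    return [row[:gene] + [new] + row[gene + 1:] for row, new in zip(X, column)]


def feature_importance(folds, gene, mode, seed=0):
    """Average over folds of R^2(Y, Y_hat) / R^2(Y, Y_hat after perturbation).

    Caveat: the ratio is hard to interpret when the perturbed R^2 is zero or negative.
    """
    rng = random.Random(seed)
    ratios = []
    for predict, X_val, y_val in folds:
        baseline = r2_score(y_val, predict(X_val))
        perturbed = r2_score(y_val, predict(perturb_gene(X_val, gene, mode, rng)))
        ratios.append(baseline / perturbed)
    return sum(ratios) / len(ratios)


def toy_predict(X):
    """Hypothetical predictor: age depends on gene 0 only and ignores gene 1."""
    return [40.0 + 10.0 * row[0] for row in X]


X_val = [[0.0, 5.0], [1.0, 3.0], [2.0, 9.0], [3.0, 1.0]]
y_val = [40.0, 50.0, 60.0, 70.0]
folds = [(toy_predict, X_val, y_val)] * 5  # k = 5, reusing one toy fold

print(feature_importance(folds, gene=1, mode="permute"))   # ignored gene: ratio stays at 1
print(feature_importance(folds, gene=0, mode="increase"))  # informative gene: ratio moves away from 1
```

A gene that the model does not use leaves the predictions unchanged, so its ratio is exactly 1, whereas perturbing an informative gene changes the ratio.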


The Borda count algorithm was applied to combine all six ranks derived from the age predicting models with the rank of genes sorted by absolute log2 fold change values derived from differential expression analysis, in order to obtain the final importance rank of the genes.
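A Borda count aggregation of several ranked lists can be sketched as follows. The scoring convention (n points for first place, counting down by one per position) is one common variant and is an assumption here; the three example rankings are hypothetical and use gene symbols only as labels.

```python
def borda_count(rankings):
    """Combine ranked gene lists (best first) into one consensus ranking."""
    genes = set().union(*rankings)
    n = len(genes)
    scores = {gene: 0 for gene in genes}
    for ranking in rankings:
        for position, gene in enumerate(ranking):
            # First place earns n points, second n - 1, and so on;
            # a gene missing from a list earns nothing from that list.
            scores[gene] += n - position
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical rankings produced by three different methods.
rankings = [
    ["SOD1", "RELA", "CDK6"],
    ["RELA", "SOD1", "CDK6"],
    ["RELA", "CDK6", "SOD1"],
]
print(borda_count(rankings))
```

In this toy input RELA collects 8 points, SOD1 6 and CDK6 4, which gives the consensus order.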


Table A provides 49 genes that are determined to be significantly important, in a preferred embodiment, for age prediction grouped by disease and molecular function category. The corresponding proteins that are translated from the genetic material may also be used.










TABLE A

| Category | List of genes in each category |
| --- | --- |
| Metabolism and energy homeostasis | ACACB, SCD, ALDOC, SMOX, AMACR, HTRA1, ARG1, HLCS, HSD3B7, PECI |
| Hypertension and hypoxia | PTGDS, HPGD, NT5E, TMSB4Y, ADORA2B, ACTN1, SNTB2 |
| Neuropathy | NETO2, GRM2, CACNA1I, NRCAM, CCT5, BAIAP2, QPRT, TMEM18, PPP1R9B |
| Genomic stability | TOP1MT, PARP3, NOTCH1, TAF7, TINF2, CHTOP, CTBP1, CBX7, RRP1, RNF144, PNPT1, C16orf42 |
| Smooth muscle construction | ADORA2B, SOD1 |
| Age-related macular degeneration | HTRA1 |
| Tumor angiogenesis | CD248, VASH1, SERTAD3, TNFSF8, YWHAE, CRK, CBLL1, CDCA7L, E2F4 |
| Inflammation | AKIRIN2, DEFB123, PLXNC1, PSMD12, RELA |









Table B lists 100 gene names and abbreviations, all human, used for transcriptome clock analysis in a preferred embodiment. The corresponding proteins that are translated from the genetic material may also be used.












TABLE B

| Gene Name | Ensembl gene ID | David Gene Name | Species |
| --- | --- | --- | --- |
| ACACB | ENSG00000076555 | acetyl-CoA carboxylase beta(ACACB) | Homo sapiens |
| ADORA2B | ENSG00000170425 | adenosine A2b receptor(ADORA2B) | Homo sapiens |
| AKIRIN2 | ENSG00000135334 | akirin 2(AKIRIN2) | Homo sapiens |
| AMACR | ENSG00000242110 | alpha-methylacyl-CoA racemase(AMACR) | Homo sapiens |
| ANKRD54 | ENSG00000100124 | ankyrin repeat domain 54(ANKRD54) | Homo sapiens |
| ARFGAP3 | ENSG00000242247 | ADP ribosylation factor GTPase activating protein 3(ARFGAP3) | Homo sapiens |
| ARHGAP26 | ENSG00000145819 | Rho GTPase activating protein 26(ARHGAP26) | Homo sapiens |
| BAIAP2 | ENSG00000175866 | BAI1 associated protein 2(BAIAP2) | Homo sapiens |
| BET1 | ENSG00000105829 | Bet1 golgi vesicular membrane trafficking protein(BET1) | Homo sapiens |
| BPNT1 | ENSG00000162813 | 3′(2′), 5′-bisphosphate nucleotidase 1(BPNT1) | Homo sapiens |
| C16orf42 | ENSG00000007520 | TSR3, Acp Transferase Ribosome Maturation Factor | Homo sapiens |
| C17orf48 | ENSG00000170222 | ADP-Ribose/CDP-Alcohol Diphosphatase, Manganese | Homo sapiens |
| C1orf77 | ENSG00000160679 | Chromatin Target Of PRMT1 | Homo sapiens |
| C9orf91 | ENSG00000157693 | Transmembrane Protein 268 | Homo sapiens |
| CACNA1I | ENSG00000100346 | calcium voltage-gated channel subunit alpha1 I(CACNA1I) | Homo sapiens |
| CBLL1 | ENSG00000105879 | Cbl proto-oncogene like 1 (CBLL1) | Homo sapiens |
| CBX7 | ENSG00000100307 | chromobox 7(CBX7) | Homo sapiens |
| CCT5 | ENSG00000150753 | chaperonin containing TCP1 subunit 5(CCT5) | Homo sapiens |
| CD248 | ENSG00000174807 | CD248 molecule(CD248) | Homo sapiens |
| CDCA7L | ENSG00000164649 | cell division cycle associated 7 like(CDCA7L) | Homo sapiens |
| CDK6 | ENSG00000105810 | cyclin dependent kinase 6(CDK6) | Homo sapiens |
| CLDN14 | ENSG00000159261 | claudin 14(CLDN14) | Homo sapiens |
| CLIC3 | ENSG00000169583 | chloride intracellular channel 3 (CLIC3) | Homo sapiens |
| COBRA1 | ENSG00000188986 | Negative Elongation Factor Complex Member B | Homo sapiens |
| CRK | ENSG00000167193 | CRK proto-oncogene, adaptor protein(CRK) | Homo sapiens |
| CTBP1 | ENSG00000159692 | C-terminal binding protein 1 (CTBP1) | Homo sapiens |
| DAPP1 | ENSG00000070190 | dual adaptor of phosphotyrosine and 3-phosphoinositides 1(DAPP1) | Homo sapiens |
| DBNDD2 | ENSG00000244274 | dysbindin domain containing 2(DBNDD2) | Homo sapiens |
| DEFB123 | ENSG00000180424 | defensin beta 123(DEFB123) | Homo sapiens |
| DERPC | ENSG00000168802 | Chromosome Transmission Fidelity Factor 8 | Homo sapiens |
| DHTKD1 | ENSG00000181192 | dehydrogenase E1 and transketolase domain containing 1(DHTKD1) | Homo sapiens |
| E2F4 | ENSG00000205250 | E2F transcription factor 4(E2F4) | Homo sapiens |
| FANCL | ENSG00000115392 | Fanconi anemia complementation group L(FANCL) | Homo sapiens |
| FLJ10374 | ENSG00000105248 | coiled-coil domain containing 94 | Homo sapiens |
| FLJ43093 | ENSG00000255587 | RAB44, Member RAS Oncogene Family | Homo sapiens |
| FZD1 | ENSG00000157240 | frizzled class receptor 1(FZD1) | Homo sapiens |
| GALNS | ENSG00000141012 | galactosamine (N-acetyl)-6-sulfatase(GALNS) | Homo sapiens |
| GALNT6 | ENSG00000139629 | polypeptide N-acetylgalactosaminyltransferase 6(GALNT6) | Homo sapiens |
| GATAD2A | ENSG00000167491 | GATA zinc finger domain containing 2A(GATAD2A) | Homo sapiens |
| GLT1D1 | ENSG00000151948 | glycosyltransferase 1 domain containing 1(GLT1D1) | Homo sapiens |
| GPA33 | ENSG00000143167 | glycoprotein A33(GPA33) | Homo sapiens |
| GRM2 | ENSG00000164082 | glutamate metabotropic receptor 2(GRM2) | Homo sapiens |
| HSD3B7 | ENSG00000099377 | hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 7(HSD3B7) | Homo sapiens |
| LDOC1L | ENSG00000188636 | leucine zipper down-regulated in cancer 1 like(LDOC1L) | Homo sapiens |
| LIPN | ENSG00000204020 | lipase family member N(LIPN) | Homo sapiens |
| LMCD1 | ENSG00000071282 | LIM and cysteine rich domains 1(LMCD1) | Homo sapiens |
| LOC100130298 | ENSG00000258130 | hCG1816373-like(LOC100130298) | Homo sapiens |
| LOC285908 | ENSG00000179406 | Long Intergenic Non-Protein Coding RNA 174 | Homo sapiens |
| LOC613038 | ENSG00000258130 | SAGA complex associated factor 29 pseudogene(LOC613038) | Homo sapiens |
| LOC643905 | ENSG00000221961 | Proline Rich 21 | Homo sapiens |
| LOC652784 | NA | NA | Homo sapiens |
| LOC653884 | NA | serine/arginine-rich splicing factor 10-like | Homo sapiens |
| LOC729338 | ENSG00000224786 | Centrin 4, Pseudogene (CETN4P) | Homo sapiens |
| LOC731444 | NA | NA | Homo sapiens |
| LRP3 | ENSG00000130881 | LDL receptor related protein 3(LRP3) | Homo sapiens |
| MFNG | ENSG00000100060 | MFNG O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase (MFNG) | Homo sapiens |
| NETO2 | ENSG00000171208 | neuropilin and tolloid like 2(NETO2) | Homo sapiens |
| NRCAM | ENSG00000091129 | neuronal cell adhesion molecule(NRCAM) | Homo sapiens |
| NTSR2 | ENSG00000169006 | neurotensin receptor 2(NTSR2) | Homo sapiens |
| NUDT5 | ENSG00000165609 | nudix hydrolase 5(NUDT5) | Homo sapiens |
| PACSIN2 | ENSG00000100266 | protein kinase C and casein kinase substrate in neurons 2(PACSIN2) | Homo sapiens |
| PARP3 | ENSG00000041880 | poly(ADP-ribose) polymerase family member 3(PARP3) | Homo sapiens |
| PARP8 | ENSG00000151883 | poly(ADP-ribose) polymerase family member 8(PARP8) | Homo sapiens |
| PECI | ENSG00000198721 | Enoyl-CoA Delta Isomerase 2 | Homo sapiens |
| PLXNC1 | ENSG00000136040 | plexin C1(PLXNC1) | Homo sapiens |
| PNPT1 | ENSG00000138035 | polyribonucleotide nucleotidyltransferase 1(PNPT1) | Homo sapiens |
| PPP1R9B | ENSG00000108819 | protein phosphatase 1 regulatory subunit 9B(PPP1R9B) | Homo sapiens |
| PSMD12 | ENSG00000197170 | proteasome 26S subunit, non-ATPase 12(PSMD12) | Homo sapiens |
| QPRT | ENSG00000103485 | quinolinate phosphoribosyltransferase (QPRT) | Homo sapiens |
| RAB3D | ENSG00000105514 | RAB3D, member RAS oncogene family(RAB3D) | Homo sapiens |
| RELA | ENSG00000173039 | RELA proto-oncogene, NF-kB subunit(RELA) | Homo sapiens |
| RGMB | ENSG00000174136 | repulsive guidance molecule family member b(RGMB) | Homo sapiens |
| RNASET2 | ENSG00000026297 | ribonuclease T2(RNASET2) | Homo sapiens |
| RNF144 | ENSG00000151692 | Ring Finger Protein 144A | Homo sapiens |
| RRP1 | ENSG00000160214 | ribosomal RNA processing 1(RRP1) | Homo sapiens |
| S100A9 | ENSG00000163220 | S100 calcium binding protein A9(S100A9) | Homo sapiens |
| SERTAD3 | ENSG00000167565 | SERTA domain containing 3 (SERTAD3) | Homo sapiens |
| SGPL1 | ENSG00000166224 | sphingosine-1-phosphate lyase 1(SGPL1) | Homo sapiens |
| SIGLEC7 | ENSG00000168995 | sialic acid binding Ig like lectin 7(SIGLEC7) | Homo sapiens |
| SLC25A19 | ENSG00000125454 | solute carrier family 25 member 19(SLC25A19) | Homo sapiens |
| SLC38A10 | ENSG00000157637 | solute carrier family 38 member 10(SLC38A10) | Homo sapiens |
| SOD1 | ENSG00000142168 | superoxide dismutase 1, soluble(SOD1) | Homo sapiens |
| SRPRB | ENSG00000144867 | SRP receptor beta subunit(SRPRB) | Homo sapiens |
| TAF7 | ENSG00000178913 | TATA-box binding protein associated factor 7(TAF7) | Homo sapiens |
| TCTN3 | ENSG00000119977 | tectonic family member 3 (TCTN3) | Homo sapiens |
| TIGD7 | ENSG00000140993 | tigger transposable element derived 7(TIGD7) | Homo sapiens |
| TINF2 | ENSG00000092330 | TERF1 interacting nuclear factor 2(TINF2) | Homo sapiens |
| TMEM18 | ENSG00000151353 | transmembrane protein 18(TMEM18) | Homo sapiens |
| TMSB4Y | ENSG00000154620 | thymosin beta 4, Y-linked(TMSB4Y) | Homo sapiens |
| TNFSF8 | ENSG00000106952 | tumor necrosis factor superfamily member 8(TNFSF8) | Homo sapiens |
| TRIM7 | ENSG00000146054 | tripartite motif containing 7(TRIM7) | Homo sapiens |
| TSPAN10 | ENSG00000182612 | tetraspanin 10(TSPAN10) | Homo sapiens |
| VKORC1L1 | ENSG00000196715 | vitamin K epoxide reductase complex subunit 1 like 1(VKORC1L1) | Homo sapiens |
| VTI1B | ENSG00000100568 | vesicle transport through interaction with t-SNAREs 1B(VTI1B) | Homo sapiens |
| YWHAE | ENSG00000108953 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon(YWHAE) | Homo sapiens |
| ZNF259 | ENSG00000109917 | ZPR1 Zinc Finger | Homo sapiens |
| ZNF544 | ENSG00000198131 | zinc finger protein 544(ZNF544) | Homo sapiens |
| ZNF583 | ENSG00000198440 | zinc finger protein 583(ZNF583) | Homo sapiens |
| ZNF697 | ENSG00000143067 | zinc finger protein 697(ZNF697) | Homo sapiens |
| ZNF763 | ENSG00000197054 | zinc finger protein 763(ZNF763) | Homo sapiens |










FIG. 10 is a Venn diagram showing selected gene list overlap. The four-way Venn diagram illustrates all unique, two-way, three-way and four-way sets of shared genes. Gene lists were selected using the deep transcriptomic aging clocks described herein. A set of genes that is common to all tissues could be considered as aging-related universal targets that could be used to develop therapies.


Under the pressure of environmental factors and hereditary characteristics, the rate of aging naturally varies between individuals. As a result, biological age as defined by biomarkers often differs between individuals of the same chronological age. Biomarkers of biological aging are the objective physiological indicators of tissue and organ condition that are used to assess personal aging rates. Aging is associated with health risks, an inability to maintain homeostasis, age-related diseases and, eventually, death.


The biomarkers of biological aging as described herein can be used to evaluate the effectiveness of anti-aging remedies. This is important because populations in developed nations throughout the world are rapidly aging, and the search for and identification of efficient anti-aging interventions has never been more essential.


Because aging is a complex multifactorial process with no single cause or treatment (Zhavoronkov 2011; Trindade, 2013) that affects most if not all tissues and organs of the body, the currently available biomarkers in the art do not accurately represent the health state of the entire organism or individual systems, and do not provide accurate and useful measures of biological age. Furthermore, several of them are not easily measured. Thus, biomarkers based on not only quantifiable but also easily measurable characteristics are still required.


Usually, identifying and developing biomarkers is a multi-step process that includes proof of concept, experimental validation and analytical performance validation. Alternative approaches based on in silico methods can also be used to improve and speed up the development and validation of these biomarkers. The use of more effective computational approaches for biomarker development is favored by two technological trends. The first is the accumulation of high-throughput data generated in different research areas such as proteomics, genomics, chemoproteomics and phenomics. The second is the progress made in computational sciences which, combined with increasingly powerful computational resources, allows the development of repurposing algorithms, of software for retrospective analysis, and of the web-based databases required for gathering and classifying experimental data (Lavecchia, 2016). Using these computational resources, various techniques such as Machine Learning (ML) are routinely used in biomarker development.


Although Deep Learning (DL) methods were initially developed for tasks such as pattern, voice and image recognition (Oquab 2014), they can also be used to improve the efficiency of in silico techniques applied to biomarker identification. DL-based methods are able to overcome many current limitations of more traditional in silico techniques; for instance, they can integrate complex biomedical data. Modern DL techniques include powerful approaches with a deep architecture, called Deep Neural Networks (DNNs). Neural networks are collections of neurons (also called units) connected in an acyclic graph. Neural network models are often organized into distinct layers of neurons.


For most neural networks, the most common layer type is the fully-connected layer, in which neurons in two adjacent layers are fully pairwise connected but neurons within a single layer share no connections. One of the main features of a DNN is that neurons are controlled by non-linear activation functions. This non-linearity, combined with the deep architecture, makes possible more complex combinations of the input features, leading ultimately to a wider understanding of the relationships between them and, as a result, to a more reliable final output. DNNs have already been applied to many types of data, ranging from structural data to chemical descriptors and transcriptomics data (Mayr 2016, Wang 2014, Ma 2015). Because of this flexibility and adaptability in learning from a wide range of data, DNNs are now considered an interesting computational approach for tackling many current biomedical issues (Mamoshina 2016, Xu 2015, Hughes 2015).
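The fully-connected structure with non-linear activations described above can be sketched with a tiny feed-forward pass in plain Python. The weights, biases, layer sizes and the choice of ReLU are hypothetical and serve only to show how each neuron combines all outputs of the previous layer.

```python
def relu(z):
    """A common non-linear activation function."""
    return max(0.0, z)


def identity(z):
    """Linear output, as is typical for a regression target such as age."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Each output neuron is connected to every input (fully pairwise connected)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Propagate an input vector through the layers in order (acyclic graph)."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 linear output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
print(forward([3.0, 1.0], layers))
```

For the input [3.0, 1.0] the hidden layer yields [2.0, 1.0], and the output neuron combines these into a single value.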


Recently, Putin et al. (Putin, 2016) published promising results demonstrating the capacity of DNN-based methods to accurately predict biological age and to identify a set of the most relevant biomarkers for tracking physiological processes related to aging. In their study, the features used as inputs for the DNN, a set of 41 biomarkers for each sample, were extracted from tens of thousands of blood biochemistry samples from patients undergoing routine physical examinations. Although highly variable in nature, a blood biochemistry test is very simple to perform in practice, is approved for clinical use and, as a consequence, is commonly used by physicians. An effective DNN structure was obtained using 56177 samples for the training phase (fitting of hyperparameters), with the remaining 6242 samples used for validation. The results obtained for predicting biological age show that the DNN-based approach outperforms many traditional machine learning methods, including GBM (Gradient Boosting Machine), RF (Random Forests), DT (Decision Trees), LR (Linear Regression), kNN (k-Nearest Neighbors), ElasticNet and SVM (Support Vector Machines).


Furthermore, the PFI (Permutation Feature Importance) method was used to compute the relative importance of each biomarker used to estimate biological age. This information can be used in two ways. Firstly, as each biomarker measures a specific biological mechanism, this ranking can be exploited to optimize anti-aging strategies by targeting the most critical biological processes identified as playing a key role in the onset and propagation of aging. Secondly, this list can be used to reduce the number of initial inputs required to generate an accurate prediction of biological age. Regarding this second point, the results presented in the study show that although each sample initially contains up to 46 biomarkers, the performance of the DNNs remained remarkably stable with an input comprising only the 10 markers with the highest PFI scores. Thus, PFI provides a ranked list of biomarkers that can be used to select the most robust and reliable features for predicting age.


The growing body of experimental evidence on life extension in model organisms suggests the feasibility of finding interventions that promote human longevity (Moskalev A 2017). However, the restricted experimental possibilities for studying human aging and the overall low translation rate from model organisms to the human clinic in other therapeutic areas (Mak, Evaniew, and Ghert 2014) complicate the search for desirable anti-aging therapies, and only a few geroprotectors (anti-aging molecules) have shown potential efficacy in humans (A. Aliper et al. 2016; I. Thomas and Gregg 2017; A. M. Aliper et al. 2015).


Over the past several decades, research into the molecular basis of human aging has progressed significantly. Changes in gene expression, which are associated with numerous biological processes, cellular responses and disease states, most likely play a crucial role in the aging process (de Magalhaes, Curado, and Church 2009).


Because biological aging is not a single signature, but is highly specific in terms of organs, tissues, systems, and other granular aspects of the organism (including humans), an effective and useful biological clock must utilize many biomarkers from many tissues and organs. The following are some preferred examples.


Energy Metabolism:


Glycolysis, glucose oxidation and fatty acid oxidation are the main sources of ATP generation, which is crucial for the viability of tissues with high energy demand, such as muscle tissue and especially cardiomyocytes. The aging process triggers abnormalities in metabolism and energy homeostasis (Ma and Li 2015), and aging biomarkers specific to energy metabolism are a subject of the present invention.


Hypertension and Hypoxia:


Prostaglandins are critical for regulating vasodilation and vasoconstriction and for maintaining vascular homeostasis. The balance of vasodilating and vasoconstricting agents is important for maintaining normal vascular function. The aging process shifts the balance toward pro-constrictive agents and hypertension, which is a common vascular complication in the elderly (Pinto 2007).


No matter which particular biomarkers are assessed by a biological aging assessment compatible with the current invention, a preferred embodiment of the deep learning computational approach for both the current invention and the biological aging assessment is as follows. Firstly, a specific type of DNN called Deep Feature Selection (DFS) is trained on blood gene expression samples using the standard backpropagation algorithm. Secondly, the DFS model is applied to select a set of age-related genes using different DNN-based feature selection methods combined into one ensemble model via a genetic algorithm.


During the first step, the DFS model is trained, for example, on 4000 healthy human blood gene expression samples extracted from GEO (GSE33828). DFS (Li et al.) is a type of neural network with several specific characteristics. Firstly, DFS adds a particular hidden layer, called a weighted layer, in which each neuron is connected one-to-one with a single input feature. The neurons in the weighted layer are then connected one-to-many with the neurons in the first normal hidden layer of a deep feed-forward multilayer neural network. Secondly, DFS introduces several regularization terms into the neural network loss function. An exemplary final loss function expression is as follows:









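$$\min_{\theta} f(\theta) = l(\theta) + \lambda_1\left(\frac{1-\lambda_2}{2}\|w\|_2^2 + \lambda_2\|w\|_1\right) + \alpha_1\left(\frac{1-\alpha_2}{2}\sum_{k=1}^{K+1}\left\|W^{(k)}\right\|_F^2 + \alpha_2\sum_{k=1}^{K+1}\left\|W^{(k)}\right\|_1\right),$$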




where l(θ) is the log-likelihood of the data, and λ1, λ2, α1 and α2 are regularization hyperparameters. K is the number of hidden layers. ∥w∥_2^2 and ∥w∥_1 stand for the squared l2 norm and the l1 norm of the weights in the weighted layer, respectively. ∥·∥_F stands for the Frobenius norm and ∥·∥_1 for the matrix norm. The last two terms are the ElasticNet-based terms that control smoothness/sparsity for the weights of the weighted layer. They reduce the model complexity and speed up the training. After the DFS model is trained, the absolute values of the weights in the weighted layer can be used to produce a ranked list of the input features (genes).
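The regularization part of the loss and the weight-based ranking can be sketched as follows. This is a hedged illustration, not the original implementation: the data term l(θ) is omitted, the matrix norm ∥·∥_1 is assumed here to be the entrywise sum of absolute values, and all weights, hyperparameter values and feature names are hypothetical.

```python
def dfs_penalty(w, layers, lam1, lam2, alpha1, alpha2):
    """Regularization part of the DFS objective (everything except l(theta)).

    w      : weights of the one-to-one weighted layer (one per input feature)
    layers : list of weight matrices W^(k), each a list of rows
    """
    l2_sq = sum(v * v for v in w)           # ||w||_2^2
    l1 = sum(abs(v) for v in w)             # ||w||_1
    weighted_layer_term = lam1 * ((1 - lam2) / 2 * l2_sq + lam2 * l1)

    frob_sq = sum(v * v for W in layers for row in W for v in row)    # sum of ||W^(k)||_F^2
    entry_l1 = sum(abs(v) for W in layers for row in W for v in row)  # assumed entrywise l1
    network_term = alpha1 * ((1 - alpha2) / 2 * frob_sq + alpha2 * entry_l1)
    return weighted_layer_term + network_term


def rank_features(w, names):
    """Rank input features by the absolute value of their weighted-layer weight."""
    return [name for _, name in sorted(zip(w, names), key=lambda p: -abs(p[0]))]


# Hypothetical trained weights for three input genes and one hidden-layer matrix.
w = [0.5, -2.0, 0.0]
layers = [[[1.0, -1.0], [0.0, 2.0]]]
penalty = dfs_penalty(w, layers, lam1=0.1, lam2=0.5, alpha1=0.01, alpha2=0.5)
print(penalty)
print(rank_features(w, ["gene_a", "gene_b", "gene_c"]))
```

The l1 parts push uninformative weighted-layer weights toward exactly zero, which is what makes the absolute weights usable as a ranking.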


During the second step, DNN-based feature selection methods are used to select age-related genes. Each method produces a ranked list of relative importance for each gene. In addition to the ranking of input features available with the DFS model itself, other methods have been applied. This includes the permutation feature importance (PFI) method as previously described in (Putin et al.), the heuristic variable selection (HVS) (Yacoub et al.) and methods based on output derivatives. The notable characteristic of these methods is that they can be applied to already trained DNNs. It is not necessary to iteratively retrain DNNs as required by the forward or backward feature selection methods.


Heuristic Variable Selection (Yacoub et al.) is a zero-order method designed for measuring the relative importance of the input features of a neural network. The method requires the set of weight values and information about the DNN structure as inputs. In a preferred embodiment, the relative importance of each given input feature is computed as follows:







$$S_i = \sum_{j \in H}\left(\frac{|w_{ji}|}{\sum_{i' \in I}|w_{ji'}|} \cdot \sum_{k \in O}\frac{|w_{kj}|}{\sum_{j' \in H}|w_{kj'}|}\right)$$






where I, H and O denote the input, hidden and output layers, respectively, and w_ji denotes the weight between neurons j and i. After the DNN is trained and S is computed for each input feature i, the set of S values can be assembled into a ranked list.
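A minimal sketch of this weight-based score for a network with a single hidden layer is given below. The function name `hvs_scores`, the matrix layout and the weight values are assumptions made for illustration; the sketch also assumes that no neuron has all-zero incoming weights.

```python
def hvs_scores(w_hidden, w_output):
    """Weight-based relative importance of each input feature.

    w_hidden[j][i] : weight from input i to hidden neuron j
    w_output[k][j] : weight from hidden neuron j to output neuron k
    """
    n_inputs = len(w_hidden[0])
    n_hidden = len(w_hidden)

    # Share of every output neuron's total incoming weight carried by hidden neuron j.
    out_totals = [sum(abs(v) for v in row) for row in w_output]
    hidden_share = [
        sum(abs(w_output[k][j]) / out_totals[k] for k in range(len(w_output)))
        for j in range(n_hidden)
    ]

    scores = []
    for i in range(n_inputs):
        s = 0.0
        for j in range(n_hidden):
            in_total = sum(abs(v) for v in w_hidden[j])
            # Share of hidden neuron j's incoming weight carried by input i,
            # propagated through j's share of the outputs.
            s += abs(w_hidden[j][i]) / in_total * hidden_share[j]
        scores.append(s)
    return scores


# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output.
w_hidden = [[1.0, -3.0], [2.0, 2.0]]
w_output = [[1.0, 1.0]]
scores = hvs_scores(w_hidden, w_output)
print(scores)
```

Because only the trained weights are used, the score requires no additional passes over the data.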


There are various first-order methods to measure the relative importance of an input feature. These methods use either the derivative of the error or the derivative of the output of the neural network with respect to this input feature to establish the ranked list. An interesting property of the derivative-based methods is that they can be applied to any type of differentiable model. The procedure to compute the average relevance of the input feature and how the derivative term is included are specific to each derivative-based method. Here we consider the long-studied derivative-based methods described in detail in (Dorizzi et al.), (Ruck et al.), (Refenes et al.), (Czernichow et al.). In the following formulas,







d







f
j



(

x
l

)




d






x
i






means an output derivative of unit j of the network with respect to xi in xl point, Fj(xl) in is an output of the network with ul as input, N is the number of samples. If specified, M is a number of outputs of the network, var stands for the variance, q95 or 95% percentile. In the table below the relative importance Si of an input feature i is presented by methods.


The biological aging assessment uses, as an example:


1) The model developed by Ruck et al., which is the following:


$$S_i = \sum_{l=1}^{N} \sum_{j=1}^{g} \left| \frac{\partial f_j}{\partial x_i}(x_l) \right|$$


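2) Refenes et al. have developed three different models:


$$S_i = \frac{1}{N} \, \frac{\operatorname{var}(x_i)}{\operatorname{var}(f(x) - y)} \sum_{l} \left( \frac{\partial f}{\partial x_i}(x_l) \right)^2$$


$$S_i = \frac{1}{N^{1/2}} \, \frac{\left( \sum_{l} \left( \frac{\partial f}{\partial x_i}(x_l) \right) - \sum_{j} \left( \frac{\partial f}{\partial x_i}(x_j) \right)^2 \right)^{1/2}}{\sum_{l} \frac{\partial f}{\partial x_i}(x_l)}$$


$$S_i = \frac{1}{N} \sum_{l} \left| \frac{\partial f}{\partial x_i}(x_l) \cdot \frac{x_i}{f(x_l)} \right|$$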


3) The model of Dorizzi et al. takes the following form:


$$S_i = q_{95}\left( \left| \frac{\partial f}{\partial x_i}(x) \right| \right)$$


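4) The model of Czernichow et al. is as follows:


$$S_i = \frac{\sum_{l=1}^{N} \left( \frac{\partial f}{\partial x_i}(x_l) \right)^2}{\max_{j} \left( \sum_{l=1}^{N} \left( \frac{\partial f}{\partial x_j}(x_l) \right)^2 \right)}$$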
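
Three of the derivative-based measures (Ruck et al., Dorizzi et al., and Czernichow et al.) can be sketched for a single-output network once the derivatives have been evaluated. This is a minimal sketch, assuming the derivatives are already available as a nested list `grads[l][i]` holding the derivative of the output with respect to feature i at sample l (for example, from automatic differentiation). The function names, the nearest-rank percentile rule, and the toy values are illustrative assumptions.

```python
import math


def ruck(grads):
    """Sum over samples of the absolute output derivative (single output)."""
    return [sum(abs(row[i]) for row in grads) for i in range(len(grads[0]))]


def dorizzi(grads, q=0.95):
    """q-th percentile of the absolute derivative, nearest-rank rule (assumption)."""
    rank = max(1, math.ceil(q * len(grads)))
    return [sorted(abs(row[i]) for row in grads)[rank - 1]
            for i in range(len(grads[0]))]


def czernichow(grads):
    """Sum of squared derivatives, normalized by the largest such sum."""
    squares = [sum(row[i] ** 2 for row in grads) for i in range(len(grads[0]))]
    largest = max(squares)
    return [s / largest for s in squares]


# Toy derivative matrix (hypothetical): two samples, two features.
grads = [[1.0, -2.0], [-1.0, 0.5]]
ruck_scores = ruck(grads)
dorizzi_scores = dorizzi(grads)
czernichow_scores = czernichow(grads)
```

Each function returns one score per input feature, so each yields its own ranked list.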







The final list of ranked genes is obtained by combining the different lists described above using a simple genetic algorithm (GA). In a preferred embodiment, the GA proceeds as follows.


The population of the GA is initialized with all feature ranking lists obtained by applying the aforementioned feature selection algorithms to both the DNN and DFS models. On each iteration, the GA performed 35 crossover operations between members of its population and 15 mutation operations, during which random genes were injected. Thus, at each iteration, 50 DNNs were trained. The GA converged after 50 epochs, and the final gene ranking list was obtained. The best DNN model in the GA achieved a coefficient of determination of 0.79 and a mean absolute error of 4.2 on the validation dataset. FIG. 3 shows the performance of the DNN for predicting the age of healthy individuals (Rsq=0.79).
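
The GA loop described above can be sketched as follows. This is a minimal sketch, not the implementation used in the study: the crossover and mutation operators, the elitist survivor selection, and the toy `fitness` function are assumptions made for illustration. In the described procedure, evaluating one candidate ranking corresponds to training a DNN on it and scoring it on the validation set.

```python
import random


def crossover(a, b, rng):
    """Keep the head of ranking a, then fill in the order given by ranking b."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    seen = set(head)
    return (head + [g for g in b if g not in seen])[:len(a)]


def mutate(ranking, gene_pool, rng):
    """Inject a random gene at a random position without creating duplicates."""
    child = list(ranking)
    pos = rng.randrange(len(child))
    new_gene = rng.choice(gene_pool)
    if new_gene in child:
        other = child.index(new_gene)
        child[pos], child[other] = child[other], child[pos]
    else:
        child[pos] = new_gene
    return child


def evolve(population, gene_pool, fitness, n_cross=35, n_mut=15, epochs=50, seed=0):
    rng = random.Random(seed)
    population = [list(p) for p in population]
    for _ in range(epochs):
        children = [crossover(*rng.sample(population, 2), rng) for _ in range(n_cross)]
        children += [mutate(rng.choice(population), gene_pool, rng) for _ in range(n_mut)]
        # Elitist survivor selection (assumption): keep the fittest rankings.
        population = sorted(population + children, key=fitness, reverse=True)[:len(population)]
    return population[0]


# Toy setup (hypothetical gene names and fitness).
target = ["g1", "g2", "g3", "g4"]
initial = [["g1", "g2", "g3", "g4"], ["g4", "g3", "g2", "g1"], ["g2", "g1", "g4", "g3"]]


def toy_fitness(ranking):
    return -sum(1 for a, b in zip(ranking, target) if a != b)


best = evolve(initial, ["g1", "g2", "g3", "g4", "g5"], toy_fitness, epochs=10)
```

With the default arguments, each epoch evaluates 35 crossover children and 15 mutation children, which matches the 50 candidates per iteration stated above.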


Cellular life span, aging, and tissue-specific age prediction thus provide a biological aging assessment compatible with the current invention.


As discussed above, different cells and tissues exhibit different expression patterns, different aging patterns, and different life spans. This substantial variation means that it is useful to have aging clocks that are specific to different cells, tissues, and organs (Seim, Ma, and Gladyshev 2016). In a preferred embodiment we utilize DNN-based predictors of age trained on 12 tissues and 4 tissue-specific DNN-based predictors of age trained on gene expression profiles of a mononuclear whole blood fraction.


Although the universal 12-tissue predictor is trained on a data set with a larger sample size than the 4 tissue-specific deep aging clocks, its prediction performance is significantly worse (11.2 years for the best network compared to 6.4, 8.2, 7.8, and 8.3 years for the Blood, Brain, Liver, and M. Blood-based predictors, respectively).


In a preferred embodiment we utilize a DFS algorithm for feature ranking to identify the most important genes in age prediction for the universal 12-tissue predictor of age as well as for the 4 tissue-specific predictors of age.


In an implementation of the method, a universal 12-tissue predictor is trained on a data set with a larger sample size than the 4 tissue-specific deep aging clocks, yet its prediction performance is significantly worse (11.2 years for the best network compared to 6.4, 8.2, 7.8, and 8.3 years for the Blood, Brain, Liver, and M. Blood-based predictors, respectively).


Data from up to 51,139 samples profiled on the GPL570 microarray platform were used to train and test our DNNs. The GPL570 GEO accession number refers to data generated using the common Affymetrix Human Genome U133 Plus 2.0 Array, which covers approximately 47,000 transcripts, although only 12,328 or 12,428 transcripts were used in the study. Data were split into training and test sets with a 90:10 ratio, with exact values shown in each results section.


Following on from the successful and highly accurate use of our DNN to classify sex, we then attempted to predict the age of samples. As discussed previously, we approached age prediction as a regression-based problem. In a preferred embodiment, 12,328 genes over a total of 20,766 samples were used; 18,261 samples were used to train and 2,505 samples were used to test. Our DNN-based age predictor delivered a mean absolute error (MAE) of 11.46 years, a significant improvement over standard machine learning models, with k-NN coming closest to matching the DNN with an MAE of 14.973 years. A very small increase (0.085) in MAE was observed following DFS for the 1,000 most relevant genes, suggesting that there was little extra training capacity in the DNN using the selected gene expression dataset.


Since our DNN showed a clear ability to distinguish tissues, we investigated whether the MAE of the age predictor would change when investigating tissue-specific aging. In a preferred embodiment, 12,428 genes were analyzed from 1,853 samples from whole blood (1,733 train, 120 test), 372 from brain (278 train, 49 test), 287 from liver (228 train, 47 test), and 267 from mononuclear blood fractions (170 train, 97 test), again using a regression-based model. Remarkably, in all cases a significant improvement over the MAE of our general DNN-based age predictor was observed, with whole blood performing especially well, generating an MAE of 6.696. Further improvements were seen following DFS, with a particularly large decrease in MAE observed in brain samples (10.788 vs 8.209). In all instances the various DNNs outperformed the RF, k-NN, and LR models, often producing an MAE more than 50% smaller. In total, these observations suggest that the transcriptomic aging clock is regulated in a tissue-specific manner.


Multilayer (with 3 or 4 hidden layers) feed-forward neural networks with a standard backpropagation algorithm were used in a preferred embodiment. The Keras library for Python with a Theano backend was used to build and train the neural networks, and the Scikit-learn library was used to build and train the random forest (RF), k-nearest neighbor (k-NN), and linear regression (LR) models. A grid search was used for hyperparameter optimization in order to achieve the greatest predictive accuracy.


After rounds of optimization, the Adam optimizer with Nesterov momentum and a learning rate of 0.01 was selected for all models. Either the rectified linear unit (ReLU) or the exponential linear unit (ELU) was selected as the activation function. The mean absolute error (MAE) loss function was used in the regression task of age prediction. For regularization purposes, models were trained with dropout with 20-50% probability after each layer. The performance of the best DNNs was compared to that of the best (with optimized hyperparameters) RF and k-NN algorithms where appropriate. For the purposes of this study we treated the prediction of human age as a regression-based problem, as previously discussed (Putin E 2017); therefore, age-related experiments are also compared against an LR model. All experiments were conducted with 5-fold cross validation by drugs on NVIDIA GTC Titan Pascal with 128 GB of RAM.


The biological aging clocks as disclosed in the current invention are, not surprisingly, useful and compatible with senescence treatments. The following is such an example.


A recent paper by Petkovich et al. covers the application of epigenetic clocks to evaluate the effectiveness of anti-aging interventions such as caloric restriction and genetic interventions that are known to increase lifespan (growth hormone knockout and Snell dwarf mice) (Petkovich et al. 2017). First, the authors developed epigenetic aging clocks and predicted the age of animals on interventions and of matching controls. Mice on caloric restriction showed a decrease in predicted age compared with their actual chronological age and with age-matched controls. Snell dwarf mice showed a greater decrease in predicted age compared with the matching controls. Growth hormone knockout mice also showed a younger predicted biological age.


The same suppression of age-associated DNA methylation changes was shown not only for genetic and dietary interventions but also for rapamycin, an mTORC1 and mTORC2 inhibitor, which promotes healthy aging and extends lifespan (Cole et al. 2017).


Combined inhibition of both mTORC1 and mTORC2 also may provide a promising strategy to reverse the development of senescence-associated features in near-senescent cells (Walters, Deneka-Hannemann, and Cox 2016).


In order to rescue cells demonstrating a pre-senescent phenotype, a specific set of possible interventions can be applied. These interventions include treatment with one senoremediator compound or with a combination of the senoremediator compounds from the list below.


Activators of PI3K: Insulin receptor substrate (Tyr608) peptide, the sequence is established and known in the art, is from insulin receptor substrate-1 (IRS-1) inclusive of Tyr608 (mouse)-Tyr612 (human). It contains the insulin receptor tyrosine kinase substrate motif YMXM (Tyr-Met-X-Met). This peptide has been used as a substrate for purified insulin receptor (Km=90 μM) and other tyrosine kinases in phosphocellulose binding assays. The tyrosine phosphorylated version of this peptide binds to phosphatidylinositol 3-kinase (PI 3-kinase) SH2 domain and activates the enzyme.


740 Y-P: a cell-permeable phosphopeptide activator of PI3K. The PDGFR 740Y-P peptide stimulates a mitogenic response in muscle cells. The ability of the 740Y-P peptide to stimulate mitogenesis is highly specific and is not a general feature of cell-permeable SH2 domain-binding peptides. See ncbi.nlm.nih.gov/pubmed/9790922.


mTORC1, mTORC2 inhibitors: sapanisertib (Wise-Draper et al. 2017; Moore et al. 2018), dactolisib (Wise-Draper et al. 2017).


Inhibitors of PDH: GSK2334470 (GlaxoSmithKline), MP7 (Merck). (Emmanouilidi and Falasca 2017).


Compounds found based on transcriptional signature analysis according to the procedure described in Example 1: Withaferin A, Lavendustin A, Sulforaphane. Senoremediator compounds can be administered orally, by injection, sublingually, buccally, rectally, vaginally, cutaneously, transdermally, ocularly, otically, nasally, or by another method.


Example 2

Analysis of Age Predictor Outputs



FIG. 11 illustrates the delta (the difference between the assigned (predicted) biological age and the actual chronological age) bar plots grouped by age ranges for healthy people based on an exemplary validation set as described. Delta demonstrates disagreement between the chronological age and the predicted age. The larger the delta value, the larger the disagreement between the age values predicted by the model and the actual chronological age of individuals. In the case of diseased patients, unhealthily aged patients, or patients on treatment, the predicted age may differ significantly from the actual chronological age.
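
The delta values and their grouping by age range can be sketched as follows. This is a minimal sketch: the function name, the 10-year bin width, and the toy ages are assumptions for illustration and are not taken from FIG. 11.

```python
from collections import defaultdict


def delta_by_age_range(chronological, predicted, bin_width=10):
    """Mean delta (predicted minus chronological age) per chronological age range."""
    groups = defaultdict(list)
    for age, pred in zip(chronological, predicted):
        start = int(age // bin_width) * bin_width
        groups[(start, start + bin_width)].append(pred - age)
    return {bounds: sum(d) / len(d) for bounds, d in sorted(groups.items())}


# Toy ages (hypothetical).
mean_delta = delta_by_age_range([25, 28, 41], [27, 30, 38])
```

A positive mean delta for a range indicates that the model tends to predict those individuals as older than their chronological age.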


Gene expression profiles were collected from the publicly available repositories Gene Expression Omnibus (ncbi.nlm.nih.gov/geo/) and ArrayExpress (ebi.ac.uk/arrayexpress/). Here we present case studies and examples of the analysis of age predictor outputs. Such age predictors can also be used to study age acceleration caused by hazardous environmental exposures or diseases. We analyzed two datasets: GSE10846 and E-MTAB-4015.


We first analyzed the GSE10846 dataset, which contains survival information, treatment information, and gene expression data for 412 patients with diffuse large B cell lymphoma (e.g., disease analysis) who were treated with chemotherapy or chemotherapy plus Rituximab. Being predicted by the model to be younger than one's chronological age is associated with a good prognosis.


Patients that were found to have an older transcriptomic-age (e.g., age predicted by the model) than their chronological age had increased risk of dying and vice versa. A younger blood age could, therefore, be a useful outcome measure in interventions for healthy aging.



FIG. 12 shows an example of a biological age clock, or a report thereof. To investigate the predictive ability of deep transcriptomic aging clocks (e.g., biological aging clock) on mortality, we employed chronological age- and sex-adjusted Cox regression models. Samples predicted to be younger than their actual age consistently demonstrated a decrease in the hazard ratio (33%), while samples that were predicted to be older than their actual age demonstrated a significant increase in the hazard ratio (12%). Thus, the hazard ratio can be used in the methods of the present invention.


We next analyzed the E-MTAB-4015 dataset, which contains smoking status, health status (e.g., lifestyle analysis), and gene expression data for 211 individuals with and without Chronic Obstructive Pulmonary Disease (COPD). Tobacco smoking creates a significant strain on healthcare systems worldwide, as it is a major risk factor for a host of chronic diseases and a potential culprit in premature aging and mortality.



FIG. 13 shows an example of a biological age clock, or a report thereof. The actual and predicted ages for current smokers, non-smokers, former smokers, and individuals with COPD are shown. Non-smokers demonstrated a lower predicted age compared to current and former smokers and to individuals with COPD. The mean predicted age of non-smokers is 60 years, compared to a mean of 63 years for current smokers and 63 years for individuals with COPD (p-value<0.05).


It should be recognized that while examples were provided using transcriptomic data, proteomic or DNA methylation data may also be used.


Additionally, DNN predictors of biological age can be based on blood test values, such as blood protein concentrations. FIG. 15 shows an example of a biological age clock or a report thereof. To investigate the predictive ability of deep proteomic clocks on the efficacy of drugs in diseased patients, we explored the log2 aging ratios. Blood samples from a group of diabetic patients were used to predict their biological age. In general, all diabetic patients tended to be predicted to have an older biological age compared to their chronological age. The group of patients taking both insulin and glucose-lowering drugs and the group taking only glucose-lowering drugs tend to be predicted younger than their chronological age for male samples. The difference between the groups taking both insulin and glucose-lowering drugs (e.g., first group, far left) and taking insulin only (e.g., second group, middle right) is significant, and the first group is predicted younger than the second group. The first group also tends to be predicted to be biologically younger than patients taking neither insulin nor glucose-lowering drugs (e.g., third group, nothing, far right). The difference between the groups taking only glucose-lowering drugs (e.g., fourth group, middle left) and taking insulin only (e.g., second group) is also significant, and the fourth group is predicted younger than the second group. Additionally, the fourth group also tends to be predicted younger than patients taking neither insulin nor glucose-lowering drugs (e.g., third group).



FIG. 16 shows an example of a biological age clock or a report thereof. To investigate the predictive ability of deep proteomic clocks to differentiate aging rates in various populations, we predicted the age of samples from one population using the deep proteomic clock trained on another population (e.g., Eastern Europeans). Samples from a population with a higher life expectancy (South Koreans) are predicted younger by the age predictor trained on the population with a lower life expectancy (Eastern Europeans). After about age 40, the Canadians are predicted to be about the same as the Eastern Europeans.



FIG. 17 shows an example of a biological age clock or a report thereof. To investigate the predictive ability of deep transcriptomic aging clocks (e.g., biological aging clock) on mortality, we employed Kaplan-Meier analysis. Individuals who were predicted to be more than five years older than their chronological age (>5) have a lower survival probability compared to individuals predicted within error (the absolute difference between actual and predicted age is less than 5 years; −5:5) and individuals predicted to be younger than they are (the predicted age is lower than the chronological age by 5 years or more; <−5). Additional data to support FIG. 17 are provided in the table below.
















| Delta Group | Number at Risk, Time 0 | Number at Risk, Time 500 | Number at Risk, Time 1000 | Number at Risk, Time 1500 |
|---|---|---|---|---|
| >5 | 102 | 58 | 30 | 0 |
| −5:5 | 2624 | 1611 | 714 | 0 |
| <−5 | 4086 | 2666 | 1119 | 0 |
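
The delta grouping and the survival estimate behind an analysis of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and toy follow-up data are hypothetical, a delta of exactly +5 is placed in the −5:5 group because the text does not specify that boundary, and the estimator is the standard Kaplan-Meier product-limit formula rather than the code used to produce FIG. 17.

```python
def delta_group(chronological_age, predicted_age):
    """Assign a sample to one of the three delta groups."""
    delta = predicted_age - chronological_age
    if delta > 5:
        return ">5"
    if delta <= -5:
        return "<-5"
    return "-5:5"


def number_at_risk(times, t):
    """Count samples whose follow-up time is at least t."""
    return sum(1 for ti in times if ti >= t)


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each event time; events[i] is True for a death."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (hypothetical): the second sample is censored.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Computing one curve per delta group and comparing them gives the kind of comparison summarized in the table above.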









Methylation Aging Clock—Deep Learning


A DNA methylation (DNAm) aging clock is described, which can be used for the purpose of predicting human age based on molecular-level features. The DNAm aging clock can be created, trained, and used with deep learning, or neural networks, an approach that has been used to construct accurate clocks using blood biochemistry, transcriptomics, and microbiomics data. Accordingly, the described deep learning can perform aging clock analysis with DNA methylation as input data. The DNAm aging clock can be referred to as DeepMAge, which is a neural network regressor trained on 4,930 blood DNA methylation profiles from 17 studies. Its median absolute error (MedAE) was 2.77 years in an independent verification set of 1,293 samples from 15 studies. DeepMAge shows biological relevance by assigning a higher predicted age to people with various health-related conditions, such as ovarian cancer, irritable bowel diseases, and multiple sclerosis.


It is understood that CpG methylation status is a mathematically degenerate data type. There may be countless non-overlapping combinations of CpG sites to serve as the basis of an aging clock. It is still being debated whether all the DNAm clocks correspond to the same function of age or fundamentally different processes.


The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5′→3′ direction. CpG sites occur with high frequency in genomic regions called CpG islands (or CG islands). Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression, a mechanism that is part of a larger field of science studying gene regulation that is called epigenetics.


In some embodiments, DeepMAge omits a linear regression method.


In some embodiments, DeepMAge includes deep learning. In some aspects, the deep learning is performed with a neural network that shows superior accuracy when compared to elastic net solutions, and it shows disease relevance by predicting higher age values for people with various disorders. This improvement holds even in cases where linear models fail to detect any difference.


In some embodiments, the operation of DeepMAge includes a computing system having a neural network configured for performing an epigenetic dimension analysis of aging that can be integrated with other types of biological information. The model for DeepMAge can be processed as a feature reduction method that compresses large, unrefined vectors into compact latent representations, such as where aging trends are easier to outline. A combination of these representations can be used as an input for a multi-modal aging clock, which can account for multiple aging-related processes. The DeepMAge model can be processed with private or publicly available multi-modal datasets that contain longitudinal data for multiple aging dimensions, such as: gene expression values, DNA methylation levels, metabolic profiles, or image data [Zhavoronkov A, et al. (2019). Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Res Rev, 49:49-66.].


In some embodiments, a method of creating a biological aging clock for a subject can include: (a) receiving a DNA methylation data signature derived from a biological sample of the subject, wherein the DNA methylation data signature includes a plurality of DNA methylation sites; (b) creating input vectors based on the DNA methylation data signature; (c) inputting the input vectors into a machine learning platform; (d) generating a predicted biological aging clock of the subject based on the input vectors by the machine learning platform, wherein the biological aging clock is specific to the subject; and (e) preparing a report that includes the biological aging clock that identifies a predicted biological age of the subject. In some aspects, the method can include correlating a methylomics profile of the DNA methylation data signature with the predicted biological age of the subject. In some aspects, the method can include: obtaining the biological sample from the subject; and obtaining the DNA methylation data signature by performing a measurement of the methylomics of DNA in the biological sample. In some aspects, the biological aging clock can estimate human age with a MedAE of 2.77 years, or +/−10%. In some aspects, the method can include: performing feature importance analysis for ranking DNA methylation sites by their importance in age prediction by using the biological data; and correlating a biological signaling pathway signature with the predicted biological age of the subject. In some aspects, the machine learning platform includes feed-forward neural networks with more than three hidden layers. In some aspects, the method is performed with a neural network configured for performing an epigenetic analysis with feature selection based on a feature importance analysis. In some aspects, the method is performed with a model that is trained on DNA methylation profiles from a plurality of subjects. In some aspects, the method is performed with a model that is verified by being processed with healthy subjects.


In some embodiments, the methods can include: inputting DNA methylation vectors of the subject into a deep neural network model having multiple hidden layers; performing a regression calculation; obtaining an age prediction of the subject; and providing the age prediction to the subject. In some aspects, the method can include: training the deep neural network model on the DNA methylation data of the DNA methylation vectors; performing a deep feature selection protocol; performing a gradient-based feature selection protocol; and identifying important features having an importance value over an importance threshold. In some aspects, the methods can include: optimizing model parameters; performing a grid search over model depth of layers; performing an activation function protocol; performing an optimizing algorithm protocol; and performing a regularization algorithm protocol. In some aspects, the method can include: selecting at least one best feature selection protocol; and fixing a set of identified important features.
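
The step of identifying features whose importance exceeds a threshold can be sketched as follows. This is a minimal sketch: the function name and the threshold value of 1e-4 are hypothetical, and the three importance values are taken from Table 1B only as sample inputs.

```python
def select_important_features(importance, threshold):
    """Return feature names with importance above the threshold, most important first."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, value in ranked if value > threshold]


# Sample inputs from Table 1B; the threshold is an illustrative assumption.
importance = {
    "cg10523019": 9.53e-05,
    "cg01580888": 0.000149323,
    "cg21801378": 0.000143596,
}
selected = select_important_features(importance, threshold=1e-4)
```

The selected names can then be fixed as the input feature set for the final model.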


In some embodiments, a computer program product can include a tangible, non-transitory computer readable medium having a computer readable program code stored thereon, the code being executable by a processor to perform a method of creating a biological aging clock for a patient. The method of creating a biological aging clock for a patient can be performed in accordance with the embodiments described herein.


DeepMAge Performance in Healthy and Ill Individuals


A deep neural network referred to as DeepMAge was trained using a collection of 4,930 blood DNAm profiles from control cohorts in 17 studies. Table 1A gives a per-study report containing the DeepMAge accuracy, the cohort, the male ratio, and the age range, as well as the baseline accuracy obtained by median age assignment. The MedAE achieved in cross-validation (CV) is 2.24 years (Table 1).


Table 1 shows the accuracy metrics for DeepMAge neural network. The accuracy achieved in cross-validation (CV, MedAE=2.24 years) is only slightly reduced during verification (Healthy verification, MedAE=2.77 years). The accuracy drops in the samples with various health-related conditions (Case verification, MedAE=4.35 years). MAE is the mean absolute error, MedAE is the median absolute error, R2 is the coefficient of determination, yrs is years.















TABLE 1

| Metric | CV | Healthy verification | Case training | Case verification |
|---|---|---|---|---|
| MedAE, years | 2.24 | 2.77 | 3.29 | 4.18 |
| MAE, years | 3.21 | 3.80 | 4.74 | 5.08 |
| R2 | 0.96 | 0.93 | 0.88 | 0.82 |
| Pearson's r | 0.98 | 0.97 | 0.94 | 0.94 |
| RMSE, years | 4.55 | 5.44 | 7.51 | 6.24 |
| N | 4,930 | 1,293 | 1,093 | 439 |

CV = Cross-validation; MAE = Mean absolute error; MedAE = Median absolute error; R2 = Coefficient of determination; RMSE = Root mean square error; N = Number of samples in the subsample
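
The accuracy metrics reported in Table 1 can be computed from paired chronological and predicted ages as sketched below. This is a minimal sketch: the function names and toy ages are hypothetical, and the baseline function assumes that "median age assignment" means predicting the cohort median age for every sample and reporting the resulting MedAE.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute the Table 1 style metrics from true and predicted ages."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = statistics.fmean(y_true)
    mean_pred = statistics.fmean(y_pred)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    return {
        "MedAE": statistics.median(abs_errors),
        "MAE": statistics.fmean(abs_errors),
        "R2": 1 - ss_res / ss_tot,
        "Pearson r": cov / math.sqrt(ss_tot * ss_pred),
        "RMSE": math.sqrt(ss_res / len(errors)),
        "N": len(errors),
    }


def median_age_baseline(y_true):
    """MedAE obtained when every sample is assigned the cohort median age."""
    median_age = statistics.median(y_true)
    return statistics.median(abs(t - median_age) for t in y_true)


# Toy ages (hypothetical).
metrics = regression_metrics([20, 30, 40, 50], [22, 29, 43, 50])
baseline = median_age_baseline([20, 30, 40, 50])
```

Comparing a model's MedAE with the median-assignment baseline indicates how much the model improves on a trivial predictor for a given cohort.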




















TABLE 1A

| Study | Cohort | N | Male ratio, % | Age range, yrs | MedAE, yrs | Baseline, yrs | Platform |
|---|---|---|---|---|---|---|---|
| GSE81961 | train | 40 | 0 | 21-43 | 2.62 | 3.65 | 450k |
| GSE52588 | train | 58 | 12 | 9-83 | 2.72 | 14 | 450k |
| GSE52588 | case_train | 29 | 62 | 10-43 | 2.82 | 8 | 450k |
| GSE97362 | train | 83 | 67 | 3-19 | 1.4 | 3 | 450k |
| GSE97362 | case_train | 150 | 61 | 0-52 | 3.47 | 5.5 | 450k |
| GSE41037 | train | 720 | 62 | 16-88 | 2.29 | 10 | 27k |
| GSE30870 | train | 39 | 0 | 0-103 | 2.96 | 14 | 450k |
| GSE61496 | verification | 310 | 53 | 30-74 | 2.14 | 16.5 | 450k |
| GSE98876 | verification | 71 | 100 | 26-69 | 2.54 | 6 | 450k |
| GSE37008 | verification | 99 | 37 | 24-45 | 3.74 | 4 | 27k |
| GSE128235 | train | 536 | 43 | 18-87 | 1.99 | 9 | 450k |
| GSE87640 | case_verification | 156 | 65 | 18-63 | 3.97 | 8.8 | 450k |
| GSE87640 | verification | 84 | 62 | 20-58 | 2.52 | 5.05 | 450k |
| GSE87582 | case_verification | 20 | 90 | 50-71 | 4.38 | 2.81 | 450k |
| GSE87582 | verification | 1 | 100 | 60-60 | 9.59 | 0 | 450k |
| GSE19711 | train | 272 | 0 | 52-78 | 4.25 | 6 | 27k |
| GSE19711 | case_train | 264 | 0 | 49-91 | 3.7 | 8 | 27k |
| GSE34639 | verification | 48 | 33 | 0-1 | 1.92 | 0.5 | 450k |
| GSE79329 | verification | 34 | 100 | 43-70 | 2.63 | 8.7 | 450k |
| GSE67530 | train | 105 | 53 | 22-93 | 4.43 | 12 | 450k |
| GSE67530 | case_train | 39 | 59 | 22-91 | 3.43 | 10 | 450k |
| GSE105123 | verification | 107 | 58 | 19-23 | 2.06 | 1 | 450k |
| GSE99624 | case_verification | 32 | 12 | 50-87 | 3.92 | 7.5 | 450k |
| GSE99624 | verification | 16 | 38 | 49-82 | 2.72 | 2.5 | 450k |
| GSE125105 | train | 688 | 45 | 17-87 | 2.1 | 11 | 450k |
| GSE102177 | case_verification | 18 | 61 | 4-10 | 1.84 | 0.53 | 450k |
| GSE102177 | verification | 18 | 56 | 4-14 | 1.87 | 2 | 450k |
| GSE20067 | case_verification | 195 | 49 | 24-74 | 4.99 | 6 | 27k |
| GSE27044 | train | 889 | 100 | 3-26 | 1.08 | 3 | 27k |
| GSE103911 | verification | 65 | 71 | 27-77 | 6.96 | 8 | 450k |
| GSE53740 | train | 197 | 32 | 37-93 | 2.95 | 7 | 450k |
| GSE53740 | case_train | 186 | 35 | 34-91 | 3.62 | 4.5 | 450k |
| GSE59065 | verification | 295 | 48 | 22-84 | 4.35 | 11 | 450k |
| GSE112696 | case_verification | 6 | 67 | 18-29 | 5.51 | 3 | 450k |
| GSE112696 | verification | 6 | 67 | 22-27 | 3.75 | 0.5 | 450k |
| GSE77696 | train | 117 | 88 | 27-76 | 4.24 | 5 | 450k |
| GSE77696 | case_train | 261 | 96 | 25-75 | 4.25 | 6 | 450k |
| GSE58119 | train | 282 | 0 | 50-75 | 3.89 | 5 | 27k |
| GSE106648 | train | 139 | 25 | 20-65 | 2.48 | 7 | 450k |
| GSE106648 | case_train | 140 | 30 | 16-66 | 1.74 | 9 | 450k |
| GSE77445 | train | 85 | 51 | 18-69 | 2.7 | 4 | 450k |
| GSE84624 | train | 24 | 50 | 0-5 | 1.32 | 0.42 | 450k |
| GSE84624 | case_train | 24 | 54 | 0-7 | 1.27 | 0.9 | 450k |
| GSE107737 | case_verification | 12 | 100 | 18-27 | 2.46 | 2 | 450k |
| GSE107737 | verification | 12 | 100 | 18-29 | 3.03 | 3.5 | 450k |
| GSE40279 | train | 656 | 48 | 19-101 | 4.25 | 11 | 450k |
| GSE107459 | verification | 127 | 0 | 18-35 | 1.63 | 2.72 | 450k |









Testing DeepMAge in control cohorts from 15 independent datasets (1,293 samples) showed slightly less accurate results, with a MedAE of 2.77 years (FIG. 18 and Tables 1A and 1B). FIG. 18 and Tables 1 and 1A show that DeepMAge accurately predicts chronological age in both healthy individuals and an aggregation of case cohorts from multiple studies. Predictions obtained during cross-validation were used for the “Training” cohort; the other cohorts were predicted by the finalized model. The “Training case” cohort refers to the samples that were excluded from training because they came from unhealthy donors. Similarly, the “Verification” cohort contains only healthy donors, and the “Verification case” cohort contains donors from the same studies who have various conditions. MedAE is the median absolute error measured in years, and N is the number of donors in the corresponding cohort. FIG. 18 is a scatter plot of DeepMAge predictions in 4 data cohorts. DeepMAge accurately predicted the chronological age of healthy people from the training set (Training) and of healthy people from the verification set (Verification), and remained accurate in the aggregations of case cohorts from the studies included in the training set (Training Case) and the verification set (Verification Case). The scatter plot in the Training panel shows the per-fold predictions obtained during CV, and the other panels show the predictions by the final model.


Table 1B shows the 1000 CpG sites comprising DeepMAge with feature importance measures.














TABLE 1B

| CpG site | Importance | CpG site | Importance | CpG site | Importance |
|---|---|---|---|---|---|
| cg01580888 | 0.000149323 | cg19722847 | 8.50E−05 | cg27015931 | 7.21E−05 |
| cg21801378 | 0.000143596 | cg27320127 | 8.47E−05 | cg19046959 | 7.18E−05 |
| cg00343092 | 0.000143059 | cg05675373 | 8.29E−05 | cg08668790 | 7.05E−05 |
| cg26394940 | 0.00012294 | cg18008766 | 8.26E−05 | cg01511567 | 6.88E−05 |
| cg12024906 | 0.000120211 | cg24127874 | 8.09E−05 | cg00503840 | 6.81E−05 |
| cg22736354 | 0.000119079 | cg13663218 | 7.87E−05 | cg20143092 | 6.77E−05 |
| cg18815943 | 0.000107798 | cg19560758 | 7.87E−05 | cg12373771 | 6.76E−05 |
| cg13269407 | 0.000107461 | cg11126134 | 7.77E−05 | cg15957394 | 6.76E−05 |
| cg06493994 | 0.000106796 | cg22407458 | 7.75E−05 | cg18902090 | 6.72E−05 |
| cg10523019 | 9.53E−05 | cg18691434 | 7.74E−05 | cg15013019 | 6.70E−05 |
| cg27491887 | 9.13E−05 | cg19761273 | 7.66E−05 | cg03623878 | 6.66E−05 |
| cg17861230 | 9.11E−05 | cg24891133 | 7.53E−05 | cg18267374 | 6.64E−05 |
| cg04836038 | 8.82E−05 | cg04528819 | 7.47E−05 | cg02397514 | 6.57E−05 |
| cg09809672 | 8.76E−05 | cg17285325 | 7.47E−05 | cg15804973 | 6.53E−05 |
| cg02479575 | 8.66E−05 | cg15319457 | 7.45E−05 | cg16744741 | 6.53E−05 |
| cg21296230 | 8.63E−05 | cg11668844 | 7.41E−05 | cg25148589 | 6.44E−05 |
| cg00059225 | 6.40E−05 | cg07850604 | 7.32E−05 | cg18055007 | 6.42E−05 |
| cg24081819 | 6.35E−05 | cg05436231 | 5.42E−05 | cg06263495 | 4.81E−05 |
| cg27544190 | 6.30E−05 | cg00987379 | 5.42E−05 | cg12402251 | 4.81E−05 |
| cg18236477 | 6.27E−05 | cg01820374 | 5.40E−05 | cg09643544 | 4.80E−05 |
| cg06291867 | 6.26E−05 | cg12238343 | 5.39E−05 | cg26005082 | 4.80E−05 |
| cg07211259 | 6.25E−05 | cg13975369 | 5.36E−05 | cg16731240 | 4.77E−05 |
| cg13494498 | 6.23E−05 | cg23887396 | 5.29E−05 | cg25763788 | 4.75E−05 |
| cg10189695 | 6.21E−05 | cg04662594 | 5.29E−05 | cg14166009 | 4.75E−05 |
| cg12422450 | 6.17E−05 | cg03330058 | 5.27E−05 | cg02151301 | 4.74E−05 |
| cg24170090 | 6.14E−05 | cg00930873 | 5.26E−05 | cg26610808 | 4.71E−05 |
| cg08209133 | 6.14E−05 | cg08468689 | 5.26E−05 | cg10316635 | 4.71E−05 |
| cg18182399 | 6.13E−05 | cg06836772 | 5.25E−05 | cg22171829 | 4.70E−05 |
| cg07388493 | 6.11E−05 | cg08694544 | 5.24E−05 | cg17199483 | 4.69E−05 |
| cg17729667 | 6.11E−05 | cg13931228 | 5.23E−05 | cg13921352 | 4.67E−05 |
| cg26372517 | 6.08E−05 | cg01530101 | 5.22E−05 | cg21870884 | 4.66E−05 |
| cg19885761 | 6.05E−05 | cg03975694 | 5.22E−05 | cg13302154 | 4.65E−05 |
| cg26842024 | 5.99E−05 | cg08317263 | 5.22E−05 | cg07895149 | 4.64E−05 |
| cg23303074 | 5.95E−05 | cg12339802 | 5.19E−05 | cg07715201 | 4.64E−05 |
| cg24826867 | 5.95E−05 | cg12946225 | 5.19E−05 | cg01295203 | 4.64E−05 |
| cg10362475 | 5.94E−05 | cg04431054 | 5.17E−05 | cg16670497 | 4.62E−05 |
| cg11299964 | 5.92E−05 | cg05135156 | 5.13E−05 | cg16786458 | 4.62E−05 |
| cg22947000 | 5.91E−05 | cg14918082 | 5.08E−05 | cg13129046 | 4.61E−05 |
| cg06268694 | 5.90E−05 | cg08965235 | 5.05E−05 | cg23290344 | 4.61E−05 |
| cg16785344 | 5.88E−05 | cg10947146 | 5.04E−05 | cg04474832 | 4.59E−05 |
| cg19724470 | 5.81E−05 | cg13460409 | 5.04E−05 | cg22392276 | 4.58E−05 |
| cg07158339 | 5.76E−05 | cg06156376 | 5.04E−05 | cg15379633 | 4.58E−05 |
| cg26614073 | 5.75E−05 | cg01899253 | 5.03E−05 | cg19211800 | 4.57E−05 |
| cg26845300 | 5.74E−05 | cg08695830 | 5.01E−05 | cg20692569 | 4.56E−05 |
| cg05822532 | 5.70E−05 | cg04872689 | 5.00E−05 | cg22919728 | 4.54E−05 |
| cg21790626 | 5.65E−05 | cg10734665 | 4.98E−05 | cg26369667 | 4.51E−05 |
| cg18660898 | 5.65E−05 | cg15361590 | 4.98E−05 | cg27210390 | 4.51E−05 |
| cg02310296 | 5.63E−05 | cg15201877 | 4.97E−05 | cg09381003 | 4.51E−05 |
| cg21368354 | 5.62E−05 | cg18992688 | 4.97E−05 | cg02164046 | 4.51E−05 |
| cg16313343 | 5.58E−05 | cg17051321 | 4.95E−05 | cg25229172 | 4.50E−05 |
| cg16273597 | 5.56E−05 | cg03664992 | 4.91E−05 | cg13836627 | 4.50E−05 |
| cg04123409 | 5.56E−05 | cg00565688 | 4.89E−05 | cg12620499 | 4.49E−05 |
| cg08090640 | 5.49E−05 | cg04425624 | 4.87E−05 | cg13573276 | 4.49E−05 |
| cg18440048 | 5.48E−05 | cg21256649 | 4.87E−05 | cg17940013 | 4.48E−05 |
| cg20300246 | 5.47E−05 | cg03734874 | 4.87E−05 | cg24199834 | 4.47E−05 |
| cg20761322 | 5.47E−05 | cg02844545 | 4.87E−05 | cg04270799 | 4.45E−05 |
| cg00462994 | 5.47E−05 | cg20125091 | 4.86E−05 | cg08888956 | 4.44E−05 |
| cg11377136 | 5.45E−05 | cg16516400 | 4.84E−05 | cg23710218 | 4.43E−05 |
| cg25809905 | 5.44E−05 | cg07408456 | 4.83E−05 | cg11896923 | 4.42E−05 |
| cg25564800 | 5.43E−05 | cg12145907 | 4.83E−05 | cg02154074 | 4.40E−05 |
| cg09949775 | 4.39E−05 | cg14754581 | 4.82E−05 | cg21448423 | 4.40E−05 |
| cg02840794 | 4.39E−05 | cg03996822 | 4.11E−05 | cg23828595 | 3.75E−05 |
| cg21581873 | 4.39E−05 | cg22730004 | 4.11E−05 | cg14592406 | 3.75E−05 |
| cg17410236 | 4.38E−05 | cg03336167 | 4.10E−05 | cg10822172 | 3.75E−05 |
| cg25332298 | 4.37E−05 | cg07703401 | 4.08E−05 | cg05064673 | 3.75E−05 |
| cg00194146 | 4.37E−05 | cg17339202 | 4.08E−05 | cg09554443 | 3.74E−05 |
| cg26599006 | 4.36E−05 | cg17497271 | 4.07E−05 | cg05369142 | 3.74E−05 |
| cg27316956 | 4.36E−05 | cg01405761 | 4.05E−05 | cg17274064 | 3.73E−05 |
| cg05266781 | 4.36E−05 | cg08900043 | 4.05E−05 | cg23517605 | 3.73E−05 |
| cg19357849 | 4.32E−05 | cg08529529 | 4.05E−05 | cg21992250 | 3.73E−05 |
| cg24871743 | 4.32E−05 | cg17471102 | 4.04E−05 | cg20974196 | 3.72E−05 |
| cg23178308 | 4.31E−05 | cg22892904 | 4.03E−05 | cg11120551 | 3.72E−05 |
| cg21700166 | 4.31E−05 | cg24968336 | 3.99E−05 | cg11919694 | 3.69E−05 |
| cg16168311 | 4.30E−05 | cg00236832 | 3.98E−05 | cg14319409 | 3.68E−05 |
| cg17133388 | 4.30E−05 | cg15898840 | 3.97E−05 | cg16620032 | 3.67E−05 |


cg25499099
4.29E−05
cg24471894
3.96E−05
cg19789466
3.67E−05


cg18693704
4.28E−05
cg03991512
3.96E−05
cg25459323
3.66E−05


cg06458239
4.28E−05
cg22285621
3.94E−05
cg19356189
3.66E−05


cg06738602
4.27E−05
cg23843505
3.93E−05
cg03544320
3.66E−05


cg01777397
4.27E−05
cg11378686
3.92E−05
cg02364642
3.64E−05


cg03688818
4.26E−05
cg19515518
3.92E−05
cg18755783
3.64E−05


cg06204948
4.24E−05
cg23211240
3.92E−05
cg03030757
3.63E−05


cg25985778
4.24E−05
cg23189044
3.92E−05
cg09462576
3.63E−05


cg02228185
4.24E−05
cg09118625
3.91E−05
cg05379350
3.63E−05


cg16363586
4.22E−05
cg04765422
3.91E−05
cg05158615
3.60E−05


cg26151675
4.22E−05
cg26911787
3.91E−05
cg24860534
3.60E−05


cg23967169
4.22E−05
cg11536940
3.90E−05
cg16682903
3.60E−05


cg24921089
4.22E−05
cg25822709
3.90E−05
cg02489552
3.60E−05


cg07313155
4.21E−05
cg14456683
3.89E−05
cg22527345
3.59E−05


cg08186362
4.20E−05
cg15297650
3.88E−05
cg20008332
3.59E−05


cg09626984
4.20E−05
cg23587449
3.88E−05
cg05442902
3.59E−05


cg25141674
4.17E−05
cg05881135
3.86E−05
cg21697134
3.58E−05


cg16933388
4.17E−05
cg01283289
3.86E−05
cg04601137
3.57E−05


cg02096633
4.17E−05
cg10549973
3.83E−05
cg24169822
3.57E−05


cg23843812
4.16E−05
cg25256723
3.82E−05
cg27360098
3.56E−05


cg17832674
4.15E−05
cg03929796
3.81E−05
cg01968178
3.55E−05


cg20295671
4.15E−05
cg13854874
3.80E−05
cg02217159
3.55E−05


cg19423311
4.14E−05
cg14332079
3.78E−05
cg13697378
3.55E−05


cg23124451
4.13E−05
cg01946401
3.78E−05
cg25044651
3.54E−05


cg24989962
4.13E−05
cg01294695
3.78E−05
cg16319578
3.54E−05


cg22809047
4.13E−05
cg07123069
3.77E−05
cg09067967
3.54E−05


cg04586023
4.13E−05
cg18573383
3.77E−05
cg12688670
3.54E−05


cg10741760
4.13E−05
cg01400401
3.77E−05
cg03891319
3.53E−05


cg11065385
4.12E−05
cg00047050
3.76E−05
cg18919097
3.53E−05


cg14261309
3.53E−05
cg23506842
3.75E−05
cg09736162
3.53E−05


cg26500816
3.52E−05
cg17791651
3.33E−05
cg16543027
3.18E−05


cg25538571
3.52E−05
cg20979799
3.32E−05
cg21057046
3.18E−05


cg09915099
3.51E−05
cg12365667
3.32E−05
cg09816471
3.18E−05


cg23428445
3.50E−05
cg17031727
3.32E−05
cg10193817
3.18E−05


cg16614500
3.48E−05
cg18059933
3.32E−05
cg25802093
3.17E−05


cg19235307
3.47E−05
cg25947945
3.31E−05
cg01519742
3.16E−05


cg08876932
3.46E−05
cg25766046
3.31E−05
cg12941369
3.16E−05


cg10235817
3.46E−05
cg09427311
3.31E−05
cg25511429
3.16E−05


cg01459453
3.46E−05
cg26304237
3.31E−05
cg09660171
3.15E−05


cg19055231
3.46E−05
cg22747092
3.30E−05
cg22705225
3.15E−05


cg24851490
3.45E−05
cg19713196
3.30E−05
cg15415507
3.15E−05


cg15839448
3.45E−05
cg19402885
3.30E−05
cg03641225
3.15E−05


cg00489401
3.45E−05
cg19310430
3.29E−05
cg14386691
3.15E−05


cg04062391
3.45E−05
cg24653181
3.29E−05
cg08896945
3.14E−05


cg22396353
3.45E−05
cg19945840
3.29E−05
cg25983380
3.14E−05


cg15743985
3.44E−05
cg21870662
3.28E−05
cg22115808
3.14E−05


cg23854009
3.43E−05
cg15903421
3.28E−05
cg18678185
3.13E−05


cg19008809
3.43E−05
cg04289385
3.28E−05
cg11438428
3.13E−05


cg23668631
3.42E−05
cg12870705
3.28E−05
cg27389185
3.13E−05


cg27153400
3.42E−05
cg04329454
3.27E−05
cg00308665
3.13E−05


cg11946503
3.42E−05
cg20158248
3.26E−05
cg10150813
3.13E−05


cg00081975
3.41E−05
cg10319505
3.26E−05
cg06433658
3.13E−05


cg14175438
3.41E−05
cg12078929
3.25E−05
cg12758687
3.13E−05


cg17688525
3.41E−05
cg15377518
3.25E−05
cg09262269
3.12E−05


cg27553955
3.41E−05
cg07099407
3.24E−05
cg13885201
3.12E−05


cg05767404
3.41E−05
cg08570521
3.24E−05
cg18787975
3.12E−05


cg27016307
3.40E−05
cg12261786
3.24E−05
cg20973210
3.12E−05


cg12782180
3.40E−05
cg02789485
3.23E−05
cg06971096
3.11E−05


cg16465939
3.40E−05
cg19759064
3.23E−05
cg15563382
3.11E−05


cg03224418
3.40E−05
cg24384676
3.23E−05
cg10281002
3.11E−05


cg26963271
3.39E−05
cg02564523
3.22E−05
cg15982419
3.11E−05


cg01407797
3.39E−05
cg06810647
3.22E−05
cg15928398
3.10E−05


cg08822227
3.38E−05
cg17207590
3.22E−05
cg17992056
3.10E−05


cg06320982
3.38E−05
cg09072120
3.21E−05
cg11981599
3.10E−05


cg05535113
3.38E−05
cg10927536
3.21E−05
cg00168942
3.09E−05


cg21096915
3.36E−05
cg20264732
3.20E−05
cg25375711
3.09E−05


cg03909500
3.36E−05
cg25282410
3.20E−05
cg12532500
3.09E−05


cg06147863
3.36E−05
cg14859417
3.20E−05
cg10044101
3.09E−05


cg20240860
3.36E−05
cg12774845
3.20E−05
cg00201234
3.08E−05


cg03943081
3.35E−05
cg12741420
3.20E−05
cg07139440
3.08E−05


cg01154193
3.35E−05
cg04424621
3.19E−05
cg22909609
3.08E−05


cg06361108
3.34E−05
cg17878972
3.19E−05
cg20449692
3.08E−05


cg24012925
3.33E−05
cg21530890
3.19E−05
cg15473868
3.08E−05


cg22449114
3.07E−05
cg25166896
3.19E−05
cg02197293
3.07E−05


cg05228408
3.07E−05
cg20994801
2.98E−05
cg16362133
2.90E−05


cg16924616
3.07E−05
cg15156836
2.98E−05
cg07737781
2.89E−05


cg12259537
3.07E−05
cg06269753
2.98E−05
cg11314684
2.89E−05


cg26297688
3.07E−05
cg22680204
2.98E−05
cg14377791
2.89E−05


cg25736482
3.06E−05
cg26036443
2.97E−05
cg19355190
2.89E−05


cg00911351
3.06E−05
cg02828104
2.97E−05
cg11747499
2.88E−05


cg05010623
3.05E−05
cg16270890
2.97E−05
cg13500819
2.88E−05


cg11808757
3.05E−05
cg17324128
2.97E−05
cg06824727
2.88E−05


cg05570980
3.05E−05
cg08303146
2.97E−05
cg00563926
2.88E−05


cg00426498
3.04E−05
cg07195577
2.97E−05
cg08655844
2.88E−05


cg05890019
3.04E−05
cg25713185
2.97E−05
cg07903918
2.88E−05


cg14967066
3.04E−05
cg14826456
2.96E−05
cg04460372
2.87E−05


cg18074297
3.04E−05
cg27169020
2.95E−05
cg16483916
2.87E−05


cg19395441
3.04E−05
cg07430605
2.95E−05
cg11279021
2.87E−05


cg03565323
3.03E−05
cg09492887
2.95E−05
cg11189837
2.87E−05


cg17453778
3.03E−05
cg05010058
2.95E−05
cg27601516
2.87E−05


cg24231716
3.03E−05
cg10226744
2.95E−05
cg24056567
2.86E−05


cg05473871
3.03E−05
cg02206259
2.95E−05
cg20279283
2.86E−05


cg22187630
3.02E−05
cg17471928
2.94E−05
cg16063112
2.86E−05


cg05250458
3.02E−05
cg20637307
2.94E−05
cg24986868
2.86E−05


cg07935568
3.02E−05
cg15037004
2.94E−05
cg00431114
2.86E−05


cg02620013
3.02E−05
cg23833896
2.94E−05
cg00563932
2.86E−05


cg21016177
3.02E−05
cg10865119
2.94E−05
cg19706682
2.85E−05


cg03848555
3.01E−05
cg14865868
2.94E−05
cg15747595
2.85E−05


cg18016365
3.01E−05
cg10281478
2.93E−05
cg16352283
2.85E−05


cg21908259
3.01E−05
cg25942450
2.93E−05
cg26131019
2.85E−05


cg24739326
3.01E−05
cg22613010
2.93E−05
cg06638433
2.84E−05


cg18303397
3.01E−05
cg22901840
2.93E−05
cg00689340
2.84E−05


cg10756887
3.00E−05
cg20001829
2.93E−05
cg27187881
2.84E−05


cg17838026
3.00E−05
cg25604883
2.93E−05
cg11879514
2.83E−05


cg13666340
3.00E−05
cg12513481
2.92E−05
cg13593287
2.83E−05


cg10722799
3.00E−05
cg13899108
2.92E−05
cg06948294
2.83E−05


cg01200177
3.00E−05
cg05871136
2.92E−05
cg03565081
2.83E−05


cg03852144
2.99E−05
cg05483509
2.92E−05
cg06161930
2.82E−05


cg18511007
2.99E−05
cg16254309
2.91E−05
cg11010122
2.82E−05


cg00202702
2.99E−05
cg27281093
2.91E−05
cg24512400
2.82E−05


cg26824091
2.99E−05
cg12556134
2.91E−05
cg16776350
2.82E−05


cg02848777
2.99E−05
cg20900524
2.91E−05
cg23265096
2.82E−05


cg25054311
2.98E−05
cg11584690
2.91E−05
cg00548268
2.81E−05


cg08022502
2.98E−05
cg03600687
2.91E−05
cg12052765
2.81E−05


cg02085507
2.98E−05
cg19283196
2.91E−05
cg25302419
2.81E−05


cg10682057
2.98E−05
cg03883519
2.90E−05
cg18765542
2.81E−05


cg10084993
2.98E−05
cg19594666
2.90E−05
cg21289015
2.81E−05


cg02071305
2.80E−05
cg10515956
2.90E−05
cg20043466
2.80E−05


cg01805282
2.80E−05
cg06238491
2.73E−05
cg01161216
2.64E−05


cg07442479
2.80E−05
cg24587268
2.73E−05
cg08849574
2.64E−05


cg17431739
2.80E−05
cg04880063
2.73E−05
cg00152644
2.63E−05


cg24642523
2.80E−05
cg26711820
2.72E−05
cg17966619
2.63E−05


cg10240853
2.80E−05
cg25655096
2.72E−05
cg26780333
2.63E−05


cg09595479
2.79E−05
cg09601629
2.72E−05
cg20419410
2.63E−05


cg23320649
2.79E−05
cg19233923
2.72E−05
cg20227766
2.63E−05


cg08996521
2.79E−05
cg25629694
2.72E−05
cg24127989
2.63E−05


cg15776355
2.79E−05
cg01654582
2.71E−05
cg23752923
2.63E−05


cg20654468
2.79E−05
cg00340102
2.71E−05
cg15456206
2.63E−05


cg09429111
2.79E−05
cg03826976
2.70E−05
cg24727203
2.62E−05


cg23850212
2.79E−05
cg14870271
2.70E−05
cg04739570
2.62E−05


cg16240480
2.79E−05
cg02654291
2.70E−05
cg05056120
2.61E−05


cg07185695
2.78E−05
cg04036898
2.70E−05
cg17692403
2.61E−05


cg12073594
2.78E−05
cg14992253
2.69E−05
cg17914753
2.60E−05


cg15201635
2.78E−05
cg12613383
2.69E−05
cg27493997
2.60E−05


cg23762517
2.78E−05
cg10917602
2.69E−05
cg02988947
2.60E−05


cg15352829
2.78E−05
cg07652213
2.68E−05
cg02016419
2.60E−05


cg20346726
2.78E−05
cg21820677
2.68E−05
cg10362591
2.60E−05


cg11738543
2.77E−05
cg14681055
2.68E−05
cg22521310
2.60E−05


cg00208967
2.77E−05
cg19635712
2.68E−05
cg06051311
2.60E−05


cg03782453
2.77E−05
cg13726191
2.68E−05
cg02515725
2.60E−05


cg19713460
2.77E−05
cg05164634
2.67E−05
cg22321558
2.60E−05


cg05600717
2.77E−05
cg19155599
2.67E−05
cg07588779
2.60E−05


cg04786857
2.76E−05
cg01269795
2.67E−05
cg09563216
2.60E−05


cg02335441
2.76E−05
cg19764555
2.67E−05
cg06144905
2.59E−05


cg16127845
2.76E−05
cg22236626
2.67E−05
cg09706243
2.59E−05


cg22631938
2.76E−05
cg11260848
2.66E−05
cg01919208
2.59E−05


cg21426387
2.76E−05
cg07621046
2.66E−05
cg11428724
2.59E−05


cg22472290
2.76E−05
cg22719623
2.66E−05
cg12928668
2.59E−05


cg09340639
2.76E−05
cg09083627
2.66E−05
cg00090147
2.59E−05


cg08587864
2.76E−05
cg11833861
2.65E−05
cg00630583
2.59E−05


cg19168338
2.76E−05
cg01580044
2.65E−05
cg14958635
2.59E−05


cg25725843
2.75E−05
cg05546044
2.65E−05
cg26083396
2.59E−05


cg20616414
2.75E−05
cg13745346
2.64E−05
cg20080624
2.58E−05


cg06675478
2.75E−05
cg20831708
2.64E−05
cg08370996
2.58E−05


cg20209009
2.75E−05
cg08555657
2.64E−05
cg23430664
2.58E−05


cg04598121
2.75E−05
cg19573166
2.64E−05
cg19889780
2.58E−05


cg00564163
2.75E−05
cg09325711
2.64E−05
cg24200059
2.58E−05


cg20496643
2.75E−05
cg23239396
2.64E−05
cg14100184
2.58E−05


cg01027739
2.74E−05
cg14155397
2.64E−05
cg13047892
2.58E−05


cg02503850
2.74E−05
cg17029151
2.64E−05
cg04457979
2.58E−05


cg12902039
2.74E−05
cg13620770
2.64E−05
cg14056644
2.57E−05


cg04597449
2.57E−05
cg15974053
2.64E−05
cg19669036
2.57E−05


cg07979752
2.56E−05
cg24076884
2.51E−05
cg21509097
2.46E−05


cg00685836
2.56E−05
cg12955583
2.51E−05
cg18972811
2.46E−05


cg09079275
2.56E−05
cg03760483
2.51E−05
cg00576250
2.46E−05


cg04726200
2.56E−05
cg06392241
2.51E−05
cg09155852
2.46E−05


cg26673195
2.56E−05
cg14913925
2.51E−05
cg02254649
2.46E−05


cg12069309
2.56E−05
cg24429836
2.51E−05
cg07495664
2.46E−05


cg23283875
2.56E−05
cg23758485
2.50E−05
cg24450312
2.46E−05


cg02994956
2.55E−05
cg07846167
2.50E−05
cg15271616
2.46E−05


cg21480743
2.55E−05
cg22101147
2.50E−05
cg26968812
2.45E−05


cg11896271
2.55E−05
cg19728223
2.50E−05
cg05786809
2.45E−05


cg02181506
2.55E−05
cg07469792
2.50E−05
cg11469321
2.45E−05


cg00497251
2.55E−05
cg13311440
2.49E−05
cg07558455
2.45E−05


cg21808053
2.55E−05
cg07482936
2.49E−05
cg16761581
2.45E−05


cg15316334
2.55E−05
cg24646414
2.49E−05
cg07314414
2.45E−05


cg16408970
2.55E−05
cg26928682
2.49E−05
cg03945800
2.45E−05


cg15261665
2.54E−05
cg16386080
2.49E−05
cg26512148
2.44E−05


cg05373457
2.54E−05
cg03547797
2.49E−05
cg23047271
2.44E−05


cg25483003
2.54E−05
cg06630241
2.49E−05
cg02774439
2.44E−05


cg01114088
2.54E−05
cg08097882
2.48E−05
cg06621358
2.43E−05


cg19037167
2.54E−05
cg08646988
2.48E−05
cg05898524
2.43E−05


cg02255609
2.54E−05
cg17830308
2.48E−05
cg01346152
2.43E−05


cg11648289
2.54E−05
cg20028470
2.48E−05
cg20557202
2.43E−05


cg09582042
2.54E−05
cg15720535
2.48E−05
cg17943999
2.43E−05


cg21353232
2.54E−05
cg21604042
2.48E−05
cg00398048
2.43E−05


cg26018901
2.53E−05
cg24801210
2.48E−05
cg08441806
2.43E−05


cg21818252
2.53E−05
cg14973995
2.48E−05
cg12600197
2.43E−05


cg14348532
2.53E−05
cg02776251
2.48E−05
cg00187380
2.43E−05


cg13565157
2.53E−05
cg10104451
2.48E−05
cg16998353
2.43E−05


cg02764611
2.53E−05
cg15945417
2.48E−05
cg26509022
2.43E−05


cg05488632
2.53E−05
cg17589341
2.48E−05
cg04466273
2.43E−05


cg21120249
2.53E−05
cg06253072
2.47E−05
cg14093936
2.43E−05


cg08569678
2.53E−05
cg24173049
2.47E−05
cg00472814
2.43E−05


cg26624134
2.53E−05
cg02062650
2.47E−05
cg27236973
2.43E−05


cg13163729
2.52E−05
cg03138091
2.47E−05
cg23786576
2.42E−05


cg07753644
2.52E−05
cg07973967
2.47E−05
cg12457773
2.42E−05


cg06154570
2.52E−05
cg15853125
2.47E−05
cg09863772
2.42E−05


cg05294243
2.52E−05
cg06236061
2.47E−05
cg26209676
2.42E−05


cg22580512
2.52E−05
cg18555440
2.47E−05
cg10194829
2.42E−05


cg00107187
2.52E−05
cg00282347
2.46E−05
cg21073927
2.42E−05


cg22971191
2.52E−05
cg11223252
2.46E−05
cg27626102
2.42E−05


cg22436229
2.52E−05
cg01017147
2.46E−05
cg21402071
2.42E−05


cg01600189
2.52E−05
cg06117855
2.46E−05
cg17165284
2.42E−05


cg00651216
2.51E−05
cg24768561
2.46E−05
cg16332577
2.41E−05


cg17421623
2.41E−05
cg02276665
2.46E−05
cg14540297
2.41E−05


cg21974766
2.41E−05
cg05769161
2.38E−05
cg04409945
2.33E−05


cg02196655
2.41E−05
cg08572611
2.38E−05
cg08654655
2.32E−05


cg26202340
2.41E−05
cg26270746
2.37E−05
cg21176048
2.32E−05


cg26374101
2.41E−05
cg06911084
2.37E−05
cg12331389
2.32E−05


cg11480873
2.41E−05
cg18678763
2.37E−05
cg27631256
2.32E−05


cg07349094
2.41E−05
cg10989517
2.37E−05
cg18081258
2.32E−05


cg15364618
2.41E−05
cg16721845
2.37E−05
cg07991621
2.32E−05


cg25050026
2.41E−05
cg07845392
2.37E−05
cg22799850
2.32E−05


cg05724065
2.40E−05
cg13438834
2.37E−05
cg08097755
2.32E−05


cg10175795
2.40E−05
cg16284292
2.36E−05
cg24874111
2.31E−05


cg17338403
2.40E−05
cg04887278
2.36E−05
cg08587542
2.31E−05


cg05001145
2.40E−05
cg13904493
2.36E−05
cg25713309
2.31E−05


cg17169998
2.40E−05
cg05924583
2.36E−05
cg01353448
2.31E−05


cg13234863
2.40E−05
cg24125648
2.36E−05
cg20506783
2.31E−05


cg05868799
2.40E−05
cg01655355
2.36E−05
cg04588079
2.31E−05


cg21949781
2.40E−05
cg03775422
2.36E−05
cg26898166
2.31E−05


cg17252960
2.40E−05
cg01441777
2.36E−05
cg05157725
2.31E−05


cg13548361
2.40E−05
cg20723355
2.36E−05
cg08197122
2.31E−05


cg15003434
2.40E−05
cg01791232
2.36E−05
cg00565075
2.31E−05


cg10287137
2.40E−05
cg22215728
2.36E−05
cg10331779
2.30E−05


cg08724517
2.40E−05
cg24207176
2.36E−05
cg02782630
2.30E−05


cg27376271
2.40E−05
cg13262687
2.36E−05
cg20083676
2.30E−05


cg03379131
2.40E−05
cg12564453
2.36E−05
cg12478185
2.30E−05


cg26261431
2.39E−05
cg11296937
2.36E−05
cg05824484
2.30E−05


cg21547708
2.39E−05
cg14972143
2.35E−05
cg24641352
2.30E−05


cg11368643
2.39E−05
cg11041457
2.35E−05
cg08162780
2.30E−05


cg16474696
2.39E−05
cg24107665
2.35E−05
cg02260587
2.30E−05


cg23748737
2.39E−05
cg00653387
2.35E−05
cg24649713
2.30E−05


cg19464016
2.39E−05
cg05073035
2.35E−05
cg20051033
2.30E−05


cg23002907
2.39E−05
cg16404106
2.35E−05
cg05697231
2.30E−05


cg16427670
2.39E−05
cg16954341
2.35E−05
cg21092687
2.30E−05


cg06385087
2.39E−05
cg21926138
2.35E−05
cg14244577
2.30E−05


cg10648908
2.38E−05
cg02755525
2.34E−05
cg14329157
2.30E−05


cg18464137
2.38E−05
cg26093148
2.34E−05
cg18809289
2.29E−05


cg06288351
2.38E−05
cg03889226
2.34E−05
cg13150977
2.29E−05


cg04114315
2.38E−05
cg16984944
2.34E−05
cg10986043
2.29E−05


cg04032226
2.38E−05
cg14913610
2.34E−05
cg21152671
2.29E−05


cg23146358
2.38E−05
cg10893437
2.34E−05
cg26984624
2.29E−05


cg11108890
2.38E−05
cg13526007
2.34E−05
cg24101578
2.29E−05


cg11158729
2.38E−05
cg16718678
2.33E−05
cg20716064
2.29E−05


cg10080004
2.38E−05
cg19596204
2.33E−05
cg02994974
2.29E−05


cg10052840
2.38E−05
cg06885782
2.33E−05
cg17655614
2.29E−05


cg00399483
2.38E−05
cg05507459
2.33E−05
cg22799321
2.28E−05


cg17775235
2.28E−05
cg19192120
2.33E−05
cg16413777
2.28E−05


cg21972382
2.28E−05
cg00582628
2.27E−05
cg10521852
2.26E−05


cg10064162
2.28E−05
cg05194726
2.27E−05
cg11386746
2.26E−05


cg08858521
2.28E−05
cg24715735
2.27E−05
cg13806135
2.26E−05


cg24596472
2.28E−05
cg04587910
2.27E−05
cg21053529
2.26E−05


cg16774604
2.28E−05
cg17241310
2.27E−05
cg00650762
2.25E−05


cg10106284
2.28E−05
cg14380517
2.27E−05
cg22183706
2.25E−05


cg18993334
2.27E−05
cg23771661
2.27E−05
cg20537629
2.25E−05


cg16519321
2.27E−05
cg13818573
2.26E−05
cg08331960
2.25E−05




cg26581729
2.26E−05









The distribution of DeepMAge predictions for samples from the verification set closely resembled the actual age distribution, except for people over 70 years old (FIG. 18A). Distributions were obtained using a Gaussian kernel with a bandwidth of 0.36σ, where σ is the standard deviation of the age values.
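The kernel smoothing described above can be sketched as follows. This is a minimal illustration, not the code used to produce the figures: the function name `gaussian_kde`, the use of the population standard deviation for σ, and the example ages are all assumptions made for the sketch.

```python
import math
import statistics


def gaussian_kde(ages, grid, bandwidth_factor=0.36):
    """Gaussian kernel density estimate evaluated at each point of `grid`.

    The bandwidth is bandwidth_factor * sigma, where sigma is the standard
    deviation of `ages` (population form; which form was used is an assumption).
    """
    sigma = statistics.pstdev(ages)
    h = bandwidth_factor * sigma
    norm = 1.0 / (len(ages) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - a) / h) ** 2) for a in ages)
        for x in grid
    ]


# Hypothetical ages, for illustration only (not data from the study).
ages = [23, 31, 35, 42, 47, 55, 58, 63, 70, 74]
grid = [float(x) for x in range(-60, 161)]  # unit-spaced evaluation grid
density = gaussian_kde(ages, grid)
area = sum(density)  # Riemann sum with step 1; should be close to 1
```

Applying the same function to the actual ages and to the predicted ages gives two curves that can be overlaid for comparison.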


DeepMAge accurately reproduces the age distribution of our verification set, save for the individuals older than 70 years (FIG. 19). FIG. 19 shows that the DeepMAge prediction age distribution in the verification set closely resembles the real age distribution, except for the above 70 years section, where the number of the elderly individuals is significantly underestimated.



FIGS. 20A-20D show all DeepMAge predictions per study. DeepMAge accurately predicts the age of the healthy blood samples in all studies from the training and verification sets. FIGS. 21A-21B show that DeepMAge accurately predicts the age of the blood samples in the case-control studies from the training and verification sets.


Most surprisingly, the DeepMAge predictions for the aggregated case cohorts were almost as accurate as for the healthy cohort. Case cohorts from the studies used in the training sample displayed a MedAE of 3.29 years, while the MedAE for the case cohorts in the verification sample was 4.18 years (FIG. 18 and FIGS. 21A-21B).
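For reference, the median absolute error (MedAE) and the signed mean error can be computed as in the short sketch below. The function names and the example values are hypothetical and serve only to illustrate the two metrics.

```python
import statistics


def median_absolute_error(actual, predicted):
    """Median of |predicted - actual| over all samples, in years."""
    return statistics.median(abs(p - a) for a, p in zip(actual, predicted))


def mean_error(actual, predicted):
    """Signed mean of (predicted - actual); positive means over-prediction."""
    return statistics.mean(p - a for a, p in zip(actual, predicted))


# Hypothetical chronological and predicted ages, for illustration only.
actual = [30, 45, 52, 61, 70]
predicted = [32.5, 44.0, 55.0, 58.0, 77.0]

medae = median_absolute_error(actual, predicted)
bias = mean_error(actual, predicted)
```

MedAE is insensitive to a few large misses, whereas the signed mean error reveals systematic over- or under-prediction.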


No significant differences between male and female absolute error distributions were detected with an MW test on the total samples. When age groups from the verification set were tested separately, significant sex-related differences in the 55-65 and 65-75 age groups were detected (Table 1C and FIG. 18B). The mean errors found for women in these age ranges were higher (p-value<0.05), while the ages of 65-75-year-old women were predicted almost 2 years more accurately in absolute terms (p-value<0.01). These findings in the verification set went against the error distributions in the training set and thus were probably due to sample bias rather than any biologically significant factors.
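A Mann-Whitney (MW) comparison of two error distributions can be sketched as below. This is an illustrative implementation that uses the normal approximation without a tie correction in the variance; the function name and the example errors are assumptions, and the source does not state which implementation or variant of the test was used.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for sample x, approximate two-sided p-value).
    Ties receive average ranks; the variance is not tie-corrected.
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with combined[i].
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical absolute errors (years) for two groups, for illustration only.
male_abs_err = [6.1, 5.4, 7.2, 4.9, 6.8, 5.9]
female_abs_err = [4.0, 3.7, 4.6, 5.0, 3.9, 4.2]
u_stat, p = mann_whitney_u(male_abs_err, female_abs_err)
```

For small cohorts an exact test is preferable to the normal approximation used in this sketch.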












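TABLE 1C

| Set | Sex | Error, years (20-45) | Error, years (45-55) | Error, years (55-65) | Error, years (65-75) | Error, years (20-75) | Absolute Error, years (20-45) | Absolute Error, years (45-55) | Absolute Error, years (55-65) | Absolute Error, years (65-75) | Absolute Error, years (20-75) | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Verification | Male | +0.48 | −2.50 | −1.46* | −4.76* | −0.87* | +2.97 | +4.04 | +3.98 | +6.04* | +3.68 | 574 |
| Verification | Female | +0.23 | −3.58 | −0.06* | −1.78* | −0.12* | +3.24 | +4.48 | +3.50 | +4.13* | +3.40 | 494 |
| Verification | N | 707 | 62 | 163 | 136 | 1068 | 707 | 62 | 163 | 136 | 1068 | |
| CV | Male | +0.62 | +2.14* | +0.62* | +0.81* | 0.97* | +2.84 | +3.80 | +4.00 | +4.89 | +3.53 | 1452 |
| CV | Female | +0.65 | +0.41* | −0.54* | −2.17* | −0.34* | +2.76 | +3.59 | +3.77 | +4.58 | +3.59 | 2058 |
| CV | N | 1323 | 670 | 897 | 620 | 3510 | 1323 | 670 | 897 | 620 | 3510 | |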









We then further inspected the specific studies with a case-control setting. Comparing the average prediction errors of the case and control cohorts, DeepMAge reacted only to certain conditions (Table 2). Out of 12 such studies, only five showed significantly elevated prediction errors for the case cohorts. In the study on tauopathic frontotemporal dementia and palsy, cases were predicted to be 1.00 years older than controls. People with inflammatory bowel diseases (IBD) were predicted by DeepMAge to be 1.23 years older than controls. Women with ovarian cancer were predicted to be 1.70 years older. Multiple sclerosis patients were predicted to be 2.10 years older. People with congenital CHARGE and Kabuki syndromes were, quite interestingly, predicted to be 5.28 years younger than controls. Congenital hypopituitarism was associated with predictions 5.64 years older than predictions for controls. These results may indicate a faster pace of aging in people with these pathologies (except for CHARGE and Kabuki syndromes).
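The case-control differences quoted above follow directly from the mean errors listed in Table 2. The sketch below re-derives them; the variable names are illustrative, and the values are copied from the Table 2 columns for mean error in controls, mean error in cases, and p-value (MW).

```python
# study ID -> (mean error in controls, mean error in cases, p-value (MW)),
# copied from Table 2.
table2 = {
    "GSE53740": (-0.37, 0.63, 2.70e-2),
    "GSE19711": (-2.97, -1.27, 9.84e-6),
    "GSE77696": (4.43, 3.96, 1.31e-1),
    "GSE106648": (-1.84, 0.26, 2.17e-8),
    "GSE67530": (-2.66, -1.63, 1.12e-1),
    "GSE52588": (0.67, 1.19, 1.71e-1),
    "GSE97362": (1.24, -4.04, 2.05e-3),
    "GSE84624": (0.54, 0.73, 4.39e-1),
    "GSE112696": (4.24, 4.56, 3.44e-1),
    "GSE102177": (1.99, 1.91, 4.94e-1),
    "GSE87582": (-9.59, -3.79, 1.08e-1),
    "GSE107737": (-1.98, 3.66, 3.63e-3),
    "GSE87640": (-0.20, 1.03, 1.24e-3),
    "GSE99624": (-1.58, -3.99, 6.43e-2),
}

# Difference (cases minus controls) for studies with p-value (MW) < 0.05.
significant_deltas = {
    study: round(case - control, 2)
    for study, (control, case, p_value) in table2.items()
    if p_value < 0.05
}

# Studies in which cases are predicted to be older than controls.
older_cases = [s for s, d in significant_deltas.items() if d > 0]
```

A positive difference means the case cohort is predicted to be older than the control cohort of the same study.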


When the protocol compared the average prediction error between the case and control cohorts, DeepMAge reacted to only some of the conditions (Table 2). Out of twelve data sets, only five show a significantly elevated prediction error for the case cohorts. These data sets include research projects on tauopathic frontotemporal dementia and palsy, ovarian cancer, multiple sclerosis, CHARGE and Kabuki syndromes, and congenital hypopituitarism. These results may indicate a faster pace of aging in people with these pathologies.


















TABLE 2

| Study ID | Mean error in control | Mean error in cases | P-value (MW) | P-value (random MW) | N control | N case | N total | DeepMAge sample | Case description |
|---|---|---|---|---|---|---|---|---|---|
| GSE53740 | −0.37 | +0.63 | 2.70E−2 | 1.50E−1 | 197 | 186 | 383 | Training | Neurodegenerative tauopathy |
| GSE19711 | −2.97 | −1.27 | 9.84E−6 | 4.39E−1 | 272 | 264 | 536 | Training | Ovarian cancer |
| GSE77696 | +4.43 | +3.96 | 1.31E−1 | 5.29E−2 | 117 | 261 | 378 | Training | HIV |
| GSE106648 | −1.84 | +0.26 | 2.17E−8 | 2.52E−1 | 139 | 140 | 279 | Training | Multiple sclerosis |
| GSE67530 | −2.66 | −1.63 | 1.12E−1 | 1.01E−1 | 105 | 39 | 144 | Training | Acute Respiratory Distress Syndrome |
| GSE52588 | 0.67 | 1.19 | 1.71E−1 | 4.84E−1 | 58 | 29 | 87 | Training | Down syndrome |
| GSE97362 | 1.24 | −4.04 | 2.05E−3 | 9.30E−2 | 83 | 150 | 233 | Training | CHARGE / Kabuki syndrome |
| GSE84624 | 0.54 | 0.73 | 4.39E−1 | 9.87E−2 | 24 | 24 | 48 | Training | Kawasaki disease |
| GSE112696 | 4.24 | 4.56 | 3.44E−1 | 1.89E−1 | 6 | 6 | 12 | Verification | Myasthenia gravis |
| GSE102177 | 1.99 | 1.91 | 4.94E−1 | 2.38E−1 | 18 | 18 | 36 | Verification | Maternal gestational diabetes |
| GSE87582 | −9.59 | −3.79 | 1.08E−1 | 2.82E−1 | 1 | 20 | 21 | Verification | HIV |
| GSE107737 | −1.98 | 3.66 | 3.63E−3 | 1.56E−1 | 12 | 12 | 24 | Verification | Congenital hypopituitarism |
| GSE87640 | −0.20 | 1.03 | 1.24E−3 | 3.57E−1 | 84 | 156 | 240 | Verification | Inflammatory Bowel Diseases |
| GSE99624 | −1.58 | −3.99 | 6.43E−2 | 3.76E−1 | 16 | 32 | 48 | Verification | Osteoporosis |










p-value(MW) is the significance of the MW test for equal mean prediction error between the case and control cohorts in each study; “*” marks the studies with a significant (p-value<0.05) MW test result; p-value(random MW) is the significance of the test for a permuted sample. For the control samples marked as “Training,” the predictions were obtained during CV; for the case samples marked as “Training,” the predictions were obtained with the final model, which had not been previously exposed to these samples. The studies in which the studied condition was significantly associated with higher DeepMAge predictions are underlined. CV=Cross-validation; GEO ID=Gene Expression Omnibus accession number; HIV=Human Immunodeficiency Virus; MW=Mann-Whitney U test; N=Number of samples in the corresponding GEO project cohorts.


Table 2 shows that five diseases (such as ovarian cancer and multiple sclerosis) have been associated with significantly higher age predictions (p-value(MW)<0.05). N is the number of people in a study; p-value(MW) is the significance of the Mann-Whitney test for equal prediction error distributions between the case and control cohorts in each study; p-value(random MW) is the significance of the test for a permuted sample. Green marks the studies where the studied condition is significantly associated with higher DeepMAge predictions.


Comparison to the 353 CpG Aging Clock


To gain more insight into whether deep learning offers benefits over shallow models, the published 353 CpG clock was used to predict age for the data sets that were used for the DeepMAge neural network. The accuracy reported in its publication is a MedAE of 3.56 years, which is close to the metric that was reproduced on our data collection (MedAE=3.51 years, Tables 3, S4). In this respect DeepMAge significantly outperforms the 353 CpG clock, with a MedAE of 2.24 years achieved during CV and 2.77 years during verification (Table 1).


The correlation between the 353 CpG clock predictions and the DeepMAge predictions is high (Pearson's r=0.96, 1293 donors) for the healthy verification cohort. The same is observed for the case samples present within the training studies (Pearson's r=0.96, 1093 donors).
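Pearson's r between two sets of predictions can be computed as in the sketch below. The function name and the example predictions are hypothetical; the sketch only illustrates the metric and does not reproduce the reported values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical predictions from two clocks for the same donors.
clock_a = [24.1, 33.0, 41.7, 52.3, 60.8, 71.5]
clock_b = [26.0, 31.2, 44.0, 50.1, 63.5, 69.0]
r = pearson_r(clock_a, clock_b)
```

A high r indicates that the two clocks rank donors similarly, even when their absolute errors differ.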


Two studies used in our verification cohort were actually used for verification in the original 353 CpG clock publication as well: GSE34639 (48 donors) and GSE37008 (99 donors). In these two studies the 353 CpG clock shows superior performance compared to DeepMAge (Table 3).














TABLE 3

| Dataset GEO ID | MedAE, yrs (DeepMAge) | MedAE, yrs (353 CpG) | Pearson's r (DeepMAge) | Pearson's r (353 CpG) | N | Age range, yrs | Male ratio, % |
|---|---|---|---|---|---|---|---|
| GSE107459 | 1.63 | 3.43 | 0.79 | 0.68 | 127 | 18-35 | 0 |
| GSE102177 | 1.87 | 1.33 | 0.86 | 0.83 | 18 | 4-14 | 56 |
| GSE34639 | 1.92 | 0.22 | 0.89 | 0.88 | 48 | 0-1 | 33 |
| GSE105123 | 2.06 | 2.87 | 0.47 | 0.38 | 107 | 19-23 | 58 |
| GSE61496 | 2.14 | 3.42 | 0.97 | 0.95 | 310 | 30-74 | 53 |
| GSE87640 | 2.52 | 3.02 | 0.86 | 0.87 | 84 | 20-58 | 62 |
| GSE98876 | 2.54 | 4.77 | 0.89 | 0.81 | 71 | 26-69 | 100 |
| GSE79329 | 2.63 | 3.74 | 0.92 | 0.89 | 34 | 43-70 | 100 |
| GSE99624 | 2.72 | 3.73 | 0.93 | 0.81 | 16 | 49-82 | 38 |
| GSE107737 | 3.03 | 3.62 | 0.34 | 0.46 | 12 | 18-29 | 100 |
| GSE37008 | 3.74 | 2.26 | 0.81 | 0.81 | 99 | 24-45 | 37 |
| GSE112696 | 3.75 | 2.78 | 0.34 | 0.23 | 6 | 22-27 | 67 |
| GSE59065 | 4.35 | 5.01 | 0.95 | 0.94 | 295 | 22-84 | 48 |
| GSE103911 | 6.96 | 6.14 | 0.85 | 0.76 | 65 | 27-77 | 71 |
| GSE87582 | 9.59 | 6.41 | | | 1 | 60 | 100 |
| Average | 2.77 | 3.51 | 0.97 | 0.93 | 1293 | 0-84 | 52 |









Table 3 shows that in 8 out of 16 verification studies DeepMAge shows better performance than the 353 CpG clock, according to two quality metrics. Overall, in seven out of the 15 datasets we compared, DeepMAge showed superior performance according to both MedAE and Pearson's r. In 13 out of 15 studies DeepMAge performed better according to at least one metric used. There are only 2 studies in which DeepMAge is not superior to the 353 clock according to at least one metric. When the 16 studies are considered in aggregate, DeepMAge has superior prediction accuracy. MedAE is the median absolute error, N is the number of people, yrs is years. The metrics of the better model in each row are highlighted green.
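The per-dataset counts can be re-derived from the Table 3 rows, with each GEO data set counted once. The sketch below uses illustrative variable names; the values are copied from Table 3, a tie is not counted as a win, and the missing Pearson's r for GSE87582 is represented as `None`.

```python
# GEO ID -> (MedAE DeepMAge, MedAE 353 CpG, r DeepMAge, r 353 CpG), from Table 3.
table3 = {
    "GSE107459": (1.63, 3.43, 0.79, 0.68),
    "GSE102177": (1.87, 1.33, 0.86, 0.83),
    "GSE34639": (1.92, 0.22, 0.89, 0.88),
    "GSE105123": (2.06, 2.87, 0.47, 0.38),
    "GSE61496": (2.14, 3.42, 0.97, 0.95),
    "GSE87640": (2.52, 3.02, 0.86, 0.87),
    "GSE98876": (2.54, 4.77, 0.89, 0.81),
    "GSE79329": (2.63, 3.74, 0.92, 0.89),
    "GSE99624": (2.72, 3.73, 0.93, 0.81),
    "GSE107737": (3.03, 3.62, 0.34, 0.46),
    "GSE37008": (3.74, 2.26, 0.81, 0.81),
    "GSE112696": (3.75, 2.78, 0.34, 0.23),
    "GSE59065": (4.35, 5.01, 0.95, 0.94),
    "GSE103911": (6.96, 6.14, 0.85, 0.76),
    "GSE87582": (9.59, 6.41, None, None),  # single sample, no correlation
}

better_on_both = 0
better_on_at_least_one = 0
for medae_deep, medae_353, r_deep, r_353 in table3.values():
    wins_medae = medae_deep < medae_353          # lower error is better
    wins_r = r_deep is not None and r_deep > r_353  # higher correlation is better
    better_on_both += wins_medae and wins_r
    better_on_at_least_one += wins_medae or wins_r

n_datasets = len(table3)
```

With these rows, DeepMAge is better on both metrics in 7 of 15 data sets and on at least one metric in 13 of 15.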


We then examined the other verification data sets we had, which were not used in Horvath's original paper. In 8 out of 12 data sets we compared, DeepMAge showed superior performance according to both MedAE and Pearson's r (Table 3).


In certain cases DeepMAge is more sensitive to donor conditions than the 353 CpG clock. GSE87640 contains healthy donors (84) and donors with Inflammatory Bowel Diseases (IBD, 156 donors), namely ulcerative colitis and Crohn's disease. DeepMAge predicts the IBD cohort to be significantly (p-value<0.001) older than the healthy cohort, with the delta being 1.2-1.8 years, depending on whether MAE or MedAE is used (FIG. 22). See also Table 2. This difference is not observed in the 353 CpG clock predictions (delta MAE=0.3 yrs, p-value=0.21). FIG. 22 shows that DeepMAge, but not the 353 CpG clock, predicts donors with IBD (GEO study accession GSE87640) to be on average 1.2 years older than the healthy donors from the same study (p-value=1.24E−3). Outliers outside the (−20;+20) prediction error window have been removed from the image. IBD is inflammatory bowel diseases; N is the number of people in a corresponding cohort; GEO is Gene Expression Omnibus. The box is formed by the interquartile range with the median marked inside it. Whiskers protrude no farther than 1.5 times the interquartile range.


Unlike the 353 CpG aging clock, DeepMAge predictions are not affected by donors' sex: there is no significant difference between male and female predictions in the verification cohort for DeepMAge. Meanwhile, the 353 CpG aging clock predicts males to be on average 1.42 years older than females (p-value=1.2E−8).


We also compared the 353 CpG aging clock and DeepMAge in the context of obesity's effect on aging. For this task we used data from GSE37008, which contained 97 individuals with a wide range of BMI (from 16.17 to 36.26 kg/m²). We used Ordinary Least Squares (OLS) regression to test whether the effect of BMI on the predicted age is significant. Age, predicted age and BMI were scaled to fit a linear model: [Prediction~Actual_Age+Is_Male+BMI] (FIG. 23). FIG. 23 shows the scaled BMI effect on age prediction, as observed in the [Predicted~Real Age+Sex+BMI] OLS linear regression for data set GSE37008. Body mass index (BMI) has a significant (p-value=0.048) positive effect on the predicted age for DeepMAge. The coefficient for the 353 CpG aging clock is also positive, but it is not significant (p-value=0.19). This difference in sensitivity towards BMI may indicate that DeepMAge recognizes increased body weight as an aging factor. It should be noted, however, that neither the 353 CpG aging clock nor DeepMAge showed a significant BMI effect in another data set with 107 individuals, GSE105123. This may be attributed to a much narrower range of BMI values in that study: from 19.8 to 25.1 kg/m².
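The regression [Prediction~Actual_Age+Is_Male+BMI] can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the analysis code: "scaled" is taken to mean z-scoring, the function names are hypothetical, and the data are synthetic values generated for the example (only the sample size and the BMI range are borrowed from the text).

```python
import numpy as np


def zscore(values):
    """Center to zero mean and scale to unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def fit_scaled_ols(predicted_age, actual_age, is_male, bmi):
    """Fit Prediction ~ Actual_Age + Is_Male + BMI by ordinary least squares.

    Returns {term: (coefficient, standard error)}.
    """
    y = zscore(predicted_age)
    X = np.column_stack([
        np.ones(len(y)),                   # intercept
        zscore(actual_age),
        np.asarray(is_male, dtype=float),  # 0/1 indicator, left unscaled
        zscore(bmi),
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    terms = ["Intercept", "Actual_Age", "Is_Male", "BMI"]
    return {t: (float(b), float(s)) for t, b, s in zip(terms, beta, std_err)}


# Synthetic data for illustration only; a BMI effect is built in on purpose.
rng = np.random.default_rng(0)
n = 97
age = rng.uniform(24, 45, n)
male = rng.integers(0, 2, n)
bmi = rng.uniform(16.17, 36.26, n)
prediction = age + 0.3 * (bmi - bmi.mean()) + rng.normal(0, 2, n)

fit = fit_scaled_ols(prediction, age, male, bmi)
```

Dividing each coefficient by its standard error gives a t-statistic, from which a p-value can be obtained with a Student's t distribution on n−4 degrees of freedom.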


DeepMAge uses a set of 1000 CpG sites, 121 of which are shared with the 353 CpG clock, and 7—with the 71 CpG clock (FIG. 24, Table 1B). The DeepMAge clock shares 122 CpGs with the 353 CpG clock and 7 CpGs with the 71 CpG clock, both published in 2013.


The genes covered by the DeepMAge CpG sites were inspected to see whether the selected features are enriched in specific pathways. In a Gene Ontology biological function annotation, 289 terms were identified as significantly (FDR<0.01) enriched. Among the most abundant terms are generic regulatory and signaling terms. Among the 289 enriched terms, 146 are related to tissue development and organ morphogenesis; 57 to neural development, neurogenesis and synaptic signaling; 14 to circulatory system development and function; 14 to cell differentiation and proliferation (including that of stem cells); 10 to cross-membrane ion transport; 9 to cell motility; 9 to transcription; and 5 to locomotion. The ten most significantly enriched terms (minimal p-value of 1.76E−14) include 4 terms related to neural function and 5 terms related to organism development.


Accordingly, the deep learning DNAm aging clock DeepMAge is shown to be a better age predictor based on the data. Also, DeepMAge is shown to be an improvement over Horvath's aging clock. As shown, DeepMAge can estimate human age with a MedAE of 2.77 years, as demonstrated by a verification set containing 1293 samples. We find that DeepMAge is more accurate in predicting the age of healthy individuals than the 353 CpG clock, which displays a MedAE of 3.51 years on the same dataset.


Having obtained the deep learning age predictor, we showed its biological relevance in several settings. DeepMAge produces significantly higher predictions (by 1.2 years on average) for people with IBD, compared to healthy people. This difference is not observed in the 353 CpG clock predictions. Some other diseases that may be expected to affect the pace of aging produce similar results, e.g. multiple sclerosis and ovarian cancer (Table 1C). Using a data set from our verification cohort we also managed to establish higher BMI as a factor contributing to higher predicted age (FIG. 23), a finding not supported by the 353 CpG clock.


DeepMAge uses a set of 1,000 CpG sites, of which 121 are shared with the 353 CpG clock and seven are shared with the 71 CpG clock. Genes where DeepMAge CpGs are located are enriched with those taking part in developmental (especially cardio and neurodevelopmental) processes. We hypothesize that this may be attributed to the antagonistic pleiotropy theory of aging. According to this theory, genes required for the earlier stages of development may sustain their activity beyond their appropriate period of expression. This non-specific activity harms the organism and leads to multiple downstream aftereffects that ultimately manifest as aging.


As shown, DeepMAge is a deep learning DNAm aging clock that performs better than shallow models in certain aspects. Neural networks can be used to explore individual DNAm landscapes in the context of aging and estimate the risks of certain age-related conditions in future given a single observation. Other uses may include aggregating multiple sources of age-related information, including DNAm profiles, to gain a systemic view on an individual's aging process.


Methods


The DeepMAge study was carried out using data sets collected from the publicly available Gene Expression Omnibus repository (ncbi.nlm.nih.gov/geo/).


Overall, 32 studies were used with 6411 DNAm profiles in total. Among these, 17 studies and 4930 samples were included in the training set. The other 15 studies and 1293 profiles were used in the verification set. Samples annotated to be in the case cohorts of their studies were explored separately. All metrics for both the verification and the training sets are calculated using only the samples marked as control cohorts in the repository.


The exact study identifiers of the training set are: GSE106648, GSE125105, GSE128235, GSE19711, GSE27044, GSE27097, GSE30870, GSE40279, GSE41037, GSE52588, GSE53740, GSE58119, GSE67530, GSE77445, GSE77696, GSE81961, GSE84624, GSE97362. The exact study identifiers of the verification set are: GSE102177, GSE103911, GSE105123, GSE107459, GSE107737, GSE112696, GSE34639, GSE37008, GSE59065, GSE61496, GSE79329, GSE87582, GSE87640, GSE98876, GSE99624. All the data used in this study had been obtained from blood samples with the Illumina Infinium HumanMethylation BeadChip 450K and 27K platforms. Only those studies with age metadata and raw files available were selected.


The data was downloaded as either raw intensities or IDAT files. The lumi R package v2.38.0 was used for intra-study color correction and normalization [Du P, Kibbe W A, Lin S M (2008) lumi: A pipeline for processing Illumina microarray. Bioinformatics]. Only the 24,538 CpG sites shared between the 450K and 27K platforms were used, minus sex chromosome sites and sites with orthologous sequences on multiple chromosomes.


Approximately 17% of samples used in this project were associated with integer age values. Such samples have a de facto understated chronological age, because a whole-year value omits the months elapsed since the last birthday. To avoid introducing this bias to the model, a pseudocount of 0.5 years was added to the integer ages. Non-integer (float) age values were left unchanged.
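The age adjustment described above can be sketched as follows. This is a minimal illustration, not the script used in the study; the function name `adjust_age` is hypothetical, and the sketch assumes that a value with no fractional part (including a float such as 25.0) was reported in whole years.

```python
def adjust_age(age):
    """Add a 0.5-year pseudocount to whole-year ages; keep precise ages as-is."""
    age = float(age)
    # A whole-year value such as 25 means the true age lies in [25, 26),
    # so the expected precise age is 25.5.
    return age + 0.5 if age.is_integer() else age


ages = [25, 31.4, 60, 47.75]
adjusted = [adjust_age(a) for a in ages]
print(adjusted)  # [25.5, 31.4, 60.5, 47.75]
```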


We used the 353 regression coefficients (plus intercept) published in the original paper by Horvath [Horvath S (2013) DNA methylation age of human tissues and cell types. Genome Biol 14:R115] to reconstruct the linear regression model. The model was then used to estimate the logarithmically transformed age, as described in [Horvath (2013)].


The reverse transform used is:





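$$
\mathrm{Age} =
\begin{cases}
21 \times \exp(\mathrm{Prediction}) - 1, & \text{if } \mathrm{Prediction} \le 0 \\
21 \times \mathrm{Prediction} + 20, & \text{if } \mathrm{Prediction} > 0
\end{cases}
$$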
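The piecewise reverse transform can be sketched in Python as follows. The constant 21 is written as `ADULT_AGE + 1` with `ADULT_AGE = 20`, following the convention in Horvath (2013); the function names are illustrative, and the forward transform is included only as an assumption-labelled inverse to check that the two branches agree.

```python
import math

ADULT_AGE = 20  # 21 in the formulas above equals ADULT_AGE + 1


def inverse_age_transform(prediction):
    """Map the model output (log-transformed age scale) back to years."""
    if prediction <= 0:
        return (ADULT_AGE + 1) * math.exp(prediction) - 1
    return (ADULT_AGE + 1) * prediction + ADULT_AGE


def age_transform(age):
    """Forward transform, derived here as the inverse of the function above."""
    if age <= ADULT_AGE:
        return math.log(age + 1) - math.log(ADULT_AGE + 1)
    return (age - ADULT_AGE) / (ADULT_AGE + 1)


print(inverse_age_transform(0))  # 20.0, the point where both branches meet
print(inverse_age_transform(2))  # 62
```

The two branches meet at Prediction = 0 (age 20), so the transform is continuous: logarithmic below the adult age and linear above it.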


Additionally a de novo elastic net model was trained using a protocol from [Horvath (2013)]. The script we used can be found in the Supplementary section.


To compare DeepMAge and Horvath's clock, the MedAE and MAE metrics are used most frequently in this article. Lower values of these metrics indicate more accurate predictions. The formulas for these are as follows:





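$$
\mathrm{MedAE} = \mathrm{Median}\left(\left|\mathrm{Age}_{\mathrm{true},i} - \mathrm{Age}_{\mathrm{predicted},i}\right|\right)
$$

    • for all i ∈ {1, …, N}, where N is the total number of samples

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\mathrm{Age}_{\mathrm{true},i} - \mathrm{Age}_{\mathrm{predicted},i}\right|,
$$

    • where N is the total number of samples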
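Both metrics can be computed directly from paired lists of true and predicted ages. The sketch below uses only the Python standard library; the study itself reports using sklearn.metrics, so this is an equivalent illustration on made-up toy values rather than the original code.

```python
from statistics import mean, median


def absolute_errors(age_true, age_predicted):
    """Per-sample |Age_true,i - Age_predicted,i|."""
    if len(age_true) != len(age_predicted):
        raise ValueError("both lists must have the same length")
    return [abs(t - p) for t, p in zip(age_true, age_predicted)]


def mae(age_true, age_predicted):
    return mean(absolute_errors(age_true, age_predicted))


def medae(age_true, age_predicted):
    return median(absolute_errors(age_true, age_predicted))


# Toy values, not data from the study. Errors are 2.0, 3.0, 1.0 and 10.0.
age_true = [30.5, 42.0, 55.5, 61.0]
age_predicted = [28.5, 45.0, 56.5, 71.0]
print(mae(age_true, age_predicted))    # 4.0
print(medae(age_true, age_predicted))  # 2.5
```

The single 10-year outlier raises the MAE to 4.0 while the MedAE stays at 2.5, which illustrates why the two metrics are reported side by side.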





DeepMAge Model:


The DeepMAge model was prepared as follows. We performed age prediction as a regression task in which the model takes DNAm beta vectors as input and outputs a continuous age value. To allow fitting the data with high dependencies, we used a deep neural network model with multiple hidden layers. In particular, we used feed-forward neural networks with more than three hidden layers.


Due to the input's high dimensionality (the original data has 24,538 features), feature selection was applied before the final model training. First, a neural network was trained on the original data, then deep feature selection [Li Y, Chen C-Y, Wasserman W W (2016) Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. J Comput Biol 23:322-336] and gradient-based feature selection methods [Leray P, Gallinari P (1999) Feature Selection With Neural Networks. Behaviormetrika] were applied to find the most important features in terms of model output impact. To optimize model parameters, we used grid search over the model depth (from two to five hidden layers), the count of neurons per hidden layer (from 128 to 1024), the activation function (ELU [Clevert D A, Unterthiner T, Hochreiter S (2016) Fast and accurate deep network learning by exponential linear units (ELUs). In: 4th International Conference on Learning Representations, ICLR 2016—Conference Track Proceedings], RELU, SELU [Klambauer G, Unterthiner T, Mayr A, Hochreiter S (2017) Self-normalizing neural networks. In: Advances in Neural Information Processing Systems]), the optimizing algorithm (Adam [Kingma D P, Ba J L (2015) Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings], Amsgrad [Reddi S J, Kale S, Kumar S (2018) On the convergence of Adam and beyond. In: 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings] and Nadam [Dozat T (2016) Incorporating Nesterov Momentum into Adam. ICLR Work]) and the regularization algorithm: dropout [Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res] (with rate from 0.15 to 0.5) and L2 regularization (with L2 coefficient from 1e−6 to 0.1).
Next, the best feature selection method was identified in terms of the target metric, Mean Absolute Error (MAE). Finally, the 1000 most important features were fixed using an algorithm that calculates the 95th percentile of the gradient moduli with respect to the model input, the input neurons (with corresponding input features) with the greatest gradient moduli being the most important [Leray P (1999)].
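The grid search described above amounts to enumerating every combination of the listed hyperparameters. The sketch below shows one way to build such a grid; the individual grid points are assumptions chosen inside the stated ranges, since the exact values searched are not given in the text, and the key names are illustrative.

```python
from itertools import product

# Illustrative grid points inside the ranges named in the text (assumed values).
GRID = {
    "hidden_layers": [2, 3, 4, 5],
    "neurons": [128, 256, 512, 1024],
    "activation": ["elu", "relu", "selu"],
    "optimizer": ["adam", "amsgrad", "nadam"],
    "dropout": [0.15, 0.3, 0.5],
    "l2": [1e-6, 1e-3, 0.1],
}


def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(GRID))
print(len(configs))  # 1296 = 4 * 4 * 3 * 3 * 3 * 3
```

Each configuration would then be trained and scored by cross-validated MAE, and the configuration with the lowest score retained.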


The final model was trained using the 1000 most important features. To optimize model parameters, we used grid search with the same grid parameters as before. We minimized the MAE loss function using a backpropagation algorithm. After the optimization procedure, the best model had the exponential linear unit (ELU) function applied after each layer, Adam as the optimizer of the cost function with a learning rate of 1e−4, a 30% dropout probability at each layer and L2 regularization with a 1e−3 coefficient. The final best neural network model consists of 4 hidden layers with 512 neurons each.
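To make the reported architecture concrete, the following dependency-free sketch lays out the layer sizes (1000 inputs, four hidden layers of 512 neurons, one output), counts the trainable parameters, and runs an inference-time forward pass with ELU activations. It is not the Keras implementation used in the study: the weights are random, the linear output layer is an assumption, and dropout and L2 regularization are omitted because they act only during training.

```python
import math
import random


def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def layer_sizes(n_inputs=1000, n_hidden_layers=4, n_neurons=512):
    return [n_inputs] + [n_neurons] * n_hidden_layers + [1]


def count_parameters(sizes):
    """Weights plus biases of every dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_weights(sizes, seed=0):
    """Random placeholder weights; a trained model would supply real ones."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Feed-forward pass: ELU on hidden layers, linear output (assumed)."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = [elu(v) for v in x]
    return x[0]


sizes = layer_sizes()
print(sizes)                    # [1000, 512, 512, 512, 512, 1]
print(count_parameters(sizes))  # 1300993

# Tiny network so the demonstration runs instantly.
tiny = init_weights(layer_sizes(n_inputs=4, n_hidden_layers=2, n_neurons=3))
print(forward(tiny, [0.1, 0.5, 0.9, 0.3]))
```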


The accuracy metrics for model performance include Mean Absolute Error (MAE), Median Absolute Error (MedAE), Pearson's r, Root Mean Squared Error and R2. These metrics were calculated using Python 3.6 sklearn.metrics (v.0.22.1) [Pedregosa F, Varoquaux G, Gramfort A, et al (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825-2830] and scipy.stats (v.1.4.1) [Virtanen P, Gommers R, Oliphant T E, et al (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat Methods] packages.


We trained the networks with fivefold cross-validation (CV) to guard against overfitting and to obtain more robust performance metrics in both cases: feature selection and the final model. The Python version of the Keras library (keras.io) with the TensorFlow (tensorflow.org) backend was used for neural network implementation. All experiments were conducted using an NVIDIA GeForce 1080Ti graphics processing unit.
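Fivefold cross-validation partitions the training samples into five folds and uses each fold once for validation while training on the other four. The sketch below generates such index splits with the standard library only. It assumes a simple shuffled split; whether the original folds were stratified by study or by age is not stated, and the function name is illustrative.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(23, k=5))
for train, validation in splits:
    print(len(train), len(validation))
```

The cross-validated metric is then the average of the per-fold validation scores.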


The accuracy metrics for model performance included MAE, MedAE, Pearson's r, RMSE, and coefficient of determination (R2). These metrics were calculated using the Python 3.6 sklearn.metrics (v.0.22.1; scikit-learn) and scipy.stats packages (v.1.4.1). The Mann-Whitney U test (MW test) for estimating the significance of differences in sample means was imported from the scipy.stats package (v.1.4.1). Pathway enrichment was performed using the Gene Ontology web resource (geneontology.org). To estimate the effect of body mass index (BMI) on age prediction, the Python statsmodels.regression.linear_model.OLS class from statsmodels (v0.11.0; statsmodels.org) was used. Data visualization was conducted with Plotly (v.4.5.0) for Python and Seaborn (v.0.10.0).



FIG. 26 shows a method 200 of creating the DeepMAge model. The method 200 includes steps for performing age prediction as a regression task in which the model takes DNAm beta vectors as input and then outputs a continuous age value. This can include inputting DNAm beta vectors into the system (block 202), then performing a regression (block 204), and obtaining age prediction as age value. To allow fitting the data with high dependencies, we used a deep neural network model 208 with multiple hidden layers 210. In particular, we used feed-forward neural networks with more than three hidden layers.


The DNAm beta vector input data had high dimensionality (the original data included 24,538 features). The method 200 includes performing a feature selection protocol, which was applied before training of the final model. First, original data is provided as DNA methylation data (block 212), which can optionally be conditioned by one or more data conditioning processes, such as those described herein or generally known. The deep neural network (DNN), optionally in the form of a multilayer perceptron (MLP), is also provided (block 213).


A neural network was trained on the original data (block 214), then a deep feature selection protocol (block 216) and a gradient-based feature selection protocol (block 218) were performed. The protocols were performed to find the most important features in terms of impact on model output (block 220). A second neural network (e.g., a DNN such as an MLP) can be trained using the identified most important features (block 225). To optimize model parameters (block 222), a grid search was performed over the model depth (from two to five hidden layers) (block 224), the neuron count per hidden layer (from 128 to 1,024) (block 226), the activation function (exponential linear unit, ELU; rectified linear unit, ReLU; scaled exponential linear unit, SELU) (block 228), the optimizing algorithm (Adam, Amsgrad, and Nadam) (block 230), and the regularization algorithm (block 232): dropout (with rate from 0.15 to 0.5) and L2 regularization (with L2 coefficient from 1e−6 to 0.1). Next, the best feature selection method was identified in terms of the target metric, i.e., MAE (block 234). Finally, the 1,000 most important features were fixed using an algorithm that calculates the 95th percentile of the gradient moduli with respect to the model input, the input neurons (with corresponding input features) with the greatest gradient moduli being the most important (block 236). The model is then obtained according to the foregoing. The model can include an arbitrary number of important features, such as 1,000 or any other number. The selected features are provided in Table 1B along with their importance.
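One plausible reading of the gradient-based ranking step is sketched below: for every sample, the absolute gradient of the model output with respect to each input feature is computed, the 95th percentile of these values across samples is taken as the feature's importance, and the top N features are kept. This interpretation is an assumption, as are all names in the sketch. Gradients are approximated here by central finite differences on a toy function so that the example needs no deep learning library; with a real network they would come from backpropagation.

```python
def gradient_magnitudes(model, x, eps=1e-4):
    """|d model / d x_j| for each feature j, by central differences."""
    grads = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += eps
        down[j] -= eps
        grads.append(abs(model(up) - model(down)) / (2 * eps))
    return grads


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rank_features(model, samples, top_n, q=95):
    """Indices of the top_n features by q-th percentile of gradient modulus."""
    per_sample = [gradient_magnitudes(model, x) for x in samples]
    n_features = len(samples[0])
    scores = [percentile([g[j] for g in per_sample], q) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_n]


def toy_model(x):
    """Stand-in for a trained network: feature 2 matters most, feature 1 not at all."""
    return 3.0 * x[0] + 0.0 * x[1] - 5.0 * x[2] + 0.5 * x[3] ** 2


samples = [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6], [0.5, 0.5, 0.5, 0.5]]
print(rank_features(toy_model, samples, top_n=2))  # [2, 0]
```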


The DeepMAge model was trained using the 1,000 most important features, i.e., the CpG sites shown in Table 1B. These CpG sites are known sites; it is now shown that they can be used for a DNA methylation age clock. To optimize model parameters, the training uses grid search with the same grid parameters as in the previous search. The training minimized the MAE loss function using a backpropagation algorithm. After the optimization procedure, the best model had the ELU function applied after each layer; Adam as the optimizer of the cost function with a learning rate of 1e−4; a 30% dropout probability at each layer; and L2 regularization with a 1e−3 coefficient. The final best neural network model consisted of four hidden layers with 512 neurons each.


As shown, the DeepMAge DNN can be used to reliably estimate the age of a person with no registry or identification. The DNN model, which takes in a DNAm profile obtained from blood cells via an array platform, can be configured to output that person's age with an error of 3-4 years. The DeepMAge DNN can also be configured to indicate whether a person is healthy or unhealthy for their age based on the data. The prediction error of the DeepMAge DNN is shown to be associated with multiple sclerosis, ovarian cancer, obesity, neurodegenerative tauopathy, inflammatory bowel diseases and some other conditions. In all these cases, people with the condition are predicted to be on average 1-2 years older than healthy controls from the same cohorts. Thus, people whose predicted age is higher than their actual age are assumed to be unhealthy for their age. The DeepMAge DNN can be configured to assess the risk of onset of aging-related diseases. Building on the previous point, we hypothesize that even if a person is considered healthy, an elevated predicted age indicates a higher risk of being afflicted by an aging-related disease. The DeepMAge DNN can be configured to provide an accurate age predictor for the DNA methylation (DNAm) data type. We have provided reproducible methods that cover the data preprocessing, model training and model verification stages of the project.



FIG. 27 shows a method 300 for training a DNA methylation age clock. The method can include features and steps of FIG. 26. The method can include obtaining DNA methylation data (block 302). This can include obtaining IDAT or raw intensity files from an Illumina BeadChip array paired with age data, or any data type that can be processed into CpG methylation levels in the [0, 1] range. Array technology is the most widely used technology, and we used it to train the DeepMAge model because public array data was the most abundant. Then, the method can add a 0.5-year pseudocount to the whole-year age of each sample (e.g., person) in the data (block 304). The pseudocount is added only to whole-year age values, not to precise age values. This is done to attenuate the effect that slight right censoring might have on the training. If a person is specified to be 25 years old, that person's age is actually in the [25, 26) range, with the expected precise age being 25.5 years.
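As an illustration of how raw array intensities become methylation levels in the [0, 1] range, the sketch below computes a beta value from the methylated (M) and unmethylated (U) signal of one probe as M / (M + U + offset). An offset of 100 is a commonly used convention for Illumina arrays; whether this exact offset was applied to the DeepMAge data is not stated here, so treat it as an assumption. The intensities shown are made-up numbers.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta = M / (M + U + offset); the result always lies in [0, 1)."""
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


print(beta_value(9000, 900))  # 0.9 (mostly methylated site)
print(beta_value(0, 5000))    # 0.0 (unmethylated site)
```

The offset stabilizes the ratio when both intensities are low, at the cost of slightly shrinking every value away from 1.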


The method can optionally include normalization and color correction (block 306), which can be done using the lumi package for R. This step is not essential. We have trained a model omitting this stage and it worked suitably. However, it can be standard practice to normalize and color-correct DNAm data.


The method can optionally process the DNA methylation data to refine the data (block 308). This can include selecting the overlapping CpGs for the platforms used, removing the CpG sites on sex chromosomes, and removing CpG sites that map several times to the human genome. This is not an essential step, but it provides a primitive feature selection step that removes the features that do not behave like the others.
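The three refinement filters can be sketched as simple set operations. The probe identifiers, the annotation format (chromosome plus a multi-mapping flag), and the function name below are all hypothetical placeholders; a real pipeline would read them from the array manifest.

```python
def select_cpgs(platform_a, platform_b, annotation):
    """Keep probes shared by both platforms, off sex chromosomes, uniquely mapped.

    annotation maps probe ID -> {"chrom": str, "multi_mapped": bool}
    (a made-up format used only for this sketch).
    """
    shared = set(platform_a) & set(platform_b)
    return [
        cpg for cpg in sorted(shared)
        if annotation[cpg]["chrom"] not in {"X", "Y"}
        and not annotation[cpg]["multi_mapped"]
    ]


# Made-up probe IDs, not real array probes.
platform_450k = ["probe_1", "probe_2", "probe_3", "probe_4", "probe_5"]
platform_27k = ["probe_2", "probe_3", "probe_4", "probe_5", "probe_6"]
annotation = {
    "probe_2": {"chrom": "1", "multi_mapped": False},
    "probe_3": {"chrom": "X", "multi_mapped": False},  # sex chromosome
    "probe_4": {"chrom": "7", "multi_mapped": True},   # maps several times
    "probe_5": {"chrom": "12", "multi_mapped": False},
}
kept = select_cpgs(platform_450k, platform_27k, annotation)
print(kept)  # ['probe_2', 'probe_5']
```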


The method can include preparing a training data set and a verification data set with the DNA methylation data (block 310). This can aggregate all the remaining data into the training set and the verification set. Then the training data set can be provided (block 312). The deep neural network is then provided as described herein (block 314). This can include preparing a set of multilayer perceptron (MLP) architectures, which are a type of DNN architecture. This allows for processing the DNN with DNAm data. Multiple architectures are used to find the best possible solution. The set of MLPs is generated using grid search.


The method can train (e.g., pretrain) the DNN with the DeepMAge model using the DNA methylation data training set (block 341). This can pretrain the MLPs using the training set and select the best performing architectures according to a cross-validated accuracy metric. The metric we used is Mean Absolute Error (MAE); other popular metrics are Median AE (MedAE, frequently abbreviated as MAE as well), Mean Squared Error (MSE), Root MSE (RMSE), the coefficient of determination (R2, Rsq, R squared), and Pearson's r.


After the neural network was trained on the original data (block 314), a deep feature selection protocol (block 316) and a gradient-based feature selection protocol (block 318) were performed. The protocols were performed to find the most important features in terms of impact on model output (block 320). This can be used to establish the most important features using deep feature selection (DFS) and gradient-based feature selection. DFS is one of multiple ways to rank features according to their “importance” to the model. Importance can be defined in at least as many ways as there are quality metrics. The specific algorithms to measure how changes in a feature affect the model's “accuracy” are innumerable. The top ranked important features are then selected (block 320). For example, this can include selecting the top-1000 most important features (e.g., DNA methylation sites). The number 1000 (N=1000) was used in the examples, but it can be any other arbitrary number. Altogether, steps in blocks 313 to 320 may be modified as needed. In some aspects, these steps can be refined to reduce the high dimensionality inherent to the DNAm data type, train one model (not even necessarily a DNN one) and use its top features to train another model.


The important features from block 320 are then used to train a second deep neural network (block 322). The second trained DNN is improved by the selected important features obtained from the first trained DNN. This can be done by repeating the method in blocks 313 and 314. The step of training another MLP can be performed using only the top-1000 features (or another arbitrary number, e.g., >100, >500, >750, >1000, >1500, etc.), optimizing the network parameters using grid search (block 326) with MAE as the target metric (block 328).


Then, the optimized model is verified with the verification data set (block 330) by predicting the age of the samples in the verification set and reporting the MAE. The verification set should not have been observed by any other DNN by this point. It needs to be a data set independent from training so that the age prediction capability of the trained DeepMAge DNN can be assessed. When verification is successful, i.e., the trained DeepMAge DNN predicts the ages in the verification data set, the trained DNN can be provided (block 332). This trained DNN can then be used for any sample subject for age prediction using DNA methylation data.


Once the trained DNN is provided, new data (e.g., DNA methylation data without associated age) can be provided (block 334) and used to predict the age of one or more subjects of the DNA methylation data (block 336). This trained DNN is the second trained DNN, which can be considered the DeepMAge DNN. The DeepMAge can then be used for predicting ages of other data sets. The predicted ages can then be provided with the MAE, or other error parameter (block 338). The trained network can be used to predict age in other datasets. That is, any DNA methylation data can be used to predict the age of the subject providing the DNA methylation data. The use of DNA methylation can be for all possible applications. That is, any biological sample from any person can be obtained and processed with the trained DNN to predict the age of that person.



FIG. 28 includes a method 400 of obtaining the DNA methylation age clock. The method can include: collecting DNAm data (block 401); preparing the DNAm data into a training set (block 402) and a test set (block 404); pretraining first DNN (DNN1) with the training set (block 406); selecting top features (e.g., 1000, or arbitrary number) (block 408); reducing the number of features to the arbitrary number (block 410); training second DNN (DNN2) with the reduced number of features (e.g., important features) (block 412); and verifying the second DNN (DNN2) with the test set of data (block 414).


Some examples of the uses of the DNA methylation age clock are provided below.















Forensics: Identify the age of a blood sample at a crime scene to narrow down the list of suspects.

Insurance: Use the predicted age to calculate the premiums: give discounts based on how young a client is perceived by the model.

Public Healthcare Systems: Use accumulated prediction error and its statistics (e.g. yearly rate of change) as a metric of public health to track policy efficiency. E.g. sample blood from a representative group before implementing guidelines and check 5 years later. Measure the prediction error in the group to see if the guidelines had a beneficial effect in terms of aging.

Biometric identification: Use one of the hidden layer's output of the DNN model as personal barcodes. Provided DNAm profiles are stable enough on the scale of its barcode usage period, these barcodes can be used to identify a person. For example, this can be used to build an anonymized healthcare service, where barcode plus nonce's hash is used as a password to decrypt sensitive information (such as genomic data and analysis report). The barcode hashes are used as personal ID codes in the system. Only a person who knows the barcode will be able to decrypt the report, the barcode being registered verifies that this person is the intended recipient of the sensitive information.

Clustering algorithms: Latent representations obtained with the DNN can be used to reduce the high dimensionality of DNAm profiles. This is extremely useful for clustering and classification tasks that perform poorly in high dimensional settings. For example, in a short term clinical trial patients receive an experimental longevity intervention. The trial is too short, and the cohorts are too small to reliably perform the standard statistical tests. However, it may be possible to cluster DNAm profiles' latents in a way that shows that the control group is different compared to the target group. In this case the aging-related effect of the intervention cannot be dismissed and the trial design may be adjusted to make the standard tests applicable.

Visualization: PCA, tSNE and SOM are popular ways to collapse high dimensional data to 2 dimensions for visualization purposes (or subsequent clustering). Applying them to the original data may be too time consuming or simply fail: no interesting dependencies are visible among the 2D projections. This might change if the latent representations are used instead of the original data.

Drug design: The predicted age can be used to estimate the aging-related effect of a drug. This information can be used to assign molecules a score which will be used to derive new molecules.

Tontine-like: A possibly illegal insurance scheme with no intermediaries. Pay outs can be multiplied by a statistic based on a person's prediction error. For example, if the people with the lowest prediction scores receive a bigger pay out share, the tontine will provide financial incentive to prevent aging in its participants.

Recommendation engines: A person's DNAm profile may be adjusted to emulate anti-aging interventions. The changes in the predicted age for the new profiles will tell which intervention is the best for this person.









Other examples can also be used. For example, the protocol can collect a blood sample at a crime scene and estimate its donor's age to narrow down the list of suspects. Blood DNA methylation from a potential client of an insurance firm can be analyzed to adjust their premium: increase it if the predicted age is too high (higher risk of a payout event), or offer a discount if it is below a threshold (and thus indicates superior healthiness). In a similar fashion, the DNA methylation data and DeepMAge can be used to determine the payouts in a tontine-like “insurance” program. DNA methylation data from a clinic patient can be obtained during a routine check-up to evaluate the patient's need for an advanced check-up if their predicted age is too high (an indication of subsymptomatic conditions). Also, blood collected during a clinical trial of a longevity drug can be analyzed with the DeepMAge DNN to estimate the drug's efficiency: if the target group's predicted age is significantly lower than that of the control group, the drug is deemed effective.


We have shown that our DNN approach produces more accurate models, compared to shallow approaches. We used the Elastic Net (EN) method previously described by Horvath in 2013 and our DNN method on the same data set to illustrate that our model is significantly more accurate, by approximately 0.5 years in terms of MedAE.


Altogether, DNNs offer more room for downstream experimentation than any shallow models. In short, DNNs operate by “compressing” the initial vector of methylation levels into a vector of N dimensions, where N is the number of neurons in the first hidden layer. This process repeats several times until the last hidden layer is compressed into a single predicted age value. The intermediate vectors can be used as so-called “latent representations” and treated formalistically. As such, they can be added to or multiplied by other vectors, which are constructed to emulate the effect of health conditions or therapeutic intervention. In this case it will be possible to see how the predicted age is affected by them.
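The idea of manipulating latent representations can be illustrated with a deliberately tiny network. In the sketch below, the hidden-layer output is treated as the latent vector, a hypothetical "condition" vector is added to it, and the shifted latent is passed through the remaining layer to see how the predicted age changes. The weights, the input values and the shift are hand-picked toy numbers, not parameters of DeepMAge.

```python
import math


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(x, weights, biases, activate=True):
    """One fully connected layer, optionally followed by ELU."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [elu(v) for v in out] if activate else out


# Toy weights: 3 methylation inputs -> 2 latent units -> 1 predicted age.
W1, B1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0]
W2, B2 = [[10.0, 40.0]], [5.0]


def encode(betas):
    """Hidden-layer output used as the latent representation."""
    return dense(betas, W1, B1)


def decode(latent):
    """Remaining layer mapping the latent vector to a predicted age."""
    return dense(latent, W2, B2, activate=False)[0]


betas = [0.8, 0.4, 0.2]
latent = encode(betas)
baseline = decode(latent)

shift = [0.0, 0.05]  # hypothetical latent-space effect of a health condition
shifted = decode([l + s for l, s in zip(latent, shift)])

print(baseline)            # approximately 39
print(shifted - baseline)  # approximately 2
```

Because the last layer is linear in this toy, the age change equals the weighted shift (40 × 0.05); in a deeper network the effect would propagate through further nonlinear layers.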


The latent representations can also be used as individual compact DNAm barcodes, for example, to identify people.


Latent representations are also useful as a starting point in classification or clustering tasks. It is impractical to run a computation-heavy clustering algorithm on vectors with 25k dimensions (input DNAm data); the algorithm converges much faster when there are just 512 dimensions.


Although we demonstrate only an MLP implementation, its layers and latent representations can be “plugged into” other types of DNN for extended functionality (variational, generative DNNs, autoencoder, etc.).


Although all aging clocks aim to be disease-relevant, and some existing solutions have been proven to overestimate the age of ill people, it is important to point out once again that our DNN methylation aging clock is disease-relevant for multiple conditions, such as ovarian cancer and multiple sclerosis.


While Horvath's aging clock is a well-known frame of reference for age predictors, it is not sufficient to show the extra benefits deep learning offers compared to the shallow machine learning techniques. Horvath's DNAm clock was trained on a different data collection and the original paper suggests training models from scratch for new data sets.


Thus, to show DeepMAge's superiority relative to other algorithms, we reproduced an elastic net aging clock as described in Horvath's original paper, using the same data as for DeepMAge. The resulting model contains 348 CpGs, 75 of which overlap with the 353 CpGs originally described by Horvath. We then verified the obtained shallow predictor in the verification set to see that both its MAE=4.24 and MedAE=3.23 are inferior to those of DeepMAge (MAE=3.80, MedAE=2.77). The difference between the MAEs is deemed significant with p-value=0.0001 (FIG. 25). FIG. 25 shows boxplots for absolute prediction errors in the DeepMAge and the de novo elastic net regressor, reproduced according to Horvath's protocol. DeepMAge's MAE (3.80 years) is significantly (p-value=0.0001) lower than that of the elastic net (4.24 years).


For processes and methods disclosed herein, the operations performed in the processes and methods may be implemented in differing order. Furthermore, the outlined operations are only provided as examples, and some operations may be optional, combined into fewer operations, eliminated, supplemented with further operations, or expanded into additional operations, without detracting from the essence of the disclosed embodiments.


The figures provided herein are examples of reports or can be included in reports of the biological aging clock. The reports can be provided to the subject or a medical professional, such as the subject's doctor.


In some embodiments, the biological data signature is based on genomics, transcriptomics, proteomics, methylomics (e.g., DNA), metabolomics, lipidomics, glycomics, or secretomics. In some aspects, the method includes obtaining a biological sample of the cell, fluid, tissue or organ of the subject; and obtaining the biological data by performing a measurement of the genomics, transcriptomics, proteomics, methylomics, metabolomics, lipidomics, glycomics, or secretomics. In some aspects, the biological data signature is based on a simulation by a computer program for genomics, transcriptomics, proteomics, methylomics, metabolomics, lipidomics, glycomics, or secretomics. In some aspects, the biological data is an omics signature of biological data. In some aspects, the omics signature is genomics, transcriptomics, proteomics, metabolomics, methylomics, lipidomics, glycomics, or secretomics.


The use of genomics, transcriptomics, DNA methylomics, and proteomics (e.g., biological data signatures) in the present protocols for determining biological aging clocks and other protocols are described above. These protocols can also be applied to other biomarkers or other omics, where the omics may be considered to also be biomarkers.


Genomics is the study of the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism. As such, genomics provides the biological data signature for use in preparing the biological aging clocks and other protocols described herein. The genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Accordingly, the genomics biological data signature can provide significant information. Genomics also involves the sequencing and analysis of genomes through uses of high throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes.


Transcriptomics is the study of the transcriptome, which is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription. The study of the transcriptome can provide biological data signatures for the cells, tissues, or organs or the overall organism. This data can be used as described herein.


Proteomics is the study of proteins in the proteome, which can obtain a biological data signature of the proteins in cells, fluids, tissues, organs, or a subject. The proteome is the entire set of proteins that is produced or modified by an organism or system. Proteomics has enabled the identification of ever increasing numbers of proteins, and protein levels. The protein signature varies with time and distinct requirements, or stresses, that a cell or organism undergoes.


Methylomics is a study that involves the analysis of the methylome, which includes nucleic acid modifications of the organism's genome. Methylation leads to epigenetic modification of DNA and thereby to a reduction of gene expression and, consequently, protein synthesis. Such epigenetic modifications are involved in the regulation of many biological processes inside cells, including aging. Decreased methylation is associated with aging of tissue and cells. Methylation data gives biological data signatures, which can be used in biological aging clocks and other protocols described herein. DNA methylomics is the study of methylation of DNA at specific sites, such as the CpG sites or CG sites. Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression.
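
As a minimal sketch of how per-site methylation levels might be turned into a numeric input vector, the following assumes sequencing-style read counts per CpG site, with the methylation level taken as the fraction of methylated reads. The site identifiers, the counts, and the function name are hypothetical and are not taken from this disclosure.

```python
from typing import Dict, List, Tuple


def methylation_fractions(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return methylated / (methylated + unmethylated) reads for each CpG site."""
    fractions: Dict[str, float] = {}
    for site, (methylated, unmethylated) in counts.items():
        total = methylated + unmethylated
        if total == 0:
            continue  # no read coverage at this site, so no estimate is possible
        fractions[site] = methylated / total
    return fractions


# Hypothetical read counts: site -> (methylated reads, unmethylated reads)
example_counts = {
    "cpg_site_1": (30, 10),
    "cpg_site_2": (5, 15),
    "cpg_site_3": (0, 0),
}

fractions = methylation_fractions(example_counts)

# Fix the site order so every sample yields a vector with the same layout.
input_vector: List[float] = [fractions[site] for site in sorted(fractions)]
print(input_vector)
```

Sites without coverage are skipped here for simplicity; a real pipeline would need an explicit policy for missing values so that all samples share the same vector layout.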


Metabolomics includes the study of chemical processes involving metabolites, the small molecule substrates, intermediates and products of metabolism. Specifically, metabolomics is the systematic study of the unique chemical fingerprints that specific cellular processes leave behind, that is, the study of their small-molecule metabolite profiles. As such, metabolomics can be studied to obtain a signature from a cell, fluid, tissue or organ of a subject. The metabolome represents the complete set of metabolites in a biological cell, tissue, organ or organism, which are the end products of cellular processes. The mRNA gene expression data and proteomic analyses reveal the set of gene products being produced in the cell, data that represents one aspect of cellular function. Conversely, metabolic profiling and obtaining biological data signatures thereof can give an instantaneous snapshot of the physiology of that cell, and thus, metabolomics provides a direct functional readout of the physiological state of an organism. This biological data signature of metabolomics can provide the information for creating the biological aging clocks and other protocols as described herein. Also, the protocols can be used to integrate genomic, transcriptomic, proteomic, and metabolomic information to provide a better understanding of cellular biology and creation of the biological aging clock and other protocols.


Lipidomics is the study of pathways and networks of cellular lipids in biological systems, which can provide a biological data signature of the lipids. The word lipidome is used to describe the complete lipid profile within a cell, tissue, organism, or ecosystem and is a subset of the metabolome, which also includes the three other major classes of biological molecules: proteins/amino-acids, sugars and nucleic acids. Lipidomics can be assessed by techniques such as mass spectrometry (MS), nuclear magnetic resonance (NMR) spectroscopy, fluorescence spectroscopy, dual polarization interferometry and computational methods. Also, the biological data signature of the lipidomics can be used for determination of a biological aging clock due to the role of lipids in many metabolic diseases such as obesity, atherosclerosis, stroke, hypertension and diabetes.


Glycomics is the study of glycomes, which include the entire complement of sugars of an organism, whether free or present in more complex molecules, including genetic, physiologic, pathologic, and other aspects. Glycomics is the systematic study of all glycan structures of a given cell type or organism and is a subset of glycobiology. Accordingly, glycomics gives biological data signatures of the glycan structures, which can be used in the protocols and biological aging clocks described herein. The term glycomics is derived from the chemical prefix for sweetness or a sugar, “glyco-”, and was formed to follow the omics naming convention established by genomics (which deals with genes) and proteomics (which deals with proteins).


Secretomics is a study that involves the analysis of the secretome, which includes all the secreted proteins of a cell, tissue or organism. Secreted proteins are involved in a variety of physiological processes, including cell signaling and matrix remodeling, but are also integral to invasion and metastasis of malignant cells. Secretomics has been especially important in the discovery of biomarkers for cancer and understanding molecular basis of pathogenesis. Accordingly, secretomics can be used to obtain a biological data signature for the cells, fluids, tissues, organs, and organisms, which can be useful for determining biological aging clocks and other protocols described herein.


The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its spirit and scope. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those enumerated herein, are possible from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present disclosure is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.


In one embodiment, the present methods can include aspects performed on a computing system. As such, the computing system can include a memory device that has the computer-executable instructions for performing the methods. The computer-executable instructions can be part of a computer program product that includes one or more algorithms for performing any of the methods of any of the claims.


In one embodiment, any of the operations, processes, or methods described herein can be performed or caused to be performed in response to execution of computer-readable instructions stored on a computer-readable medium and executable by one or more processors. The computer-readable instructions can be executed by a processor of a wide range of computing systems, from desktop computing systems, portable computing systems, tablet computing systems, and hand-held computing systems to network elements and/or any other computing device. The computer readable medium is not transitory. The computer readable medium is a physical medium having the computer-readable instructions stored therein so as to be physically readable from the physical medium by the computer/processor.


There are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and the preferred vehicle may vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.


The various operations described herein can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described herein may be implemented via application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware are possible in light of this disclosure. In addition, the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing medium used to actually carry out the distribution. Examples of a physical signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive (HDD), a compact disc (CD), a digital versatile disc (DVD), a digital tape, a computer memory, or any other physical medium that is not transitory or a transmission. Examples of physical media having computer-readable instructions omit transitory or transmission type media such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, etc.).


It is common to describe devices and/or processes in the fashion set forth herein, and thereafter use engineering practices to integrate such described devices and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein can be integrated into a data processing system via a reasonable amount of experimentation. A typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems, including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented utilizing any suitable commercially available components, such as those generally found in data computing/communication and/or network computing/communication systems.


The herein described subject matter sometimes illustrates different components contained within, or connected with, different other components. Such depicted architectures are merely exemplary, and in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” to each other to achieve the desired functionality. Specific examples of operably couplable include, but are not limited to: physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.



FIG. 14 shows an example computing device 600 (e.g., a computer) that may be arranged in some embodiments to perform the methods (or portions thereof) described herein. In a very basic configuration 602, computing device 600 generally includes one or more processors 604 and a system memory 606. A memory bus 608 may be used for communicating between processor 604 and system memory 606.


Depending on the desired configuration, processor 604 may be of any type including, but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 604 may include one or more levels of caching, such as a level one cache 610 and a level two cache 612, a processor core 614, and registers 616. An example processor core 614 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. An example memory controller 618 may also be used with processor 604, or in some implementations, memory controller 618 may be an internal part of processor 604.


Depending on the desired configuration, system memory 606 may be of any type including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 606 may include an operating system 620, one or more applications 622, and program data 624. Application 622 may include a determination application 626 that is arranged to perform the operations as described herein, including those described with respect to methods described herein. The determination application 626 can obtain data, such as pressure, flow rate, and/or temperature, and then determine a change to the system to change the pressure, flow rate, and/or temperature.


Computing device 600 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 602 and any required devices and interfaces. For example, a bus/interface controller 630 may be used to facilitate communications between basic configuration 602 and one or more data storage devices 632 via a storage interface bus 634. Data storage devices 632 may be removable storage devices 636, non-removable storage devices 638, or a combination thereof. Examples of removable storage and non-removable storage devices include: magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media may include: volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.


System memory 606, removable storage devices 636 and non-removable storage devices 638 are examples of computer storage media. Computer storage media includes, but is not limited to: RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 600. Any such computer storage media may be part of computing device 600.


Computing device 600 may also include an interface bus 640 for facilitating communication from various interface devices (e.g., output devices 642, peripheral interfaces 644, and communication devices 646) to basic configuration 602 via bus/interface controller 630. Example output devices 642 include a graphics processing unit 648 and an audio processing unit 650, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 652. Example peripheral interfaces 644 include a serial interface controller 654 or a parallel interface controller 656, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 658. An example communication device 646 includes a network controller 660, which may be arranged to facilitate communications with one or more other computing devices 662 over a network communication link via one or more communication ports 664.


The network communication link may be one example of a communication media. Communication media may generally be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), and other wireless media. The term computer readable media as used herein may include both storage media and communication media.


Computing device 600 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 600 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations. The computing device 600 can also be any type of network computing device. The computing device 600 can also be an automated system as described herein.


The embodiments described herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules.


Embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.


Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.


With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.


It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation, no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general, such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”


In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.


As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art all language such as “up to,” “at least,” and the like include the number recited and refer to ranges which can be subsequently broken down into subranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 cells refers to groups having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.


From the foregoing, it will be appreciated that various embodiments of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various embodiments disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.


Definitions

A “biopsy” is a medical test involving extraction of sample cells or tissues for examination, and can be analyzed chemically. When only a sample of tissue is removed with preservation of the histological architecture of the tissue's cells, the procedure is called an incisional biopsy or core biopsy. When a sample of tissue or fluid is removed with a needle in such a way that cells are removed without preserving the histological architecture of the tissue cells, the procedure is called a needle aspiration biopsy.


“Senescence” is biological aging, that is, the gradual deterioration of function and ability in almost all life forms, mostly after maturation and in particular in multi-cellular life. Senescence increases mortality. Senescence can refer to cellular senescence, tissue senescence, organ senescence, and senescence of the whole organism. Cellular senescence largely underlies organismal senescence. The boundary between disease and senescence is not sharp, as organisms, tissues, and cells may have characteristics of both, and disease and senescence are often associated with each other.


“Cellular senescence” is not the aging of an individual cell, but instead the state (gene expression) of a cell with respect to the senescence of its tissue or organism, in comparison to a less senescent tissue or organism. Cell senescence may partly be the result of telomere shortening in cells, which may trigger a DNA damage response. Cells can also be induced to senesce via DNA damage in response to elevated reactive oxygen species, activation of oncogenes, cell-to-cell fusion, and other causes. As such, cellular senescence represents a change in “cell state” rather than a cell becoming “aged”. The number of senescent cells in tissues rises substantially during normal aging. Cells may also experience “replicative senescence”, in which they can no longer divide. There is a “senescence associated secretory phenotype” (SASP) associated with senescent cells, which is associated with, for example, an increase in inflammatory cytokines, growth factors, and proteases. Cellular senescence contributes to age-related diseases, such as atherosclerosis.


“Fibrosis” is the accumulation of excess fibrous connective cells or other similarly stiff, structural cells, called “fibrotic cells”, in an organ or tissue. Such fibrosis can be a normal, functional part of the reparative process (such as scarring) but can also be pathological. Excess and unnecessary fibrosis is associated with senescence and typically decreases the flexibility and other functions of a tissue or organ. Fibrotic cells generally have an excess of extracellular matrix proteins, which contribute to their stiffness.


A “senolytic” is a drug or other treatment that can selectively induce death of senescent cells.


A “senoremediator” is a drug or other treatment that can restore or increase the number of presenescent or nonsenescent cells.


“Machine learning” (ML) is a subfield of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning platforms include, but are not limited to, naïve Bayes classifiers, support vector machines, decision trees, and neural networks.


“Artificial neural networks”, also called “ANNs” or just “neural networks”, are based on a large collection of connected simple units called artificial neurons, loosely analogous to the neurons of a biological brain. Each artificial neuron receives signals from the neurons connected to it. If the combined incoming signals are strong enough, the neuron becomes activated and the signal travels to other neurons connected to it. The activation function of such neurons is often, though not always, represented as a sigmoid function.
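
The behavior of a single artificial neuron with a sigmoid activation can be illustrated with a minimal sketch. The weights, bias, and input values below are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Combine the incoming signals as a weighted sum, then apply the activation."""
    combined = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(combined)


# Weak combined input (-2.7): the neuron stays mostly inactive (about 0.06).
weak = neuron_output([0.1, 0.2], [1.0, 1.0], bias=-3.0)

# Strong combined input (2.0): the neuron becomes activated (about 0.88).
strong = neuron_output([2.0, 3.0], [1.0, 1.0], bias=-3.0)

print(weak, strong)
```

The bias plays the role of a threshold: it sets how strong the combined incoming signal must be before the output moves above 0.5.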


“Deep learning” (DL) (also known as deep structured learning, hierarchical learning or deep machine learning) is the study of artificial neural networks that contain more than one hidden layer of neurons. Such a neural network is called a “deep neural network”. A “convolutional neural network” is a type of neural network in which the connectivity pattern is inspired by the organization of the animal visual cortex.


“Principal component analysis” (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of variables into a set of values of linearly uncorrelated variables called principal components. The transformation is defined in such a way that the first principal component has the largest possible variance and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
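
A minimal sketch of this procedure for two-dimensional observations follows. It uses the closed-form eigen-decomposition of a 2x2 covariance matrix, which is a simplification chosen for illustration; the data values and the function name are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pca_2d(points: List[Point]):
    """Return (variances, components, scores) for 2-D observations."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form (var1 >= var2).
    mid = (a + c) / 2.0
    half_gap = math.hypot((a - c) / 2.0, b)
    var1, var2 = mid + half_gap, mid - half_gap

    # Eigenvector for the largest eigenvalue gives the first component.
    if abs(b) > 1e-12:
        v = (b, var1 - a)
    elif a >= c:
        v = (1.0, 0.0)
    else:
        v = (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    pc1 = (v[0] / norm, v[1] / norm)
    pc2 = (-pc1[1], pc1[0])  # orthogonal to pc1 by construction

    # Orthogonal transformation: project centered data onto the components.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered]
    return (var1, var2), (pc1, pc2), scores


data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
variances, components, scores = pca_2d(data)
print(variances, components)
```

The first component carries the largest variance, and the two columns of `scores` are linearly uncorrelated, which is the property the definition describes.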


“Generative adversarial networks” (GANs) are neural networks that are trained in an adversarial manner to generate data mimicking some distribution. A discriminative model is a model that discriminates between two (or more) different classes of data, for example a convolutional neural network that is trained to output 1 given an image of a human face and 0 otherwise. A generative model, by contrast, generates new data which fits the distribution of the training data. GANs are well known in the art, as described, for example, in Goodfellow et al., “Generative Adversarial Networks”, arXiv:1406.2661v1, 2014.
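
In the notation of the cited Goodfellow et al. reference, the adversarial training can be summarized as a two-player minimax game. Here G is the generative model, D is the discriminative model, p_data is the distribution of the training data, and p_z is a prior distribution over the noise input z to the generator; these symbols come from that reference and are not otherwise defined in this disclosure.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator is trained to assign high values of D to real data and low values to generated data, while the generator is trained to produce samples G(z) that the discriminator cannot distinguish from real data.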


An “autoencoder” is a neural network architecture generally used for unsupervised learning of efficient coding. An autoencoder learns representations (encodings) for a set of data, often for the purpose of dimensionality reduction. An “adversarial autoencoder” (AAE) is an autoencoder that uses generative adversarial networks (GANs) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. AAEs are well known in the art, as described, for example, in Makhzani et al., “Adversarial Autoencoders”, arXiv:1511.05644v2, 2015. Application of AAEs to new molecule development, such as drugs, is also well known in the art, as described, for example, in Kadurin et al., “The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology”, Oncotarget, 2017, Vol. 8, (No. 7), pp: 10883-10890.


Feature importance is a statistical method to evaluate the importance of input features for the prediction of the output target. Feature importance methods include, but are not limited to, the ensemble-based wrapper method called Permutation Feature Importance (PFI). First, a model is trained on the feature set; then the vector of the feature of interest is randomly shuffled and used for training the same model. The scores of the model before and after the random shuffling are then compared, and a relative importance score is assigned to the feature of interest.
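
The shuffle-and-compare idea can be sketched as follows. This sketch makes two assumptions that are not taken from this disclosure: it uses a toy nearest-centroid classifier as the model, and it follows the common variant in which the already-trained model is re-scored on the shuffled data rather than retrained. All names and data values are hypothetical.

```python
import random
from typing import Callable, List

Row = List[float]


def train_nearest_centroid(X: List[Row], y: List[int]) -> Callable[[Row], int]:
    """Train a toy model: one centroid (mean vector) per class label."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row: Row) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )

    return predict


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Feature 0 separates the two classes; feature 1 carries no class information.
X = [[0.1 * (i % 10) + (10.0 if i >= 10 else 0.0), 0.9 if i % 2 == 0 else 1.1]
     for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]

model = train_nearest_centroid(X, y)
importances = permutation_importance(model, X, y)
print(importances)
```

Shuffling the informative feature lowers the accuracy noticeably, while shuffling the uninformative feature leaves the predictions unchanged, so its importance score is zero.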


Deep feature selection (DFS) is a method proposed in 2016 by Li, Chen, and Wasserman (Li Y, Chen C Y, Wasserman W W. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. J Comput Biol. 2016 May; 23(5):322-36. doi: 10.1089/cmb.2015.0189. Epub 2016 Jan. 22). The method is based on a deep neural network that can select features at the input layer of the neural network.


A support vector machine (SVM) is a discriminative classifier that, given labeled training data, outputs an optimal hyperplane that categorizes new data points (examples).
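
A minimal sketch of this idea, assuming a linear kernel and a simple stochastic sub-gradient update on the regularized hinge loss, is shown below. The training points, hyperparameters, and function names are hypothetical and chosen only to illustrate how a separating hyperplane is learned and then used to categorize new points.

```python
def score(w, b, x):
    """Signed score of a point relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Stochastic sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Point is inside the margin or misclassified: push it out.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if score(w, b, x) >= 0.0 else -1


# Hypothetical labeled training data: two linearly separable groups.
X_train = [[-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0],
           [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
y_train = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X_train, y_train)
new_labels = [classify(w, b, [-3.0, -3.0]), classify(w, b, [3.0, 3.0])]
```

The learned weights and bias define the hyperplane; new points are categorized by the side of the hyperplane on which they fall.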


All references recited herein and/or recited in the provisional applications 62/536,658 filed Jul. 25, 2017 and/or 62/547,061 filed Aug. 17, 2017 are incorporated herein by specific reference in their entirety.


REFERENCES



  • Buzdin et al., US 2017/0073735

  • Goodfellow et al., “Generative Adversarial Networks”, arXiv:1406.2661v1, 2014.

  • Makhzani et al., “Adversarial Autoencoders”, arXiv:1511.05644v2, 2015.

  • Kadurin et al., “The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology”, Oncotarget, 2017, Vol. 8, (No. 7), pp: 10883-10890.

  • Seim et al., “Gene expression signatures of human cell and tissue longevity”, npj Aging and Mechanisms of Disease, 2, 16014 (2016).

  • Ozerov, U.S. 62/401,789, filed September 2016.

  • Aliper et al., “Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data”, Mol Pharm, 2016 Jul. 5; 13(7): 2524-2530.

  • Mamoshina et al., “Applications of Deep Learning in Biomedicine”, Mol Pharm, 2016 Mar. 13(5),

  • Ozerov et al., “In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development”, Nature Communications, 7:13427, 2016.

  • Munoz-Espin, D., & Serrano, M. (2014). Cellular senescence: from physiology to pathology. Nature reviews Molecular cell biology, 15(7), 482-496.

  • Acosta, Juan Carlos, Ana Banito, Torsten Wuestefeld, Athena Georgilis, Peggy Janich, Jennifer P. Morton, Dimitris Athineos, et al. 2013. “A Complex Secretory Program Orchestrated by the Inflammasome Controls Paracrine Senescence.” Nature Cell Biology 15 (8): 978-90.

  • Baar, Marjolein P., Renata M. C. Brandt, Diana A. Putavet, Julian D. D. Klein, Kasper W. J. Derks, Benjamin R. M. Bourgeois, Sarah Stryeck, et al. 2017. “Targeted Apoptosis of Senescent Cells Restores Tissue Homeostasis in Response to Chemotoxicity and Aging.” Cell 169 (1): 132-47.e16.

  • Baker, Darren J., Robbyn L. Weaver, and Jan M. van Deursen. 2013. “p21 Both Attenuates and Drives Senescence and Aging in BubR1 Progeroid Mice.” Cell Reports 3 (4): 1164-74.

  • Campisi, Judith. 2005. “Senescent Cells, Tumor Suppression, and Organismal Aging: Good Citizens, Bad Neighbors.” Cell 120 (4): 513-22.

  • Campisi J. Cellular senescence: putting the paradoxes in perspective. Current opinion in genetics & development. 2011; 21 (1): 107-112. doi:10.1016/j.gde.2010.10.005.

  • Campisi J. Aging, Cellular Senescence, and Cancer. Annual review of physiology. 2013; 75:685-705. doi:10.1146/annurev-physiol-030212-183653.

  • Campisi, Judith, and Fabrizio d'Adda di Fagagna. 2007. “Cellular Senescence: When Bad Things Happen to Good Cells.” Nature Reviews. Molecular Cell Biology 8 (9): 729-40.

  • Chilosi, Marco, Angelo Carloni, Andrea Rossi, and Venerino Poletti. 2013. “Premature Lung Aging and Cellular Senescence in the Pathogenesis of Idiopathic Pulmonary Fibrosis and COPD/emphysema.” Translational Research: The Journal of Laboratory and Clinical Medicine 162 (3): 156-73.

  • Chilosi, Marco, Alberto Zamò, Claudio Doglioni, Daniela Reghellin, Maurizio Lestani, Licia Montagna, Serena Pedron, et al. 2006. “Migratory Marker Expression in Fibroblast Foci of Idiopathic Pulmonary Fibrosis.” Respiratory Research 7 (1). doi: 10.1186/1465-9921-7-95.

  • Coppé, Jean-Philippe, Christopher K. Patil, Francis Rodier, Yu Sun, Denise P. Muñoz, Joshua Goldstein, Peter S. Nelson, Pierre-Yves Desprez, and Judith Campisi. 2008. “Senescence-Associated Secretory Phenotypes Reveal Cell-Nonautonomous Functions of Oncogenic RAS and the p53 Tumor Suppressor.” PLoS Biology 6 (12): 2853-68.

  • De Cecco M, Criscione S W, Peckham E J, et al. Genomes of replicatively senescent cells undergo global epigenetic changes leading to gene silencing and activation of transposable elements. Aging cell. 2013; 12(2):247-256. doi:10.1111/acel.12047.

  • Demaria M, Ohtani N, Youssef S A, et al. An Essential Role for Senescent Cells in Optimal Wound Healing through Secretion of PDGF-AA. Developmental cell. 2014; 31(6):722-733. doi:10.1016/j.devcel.2014.11.012.

  • Deursen, Jan M. van. 2014. “The Role of Senescent Cells in Ageing.” Nature 509 (7501): 439-46.

  • DiLoreto, R., and C. T. Murphy. 2015. “The Cell Biology of Aging.” Molecular Biology of the Cell 26 (25): 4524-31.

  • Freund, Adam, Arturo V. Orjalo, Pierre-Yves Desprez, and Judith Campisi. 2010. “Inflammatory Networks during Cellular Senescence: Causes and Consequences.” Trends in Molecular Medicine 16 (5): 238-46.

  • Vestbo, J. et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am. J. Respir. Crit. Care Med. 187, 347-365 (2013).

  • Hernandez Gea, Virginia, and Scott L. Friedman. 2011. “Pathogenesis of Liver Fibrosis.” Annual Review of Pathology: Mechanisms of Disease 6 (1): 425-56.

  • Ivanov, Andre, Jeff Pawlikowski, Indrani Manoharan, John van Tuyn, David M. Nelson, Taranjit Singh Rai, Parisha P. Shah, et al. 2013. “Lysosome-Mediated Processing of Chromatin in Senescence.” The Journal of Cell Biology 202 (1): 129-43.

  • Jun, Joon-Il, and Lester F. Lau. 2010. “The Matricellular Protein CCN1 Induces Fibroblast Senescence and Restricts Fibrosis in Cutaneous Wound Healing.” Nature Cell Biology 12 (7): 676-85.

  • Kim, William Y., and Norman E. Sharpless. 2006. “The Regulation of INK4/ARF in Cancer and Aging.” Cell 127 (2): 265-75.

  • Krimpenfort, Paul, and Anton Berns. 2017. “Rejuvenation by Therapeutic Elimination of Senescent Cells.” Cell 169 (1): 3-5.

  • Krishnamurthy, Janakiraman, Matthew R. Ramsey, Keith L. Ligon, Chad Torrice, Angela Koh, Susan Bonner-Weir, and Norman E. Sharpless. 2006. “p16INK4a Induces an Age-Dependent Decline in Islet Regenerative Potential.” Nature 443 (7110): 453-57.

  • Krizhanovsky, Valery, Monica Yon, Ross A. Dickins, Stephen Hearn, Janelle Simon, Cornelius Miething, Herman Yee, Lars Zender, and Scott W. Lowe. 2008. “Senescence of Activated Stellate Cells Limits Liver Fibrosis.” Cell 134 (4): 657-67.

  • Kuwano, K., R. Kunitake, M. Kawasaki, Y. Nomoto, N. Hagimoto, Y. Nakanishi, and N. Hara. 1996. “P21Waf1/Cip1/Sdi1 and p53 Expression in Association with DNA Strand Breaks in Idiopathic Pulmonary Fibrosis.” American Journal of Respiratory and Critical Care Medicine 154 (2 Pt 1): 477-83.

  • Laberge, Remi-Martin, Pierre Awad, Judith Campisi, and Pierre-Yves Desprez. 2012. “Epithelial-Mesenchymal Transition Induced by Senescent Fibroblasts.” Cancer Microenvironment: Official Journal of the International Cancer Microenvironment Society 5 (1): 39-44.

  • Lomas, Nicola J., Keira L. Watts, Khondoker M. Akram, Nicholas R. Forsyth, and Monica A. Spiteri. 2012. “Idiopathic Pulmonary Fibrosis: Immunohistochemical Analysis Provides Fresh Insights into Lung Tissue Remodelling with Implications for Novel Prognostic Markers.” International Journal of Clinical and Experimental Pathology 5 (1): 58-71.

  • Malavolta, Marco, Elisa Pierpaoli, Robertina Giacconi, Laura Costarelli, Francesco Piacenza, Andrea Basso, Maurizio Cardelli, and Mauro Provinciali. 2016. “Pleiotropic Effects of Tocotrienols and Quercetin on Cellular Senescence: Introducing the Perspective of Senolytic Effects of Phytochemicals.” Current Drug Targets 17 (4): 447-59.

  • Mallette, Frédérick A., and Gerardo Ferbeyre. 2007. “The DNA Damage Signaling Pathway Connects Oncogenic Stress to Cellular Senescence.” Cell Cycle 6 (15): 1831-36.

  • Minagawa, S., J. Araya, T. Numata, S. Nojiri, H. Hara, Y. Yumino, M. Kawaishi, et al. 2010. “Accelerated Epithelial Cell Senescence in IPF and the Inhibitory Role of SIRT6 in TGF-Induced Senescence of Human Bronchial Epithelial Cells.” AJP: Lung Cellular and Molecular Physiology 300 (3): L391-401.

  • Muñoz-Espin, Daniel, Marta Cañamero, Antonio Maraver, Gonzalo Gómez-López, Julio Contreras, Silvia Murillo-Cuesta, Alfonso Rodriguez-Baeza, et al. 2013. “Programmed Cell Senescence during Mammalian Embryonic Development.” Cell 155 (5): 1104-18.

  • Polina Mamoshina, Kirill Kochetov, Evgeny Putin, Franco Cortese, Alexander Aliper, Won-Suk Lee, Sung-M M Ahn, Lee Uhn, Neil Skjodt, Olga Kovalchuk, Morten Scheibye-Knudsen, Alex Zhavoronkov; Population Specific Biomarkers of Human Aging: A Big Data Study Using South Korean, Canadian, and Eastern European Patient Populations, The Journals of Gerontology: Series A, gly005, doi.org/10.1093/gerona/gly005

  • Nelson, Glyn, James Wordsworth, Chunfang Wang, Diana Jurk, Conor Lawless, Carmen Martin-Ruiz, and Thomas von Zglinicki. 2012. “A Senescent Cell Bystander Effect: Senescence-Induced Senescence.” Aging Cell 11 (2): 345-49.

  • Nikolich-Zugich, Janko. 2008. “Ageing and Life-Long Maintenance of T-Cell Subsets in the Face of Latent Persistent Infections.” Nature Reviews. Immunology 8 (7): 512-22.

  • Noble, Paul W., Carlo Albera, Williamson Z. Bradford, Ulrich Costabel, Marilyn K. Glassberg, David Kardatzke, Talmadge E. King Jr, et al. 2011. “Pirfenidone in Patients with Idiopathic Pulmonary Fibrosis (CAPACITY): Two Randomised Trials.” The Lancet 377 (9779): 1760-69.

  • Ohtani, Naoko, Kimi Yamakoshi, Akiko Takahashi, and Eiji Hara. 2004. “The p16INK4a-RB Pathway: Molecular Link between Cellular Senescence and Tumor Suppression.” The Journal of Medical Investigation: JMI 51 (3,4): 146-53.

  • Ozerov, Ivan V., Ksenia V. Lezhnina, Evgeny Izumchenko, Artem V. Artemov, Sergey Medintsev, Quentin Vanhaelen, Alexander Aliper, et al. 2016. “In Silico Pathway Activation Network Decomposition Analysis (iPANDA) as a Method for Biomarker Development.” Nature Communications 7 (November): 13427.

  • Parrinello, Simona, Jean-Philippe Coppe, Ana Krtolica, and Judith Campisi. 2005. “Stromal-Epithelial Interactions in Aging and Cancer: Senescent Fibroblasts Alter Epithelial Cell Differentiation.” Journal of Cell Science 118 (Pt 3): 485-96.

  • Seki, Ekihiro, and David A. Brenner. 2015. “Recent Advancement of Molecular Mechanisms of Liver Fibrosis.” Journal of Hepato-Biliary-Pancreatic Sciences 22 (7): 512-18.

  • Seki, Ekihiro, and Robert F. Schwabe. 2015. “Hepatic Inflammation and Fibrosis: Functional Links and Key Pathways.” Hepatology 61 (3): 1066-79.

  • Storer, Mekayla, Alba Mas, Alexandre Robert-Moreno, Matteo Pecoraro, M. Carmen Ortells, Valeria Di Giacomo, Reut Yosef, et al. 2013. “Senescence Is a Developmental Mechanism That Contributes to Embryonic Growth and Patterning.” Cell 155 (5): 1119-30.

  • Takeuchi, Shinji, Akiko Takahashi, Noriko Motoi, Shin Yoshimoto, Tomoko Tajima, Kimi Yamakoshi, Atsushi Hirao, et al. 2010. “Intrinsic Cooperation between p16INK4a and p21Waf1/Cip1 in the Onset of Cellular Senescence and Tumor Suppression in Vivo.” Cancer Research 70 (22): 9381-90.

  • Wang, Jianrong, Glenn J. Geesman, Sirkka Liisa Hostikka, Michelle Atallah, Benjamin Blackwell, Elbert Lee, Peter J. Cook, et al. 2011. “Inhibition of Activated Pericentromeric SINE/Alu Repeat Transcription in Senescent Human Adult Stem Cells Reinstates Self-Renewal.” Cell Cycle 10 (17): 3016-30.

  • Li, Yifeng, Chih-Yu Chen, and Wyeth W. Wasserman. “Deep feature selection: Theory and application to identify enhancers and promoters.” International Conference on Research in Computational Molecular Biology. Springer International Publishing, 2015.

  • Yacoub, Meziane, and Y. Bennani. “HVS: A heuristic for variable selection in multilayer artificial neural network classifier.” Intelligent Engineering Systems Through Artificial Neural Networks, St. Louis, Mo. Vol. 7. 1997.

  • Dorizzi, B., et al. “Variable selection using generalized RBF networks: Application to the forecast of the French T-bonds.” CESA′96 IMACS Multiconference: computational engineering in systems applications. 1996.

  • Refenes, A. P. N., A. D. Zapranis, and J. Utans. “Neural model identification variable selection and model adequacy.” Decision Technologies for Financial Engineering, Proceedings of NNCM 96. 1998.

  • Ruck, Dennis W., Steven K. Rogers, and Matthew Kabrisky. “Feature selection using a multilayer perceptron.” Journal of Neural Network Computing 2.2 (1990): 40-48.

  • Czernichow, Thomas. “Architecture selection through statistical sensitivity analysis.” International Conference on Artificial Neural Networks. Springer Berlin Heidelberg, 1996.

  • Lehmann, G., Muradian, K. K., & Fraifeld, V. E. (2013). Telomere length and body temperature—independent determinants of mammalian longevity?. Frontiers in genetics, 4.

  • Wolters, S., & Schumacher, B. (2013). Genome maintenance and transcription integrity in aging and disease. Frontiers in genetics, 4.

  • Horvath, S., Zhang, Y., Langfelder, P., Kahn, R. S., Boks, M. P., van Eijk, K., & Ophoff, R. A. (2012). Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol, 13(10), R97.

  • Horvath, S. (2013). DNA methylation age of human tissues and cell types. Genome biology, 14(10), R115.

  • Mendelsohn, A. R., & Larrick, J. W. (2013). The DNA Methylome as a biomarker for epigenetic instability and human aging. Rejuvenation research, 16(1), 74-77.

  • Chowers, I., Liu, D., Farkas, R. H., Gunatilaka, T. L., Hackam, A. S., Bernstein, S. L., . . . & Zack, D. J. (2003). Gene expression variation in the adult human retina. Human molecular genetics, 12(22), 2881-2893.

  • Weindruch, R., Kayo, T., Lee, C. K., & Prolla, T. A. (2002). Gene expression profiling of aging using DNA microarrays. Mechanisms of ageing and development, 123(2), 177-193.

  • Park, S. K., Kim, K., Page, G. P., Allison, D. B., Weindruch, R., & Prolla, T. A. (2009). Gene expression profiling of aging in multiple mouse strains: identification of aging biomarkers and impact of dietary antioxidants. Aging cell, 8(4), 484-495.

  • Zahn, J. M., Poosala, S., Owen, A. B., Ingram, D. K., Lustig, A., Carter, A., & Becker, K. G. (2007). AGEMAP: a gene expression database for aging in mice. PLoS genetics, 3(11), e201.

  • Blalock, E. M., Chen, K. C., Sharrow, K., Herman, J. P., Porter, N. M., Foster, T. C., & Landfield, P. W. (2003). Gene microarrays in hippocampal aging: statistical profiling identifies novel processes correlated with cognitive impairment. The Journal of neuroscience, 23(9), 3807-3819.

  • Welle, S., Brooks, A. I., Delehanty, J. M., Needler, N., & Thornton, C. A. (2003). Gene expression profile of aging in human muscle. Physiological genomics, 14(2), 149-159.

  • Park, S. K., & Prolla, T. A. (2005). Gene expression profiling studies of aging in cardiac and skeletal muscles. Cardiovascular research, 66(2), 205-212.

  • Hong, M. G., Myers, A. J., Magnusson, P. K., & Prince, J. A. (2008). Transcriptome-wide assessment of human brain and lymphocyte senescence. PLoS One, 3(8), e3024.

  • de Magalhaes, J. P., Curado, J., & Church, G. M. (2009). Meta-analysis of age-related gene expression profiles identifies common signatures of aging. Bioinformatics, 25(7), 875-881.

  • Zhavoronkov, A., & Cantor, C. R. (2011). Methods for structuring scientific knowledge from many areas related to aging research. PloS one, 6(7), e22597.

  • Trindade, L. S., Aigaki, T., Peixoto, A. A., Balduino, A., da Cruz, I. B. M., & Heddle, J. G. (2013). A novel classification system for evolutionary aging theories. Frontiers in genetics, 4.

  • Putin, E. et al. (2016) Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging 8(5):1021-1033.

  • Lavecchia, A. and Cerchia, C. (2016) In silico methods to address polypharmacology: current status, applications and future perspectives. Drug Discov. Today 21(2):288-298.

  • Oquab, M. et al. (2014) Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks. 2014 IEEE Conference on Computer Vision and Pattern Recognition [Internet]. IEEE. 1717-24. doi:10.1109/CVPR.2014.222.

  • Ma, J. et al. (2015) Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships. J Chem Inf Model. 55(2):263-74.

  • Wang, C. et al. (2014) Pairwise Input Neural Network for Target-Ligand Interaction Prediction. Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference. 67-70.

  • Xu, Y. et al. (2015) Deep Learning for Drug-Induced Liver Injury. J. Chem. Inf. Model. 55 (10):2085-2093. doi:10.1021/acs.jcim.5b00238

  • Hughes, T. B. et al. (2015) Modeling Epoxidation of Drug-like Molecules with a Deep Machine Learning Network. ACS Cent Sci. 1(4):168-80. doi:10.1021/acscentsci.5b00131

  • Mayr, A. et al. (2016) DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science. doi:10.3389/fenvs.2015.00080

  • Aliper, Alexander, Aleksey V. Belikov, Andrew Garazha, Leslie Jellen, Artem Artemov, Maria Suntsova, Alena Ivanova, et al. 2016. “In Search for Geroprotectors: In Silico Screening and in Vitro Validation of Signalome-Level Mimetics of Young Healthy State.” Aging 8 (9): 2127-52.

  • Aliper, Alexander M., Antonei Benjamin Csoka, Anton Buzdin, Tomasz Jetka, Sergey Roumiantsev, Alexey Moskalev, and Alex Zhavoronkov. 2015. “Signaling Pathway Activation Drift during Aging: Hutchinson-Gilford Progeria Syndrome Fibroblasts Are Comparable to Normal Middle-Age and Old-Age Cells.” Aging 7 (1). Impact Journals, LLC: 26.

  • Ansari, Habib R., Ahmed Nadeem, M. A. Hassan Talukder, Shilpa Sakhalkar, and S. Jamal Mustafa. 2007. “Evidence for the Involvement of Nitric Oxide in A2B Receptor-Mediated Vasorelaxation of Mouse Aorta.” American Journal of Physiology. Heart and Circulatory Physiology 292 (1): H719-25.

  • Astarita, Giuseppe, Kwang-Mook Jung, Vitaly Vasilevko, Nicholas V. Dipatrizio, Sarah K. Martin, David H. Cribbs, Elizabeth Head, Carl W. Cotman, and Daniele Piomelli. 2011. “Elevated Stearoyl-CoA Desaturase in Brains of Patients with Alzheimer's Disease.” PloS One 6 (10): e24777.

  • Campbell L, Saville C R, Murray P J, Cruickshank S M, Hardman M J. Local Arginase 1 Activity Is Required for Cutaneous Wound Healing. The Journal of Investigative Dermatology. 2013; 133(10):2461-2470. doi:10.1038/jid.2013.164.

  • Cole J J, Robertson N A, Rather M I, et al. Diverse interventions that extend mouse lifespan suppress shared age-associated epigenetic changes at critical gene regulatory regions. Genome Biology. 2017; 18:58. doi:10.1186/s13059-017-1185-3.

  • Colegio, Oscar R., Ngoc-Quynh Chu, Alison L. Szabo, Thach Chu, Anne Marie Rhebergen, Vikram Jairam, Nika Cyrus, et al. 2014. “Functional Polarization of Tumour-Associated Macrophages by Tumour-Derived Lactic Acid.” Nature 513 (7519): 559-63.

  • Deignan, Joshua L., Justin C. Livesay, Paul K. Yoo, Stephen I. Goodman, William E. O'Brien, Ramaswamy K. Iyer, Stephen D. Cederbaum, and Wayne W. Grody. 2006. “Ornithine Deficiency in the Arginase Double Knockout Mouse.” Molecular Genetics and Metabolism 89 (1-2): 87-96.

  • Douarre, Celine, Carole Sourbier, Ilaria Dalla Rosa, Benu Brata Das, Christophe E. Redon, Hongliang Zhang, Len Neckers, and Yves Pommier. 2012. “Mitochondrial Topoisomerase I Is Critical for Mitochondrial Integrity and Cellular Energy Metabolism.” PloS One 7 (7). Public Library of Science. doi:10.1371/journal.pone.0041094.

  • Gosule, L. C., and J. A. Schellman. 1976. “Compact Form of DNA Induced by Spermidine.” Nature 259 (5541): 333-35.

  • Khiati, Salim, Simone A. Baechler, Valentina M. Factor, Hongliang Zhang, Shar-Yin N. Huang, Ilaria Dalla Rosa, Carole Sourbier, Leonard Neckers, Snorri S. Thorgeirsson, and Yves Pommier. 2015. “Lack of Mitochondrial Topoisomerase I (TOP1mt) Impairs Liver Regeneration.” Proceedings of the National Academy of Sciences of the United States of America 112 (36): 11282-87.

  • Kunduri, S. S., S. J. Mustafa, D. S. Ponnoth, G. M. Dick, and M. A. Nayeem. 2013. “Adenosine A1 Receptors Link to Smooth Muscle Contraction via CYP4a, PKC-α, and ERK1/2.” Journal of Cardiovascular Pharmacology 62 (1). NIH Public Access: 78.

  • Madauss, Kevin P., William A. Burkhart, Thomas G. Consler, David J. Cowan, William K. Gottschalk, Aaron B. Miller, Steven A. Short, Thuy B. Tran, and Shawn P. Williams. 2009. “The Human ACC2 CT-Domain C-Terminus Is Required for Full Functionality and Has a Novel Twist.” Acta Crystallographica. Section D, Biological Crystallography 65 (5): 449-61.

  • Maesaka, John K., Bali Sodam, Thomas Palaia, Louis Ragolia, Vecihi Batuman, Nobuyuki Miyawaki, Shubha Shastry, Steven Youmans, and Marwan El-Sabban. 2013. “Prostaglandin D2 Synthase: Apoptotic Factor in Alzheimer Plasma, Inducer of Reactive Oxygen Species, Inflammatory Cytokines and Dialysis Dementia.” Journal of Nephropathology 2 (3): 166-80.

  • Magalhaes, João Pedro de, João Curado, and George M. Church. 2009. “Meta-Analysis of Age-Related Gene Expression Profiles Identifies Common Signatures of Aging.” Bioinformatics 25 (7): 875-81.

  • Mak, Isabella Wy, Nathan Evaniew, and Michelle Ghert. 2014. “Lost in Translation: Animal Models and Clinical Trials in Cancer Treatment.” American Journal of Translational Research 6 (2): 114-18.

  • Ma, Yina, and Ji Li. 2015. “Metabolic Shifts during Aging and Pathology.” Comprehensive Physiology 5 (2): 667-86.

  • McKinnon, Peter J. 2016. “Topoisomerases and the Regulation of Neural Function.” Nature Reviews. Neuroscience 17 (11): 673-79.

  • Moskalev A, Et al. 2017. “Geroprotectors.org: A New, Structured and Curated Database of Current Therapeutic Interventions in Aging and Age-Related Disease. —PubMed—NCBI.” Accessed March 17. ncbi.nlm.nih.gov/pubmed/26342919.

  • Nozaki, Hiroaki, Taisuke Kato, Megumi Nihonmatsu, Yohei Saito, Ikuko Mizuta, Tomoko Noda, Ryoko Koike, et al. 2016. “Distinct Molecular Mechanisms of HTRA1 Mutants in Manifesting Heterozygotes with CARASIL.” Neurology 86 (21): 1964-74.

  • Ogneva, Irina V., Nikolay S. Biryukov, Toomas A. Leinsoo, and Irina M. Larina. 2014. “Possible Role of Non-Muscle Alpha-Actinins in Muscle Cell Mechanosensitivity.” PloS One 9 (4). Public Library of Science: e96395.

  • Petkovich D A, Podolskiy D I, Lobanov A V, Lee S-G, Miller R A, Gladyshev V N. Using DNA methylation profiling to evaluate biological age and longevity interventions. Cell metabolism. 2017; 25(4):954-960.e6. doi:10.1016/j.cmet.2017.03.016.

  • Phillips, Catherine M., Louisa Goumidi, Sandrine Bertrais, Martyn R. Field, L. Adrienne Cupples, Jose M. Ordovas, Jolene McMonagle, et al. 2010. “ACC2 Gene Polymorphisms, Metabolic Syndrome, and Gene-Nutrient Interactions with Dietary Fat.” Journal of Lipid Research 51 (12): 3500-3507.

  • Pinto, Elisabete. 2007. “Blood Pressure and Ageing.” Postgraduate Medical Journal 83 (976). BMJ Group: 109.

  • Pledgie, Allison, Yi Huang, Amy Hacker, Zhe Zhang, Patrick M. Woster, Nancy E. Davidson, and Robert A. Casero Jr. 2005. “Spermine Oxidase SMO(PAOh1), Not N1-Acetylpolyamine Oxidase PAO, Is the Primary Source of Cytotoxic H2O2 in Polyamine Analogue-Treated Human Breast Cancer Cell Lines.” The Journal of Biological Chemistry 280 (48): 39843-51.

  • Qian, Hao, Na Luo, and Yuling Chi. 2012. “Aging-Shifted Prostaglandin Profile in Endothelium as a Factor in Cardiovascular Disorders.” Journal of Aging Research 2012 (February). Hindawi Publishing Corporation. doi:10.1155/2012/121390.

  • Savolainen, Kalle, Tiina J. Kotti, Werner Schmitz, Teuvo I. Savolainen, Raija T. Sormunen, Mika Ilves, Seppo J. Vainio, Ernst Conzelmann, and J. Kalervo Hiltunen. 2004. “A Mouse Model for Alpha-Methylacyl-CoA Racemase Deficiency: Adjustment of Bile Acid Synthesis and Intolerance to Dietary Methyl-Branched Lipids.” Human Molecular Genetics 13 (9): 955-65.

  • Selkälä, Eija M., Remya R. Nair, Werner Schmitz, Ari-Pekka Kvist, Myriam Baes, J. Kalervo Hiltunen, and Kaija J. Autio. 2015. “Phytol Is Lethal for Amacr-Deficient Mice.” Biochimica et Biophysica Acta 1851 (10): 1394-1405.

  • Sergio Solórzano-Vargas, R., Diana Pacheco-Alvarez, and Alfonso León-Del-Rio. 2002. “Holocarboxylase Synthetase Is an Obligate Participant in Biotin-Mediated Regulation of Its Own Expression and of Biotin-Dependent Carboxylases mRNA Levels in Human Cells.” Proceedings of the National Academy of Sciences of the United States of America 99 (8). National Academy of Sciences: 5325-30.

  • Suzuki, Yoichi, Xue Yang, Yoko Aoki, Shigeo Kure, and Yoichi Matsubara. 2005. “Mutations in the Holocarboxylase Synthetase Gene HLCS.” Human Mutation 26 (4): 285-90.

  • Tang, Eva H. C., and Paul M. Vanhoutte. 2008. “Gene Expression Changes of Prostanoid Synthases in Endothelial Cells and Prostanoid Receptors in Vascular Smooth Muscle Cells Caused by Aging and Hypertension.” Physiological Genomics 32 (3): 409-18.

  • Thomas, Inas, and Brigid Gregg. 2017. “Metformin; a Review of Its History and Future: From Lilac to Longevity.” Pediatric Diabetes 18 (1): 10-16.

  • Thomas, T., and T. J. Thomas. 2017. “Polyamine Metabolism and Cancer. —PubMed—NCBI.” Accessed April 11. ncbi.nlm.nih.gov/pubmed/12927050.

  • Tong, Liang. 2013. “Structure and Function of Biotin-Dependent Carboxylases.” Cellular and Molecular Life Sciences: CMLS 70 (5). NIH Public Access: 863.

  • Unno, Keiko, Tomokazu Konishi, Aimi Nakagawa, Yoshie Narita, Fumiyo Takabayashi, Hitomi Okamura, Ayane Hara, et al. 2015. “Cognitive Dysfunction and Amyloid β Accumulation Are Ameliorated by the Ingestion of Green Soybean Extract in Aged Mice.” Journal of Functional Foods 14: 345-53.

  • Verdura E, Et al. 2017. “Heterozygous HTRA1 Mutations Are Associated with Autosomal Dominant Cerebral Small Vessel Disease. —PubMed—NCBI.” Accessed April 11. ncbi.nlm.nih.gov/pubmed/26063658.

  • Weller J, Et al. 2017. “Age-Related Decrease of Adenosine-Mediated Relaxation in Rat Detrusor Is a Result of A2B Receptor Downregulation. —PubMed—NCBI.” Accessed April 17. ncbi.nlm.nih.gov/pubmed/25728851.

  • Zhang, Yongyou, Amar Desai, Sung Yeun Yang, Ki Beom Bae, Monika I. Antczak, Stephen P. Fink, Shruti Tiwari, et al. 2015. “TISSUE REGENERATION. Inhibition of the Prostaglandin-Degrading Enzyme 15-PGDH Potentiates Tissue Regeneration.” Science 348 (6240): aaa2340.

  • Seim, Inge, Siming Ma, and Vadim N. Gladyshev. 2016. “Gene Expression Signatures of Human Cell and Tissue Longevity.” Npj Aging and Mechanisms of Disease 2 (1). doi:10.1038/npjamd.2016.14.


Claims
  • 1. A method of creating a biological aging clock for a subject, the method comprising: (a) receiving a DNA methylation data signature derived from a biological sample of the subject, wherein the DNA methylation data signature includes a plurality of DNA methylation sites; (b) creating input vectors based on the DNA methylation data signature; (c) inputting the input vectors into a machine learning platform; (d) generating a predicted biological aging clock of the subject based on the input vectors by the machine learning platform, wherein the biological aging clock is specific to the subject; and (e) preparing a report that includes the biological aging clock that identifies a predicted biological age of the subject.
  • 2. The method of claim 1, further comprising: creating at least a second biological aging clock by repeating any one or more of steps (a), (b), (c), and/or (d), wherein the second biological aging clock is based on a second DNA methylation data signature from the biological sample of the subject, a different biological sample of the subject, or a biological sample of a second subject; and preparing a report that includes the second biological aging clock that identifies a second predicted biological age of the subject, a different biological sample of the subject, or a biological sample of a second subject.
  • 3. The method of claim 2, further comprising: combining the biological aging clock with the second biological aging clock to create a synthetic biological aging clock, wherein the synthetic biological aging clock provides a synthetic biological age of the subject; and optionally, preparing a report that includes the synthetic biological aging clock that identifies the synthetic biological age of the subject.
  • 4. The method of claim 3, further comprising one or more of: comparing the predicted biological age of the subject with the actual age of the subject; comparing the second predicted biological age of the subject with the actual age of the subject; or comparing the synthetic biological age of the subject with the actual age of the subject, wherein the method further comprises: preparing a report with the results of the comparing of the synthetic biological age with the actual age and with a difference of the synthetic biological age from the actual age of the subject.
  • 5. The method of claim 1, wherein the report includes one or more of: a therapeutic regimen based on the predicted biological age in view of an actual age of the subject; a diet regimen based on the predicted biological age in view of an actual age of the subject; a questionnaire about lifestyle habits; a prognosis of the life expectancy with and/or without the therapeutic regimen; a prognosis of the life expectancy with and/or without the diet regimen; a prognosis of the probability of survival of the patient during the therapeutic regimen; a prognosis of the probability of survival of the patient during the diet regimen; a prognosis of developing disease complications or therapy side effects; a prognosis of the severity degree of diseases including infectious diseases such as severe acute respiratory syndrome, coronavirus disease 2019 and others; an identification of disease stages including infectious diseases and others; or a prognosis of physical fitness of the patient.
  • 6. The method of claim 1, wherein the biological sample is from a cell, fluid, tissue, or organ that is: diseased; healthy; determined as susceptible to disease; undergoing senescence; in pre-senescence; or non-senescent.
  • 7. The method of claim 5, wherein the therapeutic regimen includes one or more of: applying a senoremediation drug treatment protocol to the subject in order to rescue one or more first cells in the subject; applying a senolytic drug treatment protocol to the subject in order to remove one or more second cells in the subject; introducing stem cells into a tissue and/or organ of the subject in order to rejuvenate one or more tissue cells in the tissue and/or one or more organ cells in the organ; carrying out a reinforcement step that includes one or more actions that prevent further senescence or degradation of the tissue or organ; or one or more actions that prevent further senescence or degradation of the tissue or organ is derived from the computational proteome analysis of the tissue or organ of the subject.
  • 8. The method of claim 1, further comprising: correlating a methylomics profile of the DNA methylation data signature with the predicted biological age of the subject.
  • 9. The method of claim 1, further comprising: obtaining the biological sample from the subject; and obtaining the DNA methylation data signature by performing a measurement of the methylomics of DNA in the biological sample.
  • 10. The method of claim 1, wherein the biological aging clock can estimate human age with a MedAE of 2.77 years, or +/−10%.
  • 11. The method of claim 1, further comprising: performing feature importance analysis for ranking DNA methylation sites by their importance in age prediction by using the biological data; and correlating a biological signaling pathway signature with the predicted biological age of the subject.
  • 12. The method of claim 11, wherein the machine learning platform includes feed-forward neural networks with more than three hidden layers.
  • 13. The method of claim 1, wherein the method is performed with a neural network configured for performing an epigenetic analysis with feature selection based on a feature importance analysis.
  • 14. The method of claim 13, wherein the method is performed with a model that is trained on DNA methylation profiles from a plurality of subjects.
  • 15. The method of claim 14, wherein the method is performed with a model that is verified by being processed with healthy subjects.
  • 16. The method of claim 1, comprising: inputting DNA methylation vectors of the subject into a deep neural network model having multiple hidden layers; performing a regression calculation; obtaining an age prediction of the subject; and providing the age prediction to the subject.
  • 17. The method of claim 16, comprising: training the deep neural network model on the DNA methylation data of the DNA methylation vectors; performing a deep feature selection protocol; performing a gradient-based feature selection protocol; and identifying important features having an importance value over an importance threshold.
  • 18. The method of claim 17, comprising: optimizing model parameters; performing a grid search over model depth of layers; performing an activation function protocol; performing an optimizing algorithm protocol; and performing a regularization algorithm protocol.
  • 19. The method of claim 18, comprising: selecting at least one best feature selection protocol; and fixing a set of identified important features.
  • 20. The method of claim 1, wherein the machine learning platform includes a deep neural network trained on DNA methylation data, the method comprising: training a first deep neural network with DNA methylation data from a training set; selecting a number of top features; reducing the number of features to the number of top features; training a second deep neural network with the number of top features; and obtaining the trained second deep neural network as the machine learning platform configured for providing the biological aging clock.
  • 21. The method of claim 1, comprising: obtaining DNA methylation data; adding a 0.5-year pseudocount to the whole-year ages of subjects to obtain updated DNA methylation data; preparing a training data set and a verification data set from the updated DNA methylation data; training a first deep neural network with the training data set; performing a deep feature selection protocol; selecting top ranked important features; training a second deep neural network with the important features; verifying the second deep neural network with the verification data set; and providing the verified second deep neural network as the machine learning platform.
  • 22. The method of claim 1, further comprising: after a defined time period, performing steps (a), (b), (c), (d), and (e) in a second iteration; comparing the initial report with the report of the second iteration; and determining a change in the predicted biological age over the defined time period.
  • 23. The method of claim 1, further comprising: performing a therapeutic regimen over a defined time period; performing steps (a), (b), (c), (d), and (e) in a second iteration; comparing the initial report with the report of the second iteration; determining a change in the predicted biological age over the defined time period; and determining: whether the therapeutic regimen changed the predicted biological age; if the therapeutic regimen changed the predicted biological age, then determining whether or not to: continue the therapeutic regimen, change the therapeutic regimen, or stop the therapeutic regimen; or if the therapeutic regimen did not change the predicted biological age, then determining whether or not to: continue the therapeutic regimen, change the therapeutic regimen, or stop the therapeutic regimen.
  • 24. A computer program product comprising a tangible, non-transitory computer readable medium having a computer readable program code stored thereon, the code being executable by a processor to perform a method for creating a biological aging clock for a subject, the method comprising: (a) receiving a DNA methylation data signature derived from a biological sample of the subject, wherein the DNA methylation data signature includes a plurality of DNA methylation sites; (b) creating input vectors based on the DNA methylation data signature; (c) inputting the input vectors into a machine learning platform; (d) generating a predicted biological aging clock of the subject based on the input vectors by the machine learning platform, wherein the biological aging clock is specific to the subject; and (e) preparing a report that includes the biological aging clock that identifies a predicted biological age of the subject.
  • 25. The computer program product of claim 24, further comprising: correlating a methylomics profile of the DNA methylation data signature with the predicted biological age of the subject.
  • 26. The computer program product of claim 24, wherein the method is performed with a neural network configured for performing an epigenetic analysis with feature selection based on a feature importance analysis.
  • 27. The computer program product of claim 26, wherein the method is performed with a model that is trained on DNA methylation profiles from a plurality of subjects.
  • 28. The computer program product of claim 27, wherein the method is performed with a model that is verified by being processed with healthy subjects.
  • 29. The computer program product of claim 24, comprising: inputting DNA methylation vectors of the subject into a deep neural network model having multiple hidden layers; performing a regression calculation; obtaining an age prediction of the subject; and providing the age prediction to the subject.
  • 30. The computer program product of claim 29, comprising: training the deep neural network model on the DNA methylation data of the DNA methylation vectors; performing a deep feature selection protocol; performing a gradient-based feature selection protocol; and identifying important features having an importance value over an importance threshold.
  • 31. The computer program product of claim 30, comprising: optimizing model parameters; performing a grid search over model depth of layers; performing an activation function protocol; performing an optimizing algorithm protocol; and performing a regularization algorithm protocol.
  • 32. The computer program product of claim 31, comprising: selecting at least one best feature selection protocol; and fixing a set of identified important features.
  • 33. The computer program product of claim 24, wherein the machine learning platform includes a deep neural network trained on DNA methylation data, the method comprising: training a first deep neural network with DNA methylation data from a training set; selecting a number of top features; reducing the number of features to the number of top features; training a second deep neural network with the number of top features; and obtaining the trained second deep neural network as the machine learning platform configured for providing the biological aging clock.
  • 34. The computer program product of claim 24, comprising: obtaining DNA methylation data; adding a 0.5-year pseudocount to the whole-year ages of subjects to obtain updated DNA methylation data; preparing a training data set and a verification data set from the updated DNA methylation data; training a first deep neural network with the training data set; performing a deep feature selection protocol; selecting top ranked important features; training a second deep neural network with the important features; verifying the second deep neural network with the verification data set; and providing the verified second deep neural network as the machine learning platform.
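For illustration only, the 0.5-year pseudocount recited in claims 21 and 34 and the median absolute error (MedAE) figure recited in claim 10 can be sketched in Python as follows. The function names, the NumPy dependency, and the example ages are assumptions made for this sketch and do not appear in the application. The sketch also assumes that the pseudocount is added to the whole-year age labels, as the claim wording suggests.

```python
import numpy as np

def add_age_pseudocount(whole_years, pseudocount=0.5):
    """Shift whole-year ages by a pseudocount (0.5 years in claims 21 and 34)."""
    return np.asarray(whole_years, dtype=float) + pseudocount

def median_absolute_error(true_age, predicted_age):
    """MedAE: median of |true age - predicted age| across subjects, in years."""
    true_age = np.asarray(true_age, dtype=float)
    predicted_age = np.asarray(predicted_age, dtype=float)
    return float(np.median(np.abs(true_age - predicted_age)))

# Hypothetical ages and model predictions, for illustration only.
ages = add_age_pseudocount([30, 45, 60])             # 30.5, 45.5, 60.5
medae = median_absolute_error(ages, [32.5, 44.5, 65.5])
```

In this toy case the absolute errors are 2, 1, and 5 years, so the MedAE is 2.0 years. Because it is a median, the MedAE is less sensitive to a few badly predicted subjects than a mean error would be.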
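The gradient-based feature selection of claims 17 and 30, and the reduction to a number of top features in claims 20 and 33, might be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation: it assumes a ReLU feed-forward network whose trained weights are already available, scores each DNA methylation site by the mean absolute gradient of the predicted age with respect to that site, and uses hand-picked toy weights in place of a trained model. All function and variable names are hypothetical.

```python
import numpy as np

def input_gradients(X, layers):
    """Gradient of the scalar network output w.r.t. every input feature.

    `layers` is a list of (W, b) pairs; hidden layers use ReLU and the last
    layer is linear (a regression output).
    """
    acts = [X]
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        acts.append(z if i == len(layers) - 1 else np.maximum(z, 0.0))
    grad = np.ones((X.shape[0], 1))            # d(output)/d(output)
    for i in reversed(range(len(layers))):
        W, _ = layers[i]
        if i < len(layers) - 1:
            grad = grad * (acts[i + 1] > 0)    # ReLU derivative
        grad = grad @ W.T                      # back through the linear map
    return grad

def feature_importance(X, layers):
    """Mean absolute input gradient per feature, across all samples."""
    return np.abs(input_gradients(X, layers)).mean(axis=0)

def top_features(importance, k):
    """Indices of the k most important features, most important first."""
    return np.argsort(-importance)[:k]

def features_over_threshold(importance, threshold):
    """Indices of features whose importance exceeds the threshold."""
    return np.flatnonzero(importance > threshold)

# Toy stand-in for a trained network: output = x0 + 2 * x1; x2 is ignored.
layers = [
    (np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]), np.zeros(2)),
    (np.eye(2), np.zeros(2)),
    (np.array([[1.0], [1.0]]), np.zeros(1)),
]
X = np.array([[1.0, 1.0, 1.0], [2.0, 1.0, 3.0]])   # hypothetical methylation values

importance = feature_importance(X, layers)
top = top_features(importance, k=2)
kept = features_over_threshold(importance, threshold=0.5)
X_reduced = X[:, top]                              # input for a second, smaller network
```

Here the third site has zero gradient and is dropped, while the second site ranks first. In the two-stage arrangement of claims 20 and 33, a second network would then be trained on `X_reduced` only.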
CROSS-REFERENCE

This patent application claims priority to U.S. Provisional Application No. 63/081,297 filed Sep. 21, 2020; and this patent application is a continuation-in-part of U.S. application Ser. No. 16/883,205 filed May 26, 2020, which is a continuation-in-part of U.S. application Ser. No. 16/415,855 filed May 17, 2019 (now U.S. Pat. No. 10,665,326), which is a continuation-in-part of U.S. application Ser. No. 16/104,391 filed Aug. 17, 2018 (now U.S. Pat. No. 10,325,673), which is a continuation-in-part of U.S. application Ser. No. 16/044,784 filed Jul. 25, 2018, which claims priority to U.S. Provisional Application No. 62/536,658 filed Jul. 25, 2017 and claims priority to U.S. Provisional Application No. 62/547,061 filed Aug. 17, 2017; which applications are incorporated herein by specific reference in their entirety.

Provisional Applications (3)
Number Date Country
63081297 Sep 2020 US
62536658 Jul 2017 US
62547061 Aug 2017 US
Continuation in Parts (4)
Relation Number Date Country
Parent 16883205 May 2020 US
Child 17479892 US
Parent 16415855 May 2019 US
Child 16883205 US
Parent 16104391 Aug 2018 US
Child 16415855 US
Parent 16044784 Jul 2018 US
Child 16104391 US