METHOD AND SYSTEM FOR PREDICTING BIOLOGICAL AGE ON BASIS OF VARIOUS OMICS DATA ANALYSES

Description

TECHNICAL FIELD

The present invention relates to a method and system for predicting age based on the analysis of various omics data and, more specifically, to a method for predicting aging or biological age using integrated information such as DNA methylation, mRNA expression level, and telomere length, and to a system for performing the same. The method acquires various omics data related to telomere length, DNA methylation, mRNA expression level, etc. from a specimen sample (e.g., blood) of a subject and analyzes them comprehensively to predict the biological age of the subject and to classify and analyze the degree of aging based on each type of omics data.

DISCUSSION OF RELATED ART

Biological age refers to an age quantified by comprehensively evaluating a person's overall health status and degree of aging. Telomere length has generally been used to predict such aging or biological age. In addition, biomarkers and combinations thereof are being developed to predict age based on DNA methylation or gene expression levels that change significantly with age.

SUMMARY

The object of the present invention is to provide a method and system for predicting biological age based on various omics data analyses that can solve the problems of the prior art.

A system for predicting biological age based on various omics data analyses according to an embodiment of the present invention, for addressing the above issue, comprises: a test sample collection unit for collecting a plurality of genetic test samples including at least one of DNA and RNA of a subject; a test sample analysis unit for analyzing a plurality of types of omics data from each of the plurality of genetic test samples; a preprocessing unit for preprocessing the omics data analyzed by the test sample analysis unit; an association analysis unit for performing an association analysis on each type of omics data for each omics area converted by the preprocessing unit; and an age prediction unit for predicting the subject's age based on the analysis result of the association analysis unit and the data for each omics area.

A method for predicting biological age based on various omics data analyses according to an embodiment of the present invention, for addressing the above issue, comprises the steps of: collecting, in a test sample collection unit, a plurality of genetic test samples including at least one of DNA and RNA of a subject; analyzing, in a test sample analysis unit, a plurality of types of omics data from each of the plurality of genetic test samples; preprocessing, in a preprocessing unit, the omics data analyzed by the test sample analysis unit; performing, in an association analysis unit, an association analysis on each type of omics data for each omics area converted by the preprocessing unit; and predicting, in an age prediction unit, the age of the subject based on the analysis result of the association analysis unit and the data for each omics area.

The method and system for predicting biological age based on various omics data analyses according to an embodiment of the present invention combine markers from various omics areas in the biological age prediction model and thus have the advantage of offsetting the errors inherent in each individual omics area. This allows more accurate age prediction, and it allows the influence (or degree of aging) of each omics area on the integrated biological age (the current degree of aging of the subject) to be distinguished and interpreted.

That is, by combining three types of omics data from a sample such as human blood, namely the genome (telomere length), the epigenome (methylation), and the transcriptome (gene expression): 1) age prediction accuracy can be increased by offsetting noise; and 2) the biological age (degree of aging) of the subject can be analyzed separately for each omics area.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for predicting biological age based on analysis of various omics data according to an embodiment of the present invention.

FIGS. 2a to 2d are flowcharts illustrating a method for predicting biological age based on analysis of various omics data.

FIG. 3 is a graph showing the correlation between the biological age and the actual age based on linear regression using the telomere length.

FIG. 4 is a graph showing the correlation between biological age and actual age based on linear regression using sixteen methylation markers.

FIG. 5 is a graph showing the correlation between the biological age and the actual age based on linear regression using eighteen gene expression markers.

FIG. 6 is a graph showing the correlation between the omics-integrated biological age and the actual age based on linear regression combining the telomere length, methylation markers, and gene expression markers presented in the present invention.

FIG. 7 is a graph showing the correlation between the omics-integrated biological age and the actual age, where the integrated age is obtained by summing the telomere-based age, methylation-based age, and gene expression-based age presented in the present invention with weights for each omics area based on the coefficient of determination.

FIG. 8 is a graph showing the correlation between the biological age and the actual age based on linear regression using four methylation markers.

FIG. 9 is a graph showing the correlation between the biological age and the actual age based on linear regression using four gene expression markers.

FIG. 10 is a graph showing the correlation between the omics-integrated biological age and the actual age based on linear regression combining the telomere length, methylation markers, and gene expression markers presented in the present invention.

FIG. 11 is a graph showing the correlation between the omics-integrated biological age and the actual age, where the integrated age is obtained by summing the telomere-based age, methylation-based age, and gene expression-based age presented in the present invention with weights for each omics area based on the coefficient of determination.

FIG. 12 is a graph showing the correlation between the biological age and the actual age based on an artificial neural network using the telomere length.

FIG. 13 is a graph showing the correlation between the biological age and the actual age based on an artificial neural network using sixteen methylation markers.

FIG. 14 is a graph showing the correlation between the biological age and the actual age based on an artificial neural network using eighteen gene expression markers.

FIG. 15 is a graph showing the correlation between the omics-integrated biological age and the actual age based on an artificial neural network combining the telomere length, methylation markers, and gene expression markers presented in the present invention.

FIG. 16 is a graph showing the correlation between the omics-integrated biological age and the actual age, where the integrated age is obtained by summing the artificial neural network-based telomere-based age, methylation-based age, and gene expression-based age presented in the present invention with weights for each omics area based on the coefficient of determination.

FIG. 17 is a graph showing the correlation between the omics-integrated biological age and the actual age, where the integrated age is obtained by summing the telomere-based age, methylation-based age, and gene expression-based age presented in the present invention with weights for each omics area based on significance.

FIG. 18 is a graph showing the correlation between the omics-integrated biological age and the actual age, where the integrated age is obtained by summing the telomere-based age, methylation-based age, and gene expression-based age presented in the present invention with weights for each omics area based on the mean absolute error.

DETAILED DESCRIPTION

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily carry out the present invention. However, the present invention may be embodied in several different forms and is not limited to the embodiments described herein. Further, in order to clearly explain the present invention in the drawings, parts irrelevant to the description are excluded, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is “connected” to another part, this includes not only the case where it is “directly connected” but also the case where it is “electrically connected” with another element interposed therebetween. Further, when a part “includes” a certain component, this means that other components may be further included rather than excluded, unless otherwise stated, and it is to be understood that the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not precluded in advance.

The terms “about,” “substantially,” etc., used throughout the specification to denote degree, mean at or close to the stated numerical value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting a disclosure in which exact or absolute values are stated to aid understanding of the present invention. As used throughout the specification of the present invention, the term “step of (to)” or “step of” does not mean “step for.”

In this specification, a “unit” includes a unit implemented by hardware, a unit implemented by software, and a unit implemented using both. In addition, one unit may be implemented using two or more pieces of hardware, and two or more units may be implemented by one piece of hardware.

In this specification, some of the operations or functions described as being performed by the terminal, apparatus, or device may instead be performed in a server connected to the terminal, apparatus, or device. Similarly, some of the operations or functions described as being performed by the server may also be performed in a terminal, apparatus, or device connected to the server.

In this specification, some of the operations or functions described as mapping or matching with the terminal may be interpreted to mean mapping or matching the terminal's unique number or personal identification information, which is identifying data of the terminal.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.


First, as shown in FIG. 1, the system 100 for predicting biological age based on the analysis of various omics data according to an embodiment of the present invention acquires various types of omics information (e.g., telomere length, DNA methylation level, gene expression level, etc.) from test samples (e.g., blood) and analyzes the acquired omics information in an integrated manner (e.g., by multiple linear regression analysis or by weighting each omics area) to measure (predict) biological age more accurately than before.

More specifically, the system 100 for predicting biological age based on the analysis of various omics data of the present invention includes a test sample collection unit 110, a test sample analysis unit 120, a preprocessing unit 130, an association analysis unit 140, a weight allocation unit 150, a weight correction unit 160, and an age prediction unit 170.

The test sample collection unit 110 is configured to collect a plurality of genetic test samples containing the DNA and RNA of the subject, that is, to collect and then classify a plurality of aging biomarker test samples.

Here, the aging biomarker information may be obtained by collecting DNA from blood samples of various age groups and measuring telomere length, by performing differentially methylated region (DMR) analysis through methyl-seq or chip experiments, and by performing differentially expressed gene (DEG) analysis through RNA-seq or microarray experiments on the collected RNA, and the like.

Further, the aging biomarker test samples are divided into training data and test data, which are used for age prediction with the markers of each omics area, integrated omics analysis, and weighted summation analysis of the predicted ages.

Next, the test sample analysis unit 120 is configured to analyze a plurality of types of omics data from each of the plurality of genetic test samples, that is, to analyze omics areas including telomere length information, DNA methylation information, and the expression level of each gene from the plurality of genetic test samples.

Specifically, the test sample analysis unit 120 comprises a telomere length measurement unit, a methylation marker analysis and filtering unit, and a gene expression marker analysis and filtering unit.

The telomere length measurement unit is configured to measure the length of telomeres relative to a single-copy gene using the qPCR, TRF, or Q-FISH method. For qPCR, the threshold cycle number (Ct) at which fluorescence is detected is measured for each concentration of a standard oligomer sample of known length. The total telomere length is then obtained by dividing the Ct value of the telomeres by the Ct value of the reference gene, and the absolute telomere length is obtained by further dividing this value by the number of telomeres in the human genome.
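As an illustration of the absolute qPCR calculation, the following Python sketch fits standard curves (Ct versus log10 amount) for the telomere oligomer standard and for a single-copy reference gene standard, converts a sample's Ct values into amounts, and divides by 92 telomeres per diploid human genome (46 chromosomes with two ends each). It assumes, as is common in absolute telomere qPCR, that the ratio is taken between the amounts interpolated from the standard curves rather than between the raw Ct values. The function names are hypothetical.

```python
import math

# A diploid human genome has 46 chromosomes with two ends each.
TELOMERES_PER_DIPLOID_GENOME = 92


def fit_standard_curve(log10_amounts, cts):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    n = len(cts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_amounts, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_ct(ct, slope, intercept):
    """Invert a standard curve: the amount that would produce this Ct."""
    return 10 ** ((ct - intercept) / slope)


def telomere_length_kb(ct_tel, ct_ref, tel_curve, ref_curve):
    """Mean length of a single telomere in kb.

    tel_curve maps Ct to kb of telomeric sequence per reaction (oligomer
    standard); ref_curve maps Ct to diploid genome copies per reaction
    (single-copy reference gene standard). Both are (slope, intercept).
    """
    tel_kb = amount_from_ct(ct_tel, *tel_curve)
    genome_copies = amount_from_ct(ct_ref, *ref_curve)
    kb_per_genome = tel_kb / genome_copies
    return kb_per_genome / TELOMERES_PER_DIPLOID_GENOME
```

The curves would typically be fitted once per plate; telomere_length_kb then gives one value per sample, which the preprocessing unit can rank alongside the other markers.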

For reference, the above-described method for measuring telomere length is only an example, and various conventional methods for measuring telomere length may be applied.

Next, the methylation marker analysis and filtering unit maps the raw methylation data obtained through experiments such as methyl-seq or methylation chips to a human genome map (human reference), thereby obtaining the degree of methylation (hereinafter, “beta value”) at each location for each test sample, and selects regions whose beta values increase or decrease with age using DMR analysis or the like.
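A full DMR analysis is beyond a short example, but the marker filtering step can be sketched as a per-feature correlation test with Benjamini-Hochberg adjustment. This is a simplified stand-in, not the DMR analysis named above: it assumes a samples-by-sites matrix of beta values, keeps sites whose adjusted p-value falls below a threshold (for example 1.0e-30, as in the comparative examples), and optionally requires an absolute correlation |R| above a cutoff (0.75 in the second comparative example). The same function could be applied to gene expression levels. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def select_age_markers(values, ages, adj_p_max=1e-30, abs_r_min=0.0):
    """Indices of features (columns) whose values rise or fall with age.

    values: (n_samples, n_features) beta values or expression levels.
    Returns (selected_indices, correlations, adjusted_pvalues).
    """
    values = np.asarray(values, dtype=float)
    n_features = values.shape[1]
    r = np.empty(n_features)
    p = np.empty(n_features)
    for j in range(n_features):
        r[j], p[j] = pearsonr(values[:, j], ages)
    adj = benjamini_hochberg(p)
    keep = (adj < adj_p_max) & (np.abs(r) > abs_r_min)
    return np.flatnonzero(keep), r, adj
```

For the thresholds of the second comparative example, a call would look like `select_age_markers(beta, ages, adj_p_max=1e-30, abs_r_min=0.75)`.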

Next, the gene expression marker analysis and filtering unit maps the raw gene expression data obtained through experiments such as RNA-seq or microarrays to the human genome map (human reference) to calculate the expression level of each gene in each test sample, removes batch effects due to gender, lifestyle, etc. from the calculated expression levels, and then selects genes whose expression level increases or decreases with age using DEG analysis or the like.
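One simple way to remove the gender or lifestyle effects mentioned above is to regress each gene's expression on the covariates and subtract the fitted covariate component. The source does not specify the method, so this is an assumed approach chosen for brevity; dedicated batch-correction tools such as ComBat are also common. Covariates that correlate with age would remove part of the age signal as well, so they should be chosen with care. The function name is hypothetical.

```python
import numpy as np


def remove_covariate_effects(expr, covariates):
    """Remove linear covariate (e.g. sex, lifestyle) effects from expression.

    expr: (n_samples, n_genes) expression levels.
    covariates: (n_samples,) or (n_samples, n_covariates) numeric covariates.
    Each gene keeps its own mean level; only the covariate-driven part is removed.
    """
    expr = np.asarray(expr, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]
    # Centering the covariates lets the intercept absorb each gene's mean.
    cov_centered = cov - cov.mean(axis=0)
    design = np.column_stack([np.ones(expr.shape[0]), cov_centered])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - cov_centered @ coef[1:]
```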

Next, the preprocessing unit 130 is configured to perform preprocessing on the omics data analyzed through the test sample analysis unit 120.

More specifically, the preprocessing unit 130 converts the beta values of the selected methylation markers, the expression levels of the selected gene expression markers, and the telomere length into percentiles in the range of 0 to 1 so that multiple linear regression analysis or artificial neural network-based regression analysis can be applied.
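The percentile conversion can be sketched as a rank transform. The first function ranks each column within a cohort so that the smallest value maps to 0 and the largest to 1, with ties sharing their average rank. The second assumes that a new subject should be placed relative to the training cohort rather than re-ranked with it, and returns the fraction of reference values at or below each new value. Both are illustrative readings of "percentiles in the range of 0 to 1"; the source does not state which convention is used, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def percentile_within_cohort(x):
    """Column-wise percentile ranks in [0, 1]: min -> 0, max -> 1, ties averaged."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if n < 2:
        return np.zeros_like(x)
    ranks = rankdata(x, method="average", axis=0)
    return (ranks - 1.0) / (n - 1.0)


def percentile_against_reference(reference, new_values):
    """Fraction of reference (training) values that are <= each new value."""
    ref = np.sort(np.asarray(reference, dtype=float))
    return np.searchsorted(ref, new_values, side="right") / ref.size
```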

Next, the association analysis unit 140 performs an association analysis on the data of each omics area converted by the preprocessing unit 130. More specifically, the association analysis unit 140 uses multiple linear regression analysis or artificial neural network-based regression analysis to calculate the coefficient of each independent variable in a regression model in which the preprocessed biomarker values of each omics area are the independent variables and biological age is the dependent variable. Using the calculated coefficients, the association between the actual age and the biological age predicted for each area from the preprocessed biomarker values of that omics area is analyzed; the analyzed association may be any one of the coefficient of determination (R_x²), significance (PVAL_x), and mean absolute error (MAE_x).
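For the linear-regression branch, the association analysis can be sketched with ordinary least squares. The sketch fits one model per omics area on the training samples, predicts age for held-out test samples, and reports the three association measures used in the text. It assumes that R_x² and PVAL_x come from a simple regression of actual age on predicted age (equivalently, the squared Pearson correlation and its p-value); the source does not spell this out, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import linregress


def _as_matrix(X):
    """Treat a 1-D input (e.g. telomere length alone) as a single column."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def fit_age_model(X_train, age_train):
    """Ordinary least squares for age ~ b0 + X @ b; returns [b0, b1, ...]."""
    X_train = _as_matrix(X_train)
    design = np.column_stack([np.ones(X_train.shape[0]), X_train])
    coef, *_ = np.linalg.lstsq(design, np.asarray(age_train, dtype=float), rcond=None)
    return coef


def predict_age(coef, X):
    """Biological age predicted for each sample from one omics area's markers."""
    return coef[0] + _as_matrix(X) @ coef[1:]


def association(predicted, actual):
    """(R^2, p-value, MAE) between predicted and actual age."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    fit = linregress(predicted, actual)
    mae = float(np.mean(np.abs(predicted - actual)))
    return fit.rvalue ** 2, fit.pvalue, mae
```

Evaluating on the test split rather than the training split keeps the association measures, and therefore the weights derived from them, from rewarding overfitted markers.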

Next, the weight allocation unit 150 may be configured to assign a weight to each type of omics data based on any one of the associations (coefficient of determination, significance, or mean absolute error) analyzed by the association analysis unit 140.

The weight allocation unit 150 calculates the weight (W_x) for the coefficient of determination (R_x²), significance (PVAL_x), and mean absolute error (MAE_x) of each omics area using the following equations.

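$\begin{matrix} W_{x} = \frac{1}{1 - R_{x}^{2}} & (Weight equation for coefficient of determination) \end{matrix}$

$\begin{matrix} W_{x} = -\log_{10}(PVAL_{x}) & (Weight equation for significance) \end{matrix}$

$\begin{matrix} W_{x} = \frac{1}{{mae}_{x, rev}} & (Weight equation for mean absolute error) \end{matrix}$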
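The three weight equations translate directly into code. In this sketch the base-10 logarithm is used for the significance weight because it reproduces the weights reported in Table 10 (for example, −log10(1.7E−05) ≈ 4.77). The mean-absolute-error weight folds in the distribution correction mae_x,rev = MAE_x / AGE_avg described below. The function names are hypothetical.

```python
import math


def weight_from_r2(r2):
    """W_x = 1 / (1 - R_x^2); a larger R^2 gives a larger weight."""
    return 1.0 / (1.0 - r2)


def weight_from_pvalue(pval):
    """W_x = -log10(PVAL_x); a smaller (more significant) p-value gives a larger weight."""
    return -math.log10(pval)


def weight_from_mae(mae, mean_age):
    """W_x = 1 / mae_x,rev, where mae_x,rev = MAE_x / AGE_avg."""
    mae_rev = mae / mean_age  # distribution correction by the cohort's mean age
    return 1.0 / mae_rev
```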

The weight correction unit 160 may be configured to obtain a corrected weight (W_x,rev) by raising the average (W_avg) of the weights assigned to the omics areas to the power W_x/W_avg, using the following equation.

$\begin{matrix} W_{x, rev} = {W_{avg}}^{(W_{x} / W_{avg})} & (Weight correction equation) \end{matrix}$
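The correction can be sketched as below, taking W_avg as the arithmetic mean of the per-area weights. That reading is an assumption, but it reproduces the corrected weights of Tables 1, 7, and 10 to within rounding. The function name is hypothetical.

```python
def corrected_weights(weights):
    """W_x,rev = W_avg ** (W_x / W_avg), with W_avg the mean of the per-area weights."""
    w_avg = sum(weights) / len(weights)
    return [w_avg ** (w / w_avg) for w in weights]
```

For W_avg > 1 the transform is increasing in W_x, so the ranking of the omics areas is preserved; with the Table 1 weights (1.46, 14.25, 6.02) it widens the ratio between the methylation and telomere weights from about 9.8 to about 33.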

Meanwhile, before the weight correction, the weight correction unit 160 may apply a distribution correction to the mean absolute error (MAE_x) of each omics area by dividing it by the average age (AGE_avg) of the sample group, so that the mean absolute error is reflected relative to the actual age distribution. The corrected value (mae_x,rev) is obtained through the following equation.

${mae}_{x, rev} = \frac{{MAE}_{x}}{{AGE}_{avg}}$

Next, the age prediction unit 170 is configured to predict the subject's age based on the analysis result of the association analysis unit and the data for each omics area and may predict the subject's age through the following equation.

$\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} & (Biological age {AGE}_{integ} prediction equation) \end{matrix}$
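The prediction equation is a weighted mean of the per-area ages. A minimal sketch follows (the function name is hypothetical). Applied to the first sample of Table 2 it returns the tabulated integrated age.

```python
def integrated_age(ages, corrected):
    """AGE_integ = sum(AGE_x * W_x,rev) / sum(W_x,rev)."""
    total = sum(corrected)
    return sum(age * w for age, w in zip(ages, corrected)) / total


# First sample of Table 2: telomere, methylation and gene expression ages
# with the corrected weights of Table 1.
print(round(integrated_age([38.63, 21.31, 26.46], [1.49, 49.19, 5.19]), 2))  # 22.25
```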

That is, the age prediction unit 170 is configured to calculate the weight of each omics area using any one of the coefficient of determination, significance, and mean absolute error of the age predicted from the individual omics data, and then to predict the biological age or aging state from the weighted sum of the ages inferred from the individual omics areas.

Hereinafter, with reference to the drawings, a first comparative example is briefly described, in which the predictive power of linear regression-based biological age with respect to the actual age is compared using the telomere length, sixteen methylation markers, and eighteen gene expression markers with the configurations disclosed herein.

1) Omics Integrated Multiple Linear Regression Analysis

In the first comparative example, the association analysis unit 140 of the present application performs multiple linear regression analysis for each omics area and integrated omics analysis using the telomere length together with sixteen methylation markers preselected at an adjusted p-value < 1.0e-30 and eighteen gene expression markers preselected at an adjusted p-value < 5.0e-02.

2) Summation Analysis of Biological Age Weights by Omics Area (Weighted Coefficient of Determination)

The association analysis unit 140 of the present application obtains the coefficient of determination (R_x²) between the actual age of the samples and the biological age predicted for each area by multiple linear regression analysis using the markers of that omics area.

The weight allocation unit 150 of the present application calculates the weight (W_x) of each omics area as in Equation 1 in order to give greater weight to the omics region having a large coefficient of determination.

$\begin{matrix} W_{x} = \frac{1}{1 - R_{x}^{2}} & [Equation 1] \end{matrix}$

Further, when the coefficients of determination of the biological age with respect to the actual age differ greatly between omics areas, the weight correction unit 160 of the present application calculates a corrected weight value (W_x,rev) by exponentiating the average weight value (W_avg) of the omics areas as shown in Equation 2, so that the weight (W_x) of each area emphasizes the age of the omics area with high reliability.

The age prediction unit 170 of the present application calculates the omics-integrated biological age (AGE_integ) by applying the corrected weight of each omics area to the age predicted for that area (AGE_x) and summing the results, as shown in Equation 3.

$\begin{matrix} W_{x, rev} = {W_{avg}}^{(W_{x} / W_{avg})} & [Equation 2] \end{matrix}$

$\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} & [Equation 3] \end{matrix}$

Table 1 below shows the coefficient of determination, weight, and corrected weight of each omics area, and Table 2 compares the omics-integrated biological age, obtained by the weighted summation of the ages predicted for each omics area, with the actual age.

TABLE 1

                                       Telomeres   Methylation   Gene expression
Coefficient of determination (R_x²)    0.317       0.930         0.834
Weight W_x                             1.46        14.25         6.02
Corrected weight W_x,rev               1.49        49.19         5.19

TABLE 2

AGE_telo   W_telo   AGE_meth   W_meth   AGE_exp   W_exp   AGE_integ   Actual age
38.63      1.49     21.31      49.19    26.46     5.19    22.25       22
34.38      1.49     46.24      49.19    47.15     5.19    46.00       44
57.15      1.49     66.77      49.19    67.77     5.19    66.61       74

Table 3 compares the biological age predictions obtained by integrated omics regression analysis and by age-weighted summation across omics areas with the biological age predictions from each individual omics area.

TABLE 3

Columns: (A) telomere-based age prediction; (B) methylation-based age prediction; (C) gene expression-based age prediction; (D) integrated omics-based age prediction; (E) result of age-weighted summation by omics area.

                                     (A)       (B)       (C)       (D)       (E)
Coefficient of determination (R²)    0.317     0.930     0.834     0.979     0.936
Significance (P-val)                 1.7E−05   6.4E−30   9.9E−21   6.5E−43   7.2E−31
Mean absolute error (MAE)            10.593    3.003     5.094     1.773     2.934

Referring to Table 3, it can be seen that the biological age predicted by integrated omics regression analysis or by age-weighted summation across omics areas agrees more closely with the actual age of the samples, with a higher coefficient of determination, stronger significance, and a smaller mean absolute error (MAE), than the ages predicted by multiple linear regression analysis from the individual omics areas.

Hereinafter, with reference to the drawings, a second comparative example is briefly described, in which the predictive power of linear regression-based biological age with respect to the actual age is compared using the telomere length, four methylation markers, and four gene expression markers with the configurations disclosed herein.

1) Omics Integrated Multiple Linear Regression Analysis

In the second comparative example, the association analysis unit 140 of the present application performs multiple linear regression analysis for each omics area and integrated omics analysis using the telomere length together with four methylation markers selected at an adjusted p-value < 1.0e-30 and an absolute correlation with actual age |R| > 0.75, and four gene expression markers selected at an adjusted p-value < 1.0e-04.

2) Summation Analysis of Biological Age Weights by Omics Area (Weighted Coefficient of Determination)

This analysis is performed in the same manner as in the first comparative example. Table 4 below shows the coefficient of determination, weight, and corrected weight of each omics area, and Table 5 compares the omics-integrated biological age, obtained by the weighted summation of the ages predicted for each omics area, with the actual age.

TABLE 4

                                       Telomeres   Methylation   Gene expression
Coefficient of determination (R_x²)    0.310       0.860         0.717
Weight W_x                             1.46        7.14          3.54
Corrected weight W_x,rev               1.66        11.78         3.39

TABLE 5

AGE_telo   W_telo   AGE_meth   W_meth   AGE_exp   W_exp   AGE_integ   Actual age
38.63      1.66     19.04      11.78    27.73     3.39    23.94       22
34.38      1.66     45.20      11.78    45.58     3.39    44.01       44
36.77      1.66     63.13      11.78    57.61     3.39    58.34       74

Table 6 compares the biological age predictions obtained by integrated omics regression analysis and by age-weighted summation across omics areas with the biological age predictions from each individual omics area.

TABLE 6

Columns: (A) telomere-based age prediction; (B) methylation-based age prediction; (C) gene expression-based age prediction; (D) integrated omics-based age prediction; (E) result of age-weighted summation by omics area.

                                     (A)       (B)       (C)       (D)       (E)
Coefficient of determination (R²)    0.317     0.860     0.717     0.887     0.877
Significance (P-val)                 1.7E−05   1.5E−22   4.8E−15   8.2E−25   6.4E−24
Mean absolute error (MAE)            10.593    4.637     6.263     4.052     4.506

Referring to Table 6, it can be seen that the biological age predicted by integrated omics regression analysis or by age-weighted summation across omics areas agrees more closely with the actual age of the samples, with a higher coefficient of determination, stronger significance, and a smaller mean absolute error (MAE), than the ages predicted by multiple linear regression analysis from the individual omics areas.

Hereinafter, with reference to the drawings, a third comparative example is briefly described, in which the predictive power of artificial neural network-based biological age with respect to the actual age is compared using the telomere length, sixteen methylation markers, and eighteen gene expression markers with the configurations disclosed herein.

1) Omics Integrated Artificial Neural Network-Based Regression Analysis

In the third comparative example, the association analysis unit 140 of the present application performs artificial neural network-based regression analysis for each omics area and integrated omics analysis using the telomere length together with sixteen methylation markers selected at an adjusted p-value < 1.0e-30 and eighteen gene expression markers selected at an adjusted p-value < 5.0e-02.
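The artificial neural network-based regression can be sketched with scikit-learn's MLPRegressor. The network size, activation, optimizer, and regularization below are illustrative assumptions; the source does not describe the architecture, and the function name is hypothetical. The predicted ages would then feed the same association measures and weighting as in the linear case.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def fit_ann_age_model(X_train, age_train, seed=0):
    """Fit a small feed-forward network that regresses age on percentile-scaled markers."""
    X_train = np.asarray(X_train, dtype=float)
    if X_train.ndim == 1:  # e.g. telomere length alone
        X_train = X_train[:, None]
    model = MLPRegressor(
        hidden_layer_sizes=(16, 8),  # illustrative, not taken from the source
        activation="tanh",
        solver="lbfgs",              # full-batch optimizer that suits small cohorts
        alpha=1e-3,                  # L2 penalty to limit overfitting
        max_iter=5000,
        random_state=seed,           # fixed seed for reproducible weights
    )
    model.fit(X_train, np.asarray(age_train, dtype=float))
    return model
```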

2) Summation Analysis of Biological Age Weights by Omics Area (Weighted Coefficient of Determination)

This analysis is performed in the same manner as in the first comparative example.

Table 7 below shows the coefficient of determination, weight, and corrected weight of each omics area, and Table 8 compares the omics-integrated biological age, obtained by the weighted summation of the ages predicted for each omics area, with the actual age.

TABLE 7

                                       Telomeres   Methylation   Gene expression
Coefficient of determination (R_x²)    0.309       0.969         0.959
Weight W_x                             1.45        32.40         24.45
Corrected weight W_x,rev               1.25        140.86        41.82

TABLE 8

AGE_telo   W_telo   AGE_meth   W_meth   AGE_exp   W_exp   AGE_integ   Actual age
41.29      1.25     21.18      140.86   19.62     41.82   20.96       22
34.46      1.25     46.31      140.86   47.53     41.82   46.51       44
53.73      1.25     70.24      140.86   67.73     41.82   69.56       74

Table 9 compares the biological age predictions obtained by integrated omics regression analysis and by age-weighted summation across omics areas with the artificial neural network-based biological age predictions from each individual omics area.

TABLE 9

Columns: (A) telomere-based age prediction; (B) methylation-based age prediction; (C) gene expression-based age prediction; (D) integrated omics-based age prediction; (E) result of age-weighted summation by omics area.

                                     (A)       (B)       (C)       (D)       (E)
Coefficient of determination (R²)    0.309     0.969     0.959     0.979     0.971
Significance (P-val)                 2.3E−05   1.1E−38   1.1E−35   6.5E−43   3.0E−39
Mean absolute error (MAE)            10.656    1.712     2.285     1.773     1.671

Referring to Table 9, it can be seen that the biological age predicted by integrated omics regression analysis or by age-weighted summation across omics areas agrees more closely with the actual age of the samples, with a higher coefficient of determination and stronger significance, than the ages predicted by artificial neural network-based regression analysis from the individual omics areas; the age-weighted summation also yields the smallest mean absolute error (MAE).

Hereinafter, with reference to the drawings, a fourth comparative example is described, in which linear regression-based age prediction with different weighting schemes (weights based on significance and on mean absolute error) is compared using the telomere length, sixteen methylation markers, and eighteen gene expression markers with the configurations disclosed herein.

1) Omics Integrated Multiple Linear Regression Analysis

In the fourth comparative example, the association analysis unit 140 of the present application performs multiple linear regression analysis for each omics area and integrated omics analysis using the telomere length together with sixteen methylation markers selected at an adjusted p-value < 1.0e-30 and eighteen gene expression markers selected at an adjusted p-value < 5.0e-02.

2-1) Summation Analysis of Biological Age Weights by Omics Area (Weighted Significance)

The association analysis unit 140 of the present application obtains the significance (PVAL_x) between the biological age predicted for each area (x) from multiple linear regression analysis using the markers of each omics area and the sample's actual age.

The weight allocation unit 150 of the present application calculates the weight (W_x) as in Equation 4 to rescale the significance values, which are distributed over a range of extremely small numbers.

$\begin{matrix} W_{x} = -\log_{10}(PVAL_{x}) & [Equation 4] \end{matrix}$

Further, when the significance of the biological age with respect to the actual age differs greatly between omics areas, the weight correction unit 160 of the present application calculates a corrected weight value (W_x,rev) by exponentiating the average weight value (W_avg) of the omics areas as shown in Equation 5, so that the weight (W_x) of each area emphasizes the age of the omics area with high reliability.

$\begin{matrix} W_{x, rev} = {W_{avg}}^{(W_{x} / W_{avg})} & [Equation 5] \end{matrix}$

The age prediction unit 170 of the present application then calculates the integrated biological age (AGE_integ) as the average of the ages predicted for each omics area (AGE_x), weighted by the corrected weights (W_x,rev), as in Equation 6.

$\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} & [Equation 6] \end{matrix}$
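Equations 4 to 6 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: it assumes a base-10 logarithm in Equation 4 (which reproduces the weights in Table 10) and takes W_avg to be the arithmetic mean of the per-area weights; the function names are hypothetical.

```python
import math
from typing import Dict


def significance_weights(pvals: Dict[str, float]) -> Dict[str, float]:
    """Equation 4: W_x = -log10(PVAL_x) for each omics area x."""
    return {area: -math.log10(p) for area, p in pvals.items()}


def corrected_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Equation 5: W_x,rev = W_avg ** (W_x / W_avg), with W_avg the mean weight."""
    w_avg = sum(weights.values()) / len(weights)
    return {area: w_avg ** (w / w_avg) for area, w in weights.items()}


def integrated_age(ages: Dict[str, float], w_rev: Dict[str, float]) -> float:
    """Equation 6: average of the per-area ages weighted by W_x,rev."""
    total = sum(w_rev[area] for area in ages)
    return sum(ages[area] * w_rev[area] for area in ages) / total


# Significance values from Table 10 and the first sample's ages from Table 11.
pvals = {"telomere": 1.7e-05, "methylation": 6.4e-30, "expression": 9.9e-21}
ages = {"telomere": 38.63, "methylation": 21.31, "expression": 26.46}

w = significance_weights(pvals)
w_rev = corrected_weights(w)
print(round(integrated_age(ages, w_rev), 2))  # 22.53
```

Running it gives weights of about 4.77, 29.19 and 20.00, corrected weights of about 2.15, 108.82 and 24.87, and an integrated age of 22.53 for the first sample, matching Tables 10 and 11 up to rounding of the published values. The exponentiation widens the gap between reliable and unreliable areas far more than the raw weights do.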

Table 10 below shows the significance, weight, and corrected weight of each omics area, and Table 11 compares the omics-integrated biological age, predicted by summing the ages for each omics area weighted by the corrected weight of each area, with the actual age.

TABLE 10

	Telomeres	Methylation	Gene expression
Significance (PVAL_x)	1.7E−05	6.4E−30	9.9E−21
Weight (W_x)	4.77	29.20	20.01
Corrected weight (W_x,rev)	2.15	108.67	24.84

TABLE 11

AGE_telo	W_telo	AGE_meth	W_meth	AGE_exp	W_exp	AGE_integ	Actual age
38.63	2.15	21.31	108.67	26.46	24.84	22.53	22
34.38		46.24		47.15		46.00	44
57.15		66.77		67.77		66.81	74

2-2) Summation Analysis of Biological Age Weights by Omics Area (Weighted Mean Error)

The association analysis unit 140 of the present application obtains the mean absolute error (MAE_x) between the biological age predicted for each omics area (x), by multiple linear regression analysis using the markers of that omics area, and the actual age of the sample.

The weight allocation unit 150 of the present application calculates the weight (W_x) of each omics area as in Equation 7 in order to give greater weight to the omics area with a small mean absolute error.

$\begin{matrix} W_{x} = 1 / {mae}_{x, rev} & [Equation 7] \end{matrix}$

Further, so that the mean absolute error is reflected relative to the actual age distribution, a distribution correction (mae_x,rev) by the average age (AGE_avg) of the sample group is applied as shown in Equation 8 below. When the mean absolute error between the biological age and the actual age differs greatly among the omics areas, the weight correction unit 160 of the present application calculates a corrected weight (W_x,rev) by raising the average weight (W_avg) of the omics areas to the power W_x/W_avg, as shown in Equation 9, so that the age from the omics area with high reliability is emphasized in the weight (W_x) for each area. The integrated biological age (AGE_integ) is then calculated as the average of the per-area ages weighted by the corrected weights, using Equation 10.

$\begin{matrix} {mae}_{x, rev} = {MAE}_{x} / {AGE}_{avg} & [Equation 8] \end{matrix}$

$\begin{matrix} W_{x, rev} = {W_{avg}}^{(W_{x} / W_{avg})} & [Equation 9] \end{matrix}$

$\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} & [Equation 10] \end{matrix}$
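The mean-absolute-error variant (Equations 7 to 10) can be sketched the same way. This is a hypothetical illustration: the function names are invented, W_avg is taken as the arithmetic mean of the weights, and the document does not state AGE_avg, so the value 44.7 used here is an assumption back-calculated from Table 12 (10.593 / 0.237 is about 44.7).

```python
from typing import Dict


def mae_weights(mae: Dict[str, float], age_avg: float) -> Dict[str, float]:
    """Equations 8 and 7: mae_x,rev = MAE_x / AGE_avg, then W_x = 1 / mae_x,rev."""
    return {area: 1.0 / (m / age_avg) for area, m in mae.items()}


def corrected_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Equation 9: W_x,rev = W_avg ** (W_x / W_avg), with W_avg the mean weight."""
    w_avg = sum(weights.values()) / len(weights)
    return {area: w_avg ** (w / w_avg) for area, w in weights.items()}


def integrated_age(ages: Dict[str, float], w_rev: Dict[str, float]) -> float:
    """Equation 10: average of the per-area ages weighted by W_x,rev."""
    total = sum(w_rev[area] for area in ages)
    return sum(ages[area] * w_rev[area] for area in ages) / total


# Mean absolute errors from Table 12; AGE_AVG is an assumed value
# back-calculated from Table 12, not a figure stated in the document.
mae = {"telomere": 10.593, "methylation": 3.003, "expression": 5.094}
AGE_AVG = 44.7
ages = {"telomere": 38.63, "methylation": 21.31, "expression": 26.46}

w_rev = corrected_weights(mae_weights(mae, AGE_AVG))
print(round(integrated_age(ages, w_rev), 2))  # 23.24
```

With these inputs the weights come out at about 4.22, 14.89 and 8.78 and the corrected weights at about 2.75, 35.54 and 8.21, close to Table 12, and the first sample's integrated age is 23.24 as in Table 13. Because W_x/W_avg does not depend on AGE_avg, the sample-group average only changes the base W_avg, and hence how strongly the corrected weights are spread apart.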

Table 12 below shows the mean absolute error, corrected mean absolute error, weight, and corrected weight of each omics area, and Table 13 compares the omics-integrated biological age, predicted by summing the ages for each omics area weighted by the corrected weight of each area, with the actual age.

TABLE 12

	Telomeres	Methylation	Gene expression
Mean absolute error (MAE_x)	10.593	3.003	5.094
Corrected mean absolute error (mae_x,rev)	0.237	0.067	0.114
Weight (W_x)	4.22	14.89	8.78
Corrected weight (W_x,rev)	2.75	35.57	8.21

TABLE 13

AGE_telo	W_telo	AGE_meth	W_meth	AGE_exp	W_exp	AGE_integ	Actual age
38.63	2.75	21.31	35.57	26.46	8.21	23.24	22
34.38		46.24		47.15		45.70	44
57.15		66.77		67.77		66.38	74

Table 14 below compares the omics-integrated biological age predictions, including the result of age-weighted summation by omics area, with the biological age predictions from each individual omics area.

TABLE 14

	Telomere-based age prediction	Methylation-based age prediction	Gene expression-based age prediction	Integrated omics-based age prediction	Result of age-weighted summation by omics
Coefficient of determination (R²)	0.317	0.930	0.834	0.938	0.938
Significance (P-val)	1.7E−05	6.4E−30	9.9E−21	3.3E−31	2.9E−31
Mean absolute error (MAE)	10.593	3.003	5.094	2.991	2.987

Referring to Table 14, the omics-integrated biological age obtained by weighted summation, with the weights scored from either the significance or the mean absolute error, is closer to the actual age of the samples than the ages predicted by regression analysis from individual omics: it shows a higher coefficient of determination, stronger significance, and a smaller mean absolute error.

Hereinafter, a method for predicting biological age based on various omics data analysis according to the first embodiment of the present invention is described with reference to FIGS. 2a and 2b.

The method S700 for predicting biological age based on various omics data analysis according to an embodiment of the present invention proceeds as follows. The test sample collection unit 110 collects a plurality of genetic test samples, including DNA and RNA of a subject (S710). The test sample analysis unit 120 then analyzes a plurality of types of omics data (including at least one of telomere length, methylation, and gene expression) from each of the plurality of genetic test samples (S720). The preprocessing unit 130 then preprocesses the omics data by converting each marker value analyzed through the test sample analysis unit 120 into a percentile value in the range of 0 to 1 (S730).
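Step S730 does not specify how the percentile is computed. A minimal Python sketch, assuming a rank-based percentile within the sample group in which tied values share the midpoint of their ranks, might look like this; `to_percentiles` and the example telomere lengths are hypothetical.

```python
from typing import List


def to_percentiles(values: List[float]) -> List[float]:
    """Convert one marker's values across samples into percentiles in [0, 1].

    Assumed definition: (number of smaller values + half the number of
    other tied values) / (n - 1), so the smallest value maps to 0 and
    the largest to 1 when there are no ties.
    """
    n = len(values)
    if n < 2:
        # A single sample has no spread; 0.5 is an arbitrary placeholder.
        return [0.5] * n
    percentiles = []
    for v in values:
        below = sum(1 for u in values if u < v)
        ties = sum(1 for u in values if u == v) - 1
        percentiles.append((below + 0.5 * ties) / (n - 1))
    return percentiles


# Hypothetical telomere lengths (kb) for four samples.
print(to_percentiles([7.2, 5.9, 6.4, 6.4]))  # [1.0, 0.0, 0.5, 0.5]
```

Each marker would be transformed separately, so markers measured on very different scales (telomere length, methylation beta values, expression counts) end up on a common 0 to 1 scale before the association analysis. The pairwise counting is quadratic in the number of samples, which is adequate for a sketch.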

Thereafter, the method performs an association analysis based on the type of omics data for each omics area converted through the preprocessing unit 130 in the association analysis unit 140 (S740).

Process S740 analyzes the correlation between the data for a plurality of omics areas using multiple linear regression analysis or artificial neural network-based regression analysis; the analyzed correlation may be any one of the coefficient of determination (R_x²), the significance (PVAL_x), and the mean absolute error (MAE_x).
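For the multiple linear regression case, step S740 can be illustrated with the following NumPy sketch. It is an assumption-laden illustration rather than the patented procedure: it fits ordinary least squares with an intercept and reports in-sample R_x² and MAE_x, because the document does not say whether these are computed in-sample or on held-out samples. The significance PVAL_x could be obtained, for example, from `scipy.stats.pearsonr(predicted, actual_age)`, but that step is omitted to keep the sketch NumPy-only. `fit_area_model` and the data are hypothetical.

```python
import numpy as np


def fit_area_model(markers: np.ndarray, actual_age: np.ndarray):
    """Fit age ~ markers by ordinary least squares for one omics area.

    Returns the in-sample predicted ages, the coefficient of
    determination R_x^2 and the mean absolute error MAE_x.
    """
    # Add an intercept column to the (samples x markers) matrix.
    design = np.column_stack([np.ones(len(actual_age)), markers])
    coef, *_ = np.linalg.lstsq(design, actual_age, rcond=None)
    predicted = design @ coef
    ss_res = np.sum((actual_age - predicted) ** 2)
    ss_tot = np.sum((actual_age - actual_age.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(actual_age - predicted))
    return predicted, float(r2), float(mae)


# Hypothetical percentile-scaled values of two markers for five samples.
markers = np.array([[0.0, 0.25], [0.25, 0.0], [0.5, 0.75], [0.75, 0.5], [1.0, 1.0]])
actual_age = np.array([25.0, 31.0, 48.0, 52.0, 70.0])
predicted, r2, mae = fit_area_model(markers, actual_age)
```

Calling `fit_area_model` once per omics area yields the per-area AGE_x, R_x² and MAE_x that the weighting steps consume.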

Thereafter, the method predicts the subject's age based on the analysis result of the association analysis unit 140 and the data for each omics region in the age prediction unit 170 (S750).

Process S750 may be a process of predicting the subject's age by integrating (summing) the analysis result data for each of the plurality of types of analyzed omics areas.

Hereinafter, a method for predicting biological age based on various omics data analysis according to the second embodiment of the present invention is described with reference to FIGS. 2c and 2d.

The method S800 for predicting biological age based on various omics data analysis according to an embodiment of the present invention proceeds as follows. The test sample collection unit 110 collects a plurality of genetic test samples, including DNA and RNA of a subject (S810). The test sample analysis unit 120 then analyzes a plurality of types of omics data (including at least one of telomere length, methylation, and gene expression) from each of the plurality of genetic test samples (S820). The preprocessing unit 130 then preprocesses the omics data by converting each marker value analyzed through the test sample analysis unit 120 into a percentile value in the range of 0 to 1 (S830).

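Thereafter, the association analysis unit 140 performs an association analysis based on the type of omics data for each omics area converted through the preprocessing unit 130 (S840). Process S840 analyzes the correlation between the data for a plurality of omics areas using multiple linear regression analysis or artificial neural network-based regression analysis; the analyzed correlation may be any one of the coefficient of determination (R_x²), the significance (PVAL_x), and the mean absolute error (MAE_x).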

When process S840 is completed, the weight allocation unit 150 assigns a weight to each type of omics data based on any one of the association measures (coefficient of determination, significance, or mean absolute error) obtained through the association analysis unit (S850).

Here, the weight allocation unit 150 calculates the weight (W_x) of each omics area from its coefficient of determination (R_x²), significance (PVAL_x), or mean absolute error (MAE_x) using the corresponding equation below.

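$\begin{matrix} W_{x} = \frac{1}{1 - R_{x}^{2}} & (Weight equation for coefficient of determination) \end{matrix}$

$\begin{matrix} W_{x} = -\log_{10}({PVAL}_{x}) & (Weight equation for significance) \end{matrix}$

$\begin{matrix} W_{x} = 1 / {mae}_{x, rev} & (Weight equation for mean absolute error) \end{matrix}$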
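The three weight equations can be collected into a single function. This is a hypothetical helper (`area_weight`), assuming a base-10 logarithm for the significance weight as in Table 10. The document gives no worked numbers for the R²-based weight, so the single-omics R² values from Table 14 are used below purely for illustration.

```python
import math
from typing import Optional


def area_weight(metric: str, value: float, age_avg: Optional[float] = None) -> float:
    """Weight W_x of one omics area from its association measure.

    metric: "r2" (coefficient of determination), "pval" (significance)
    or "mae" (mean absolute error, which also needs age_avg).
    """
    if metric == "r2":
        if value >= 1.0:
            raise ValueError("R^2 must be below 1 for 1 / (1 - R^2) to be finite")
        return 1.0 / (1.0 - value)
    if metric == "pval":
        return -math.log10(value)
    if metric == "mae":
        if age_avg is None:
            raise ValueError("age_avg is required for the MAE-based weight")
        mae_rev = value / age_avg  # distribution correction mae_x,rev
        return 1.0 / mae_rev
    raise ValueError(f"unknown metric: {metric}")


# R^2 values of the single-omics predictions in Table 14.
for r2 in (0.317, 0.930, 0.834):
    print(round(area_weight("r2", r2), 2))  # prints 1.46, 14.29, 6.02
```

All three forms grow as the association improves (R² toward 1, p-value toward 0, MAE toward 0), so any of them can feed the same correction and summation steps.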

When process S850 is completed, the weight correction unit 160 may obtain a corrected weight (W_x,rev) for each omics area by raising the average value (W_avg) of the weights assigned to the types of omics data to the power W_x/W_avg, using the following equation (S860).

$\begin{matrix} W_{x, rev} = {W_{avg}}^{(W_{x} / W_{avg})} \end{matrix}$

Meanwhile, before correcting the weights derived from the mean absolute error (MAE_x) of each omics area, the weight correction unit 160 may apply a distribution correction (mae_x,rev) that divides MAE_x by the average age (AGE_avg) of the sample group, so that the mean absolute error is expressed relative to the actual age distribution, using the following equation.

$\begin{matrix} {mae}_{x, rev} = {MAE}_{x} / {AGE}_{avg} \end{matrix}$

When process S860 is completed, the age prediction unit 170 predicts the subject's age based on the analysis result of the association analysis unit and the data for each omics area, and the subject's age is predicted through the following equation (S870).

$\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} & (Biological age {AGE}_{integ} prediction equation) \end{matrix}$

Therefore, an embodiment of the present invention combines markers from several omics areas in the biological age prediction model. This offsets errors inherent in each individual omics area, enables more accurate biological age prediction, and allows the predicted biological age (the current aging state of the subject) to be interpreted by separating out the influence (or aging state) of each omics area.

For example, by combining three types of omics data from samples such as human blood, namely the genome (telomere length), the epigenome (methylation), and the transcriptome (gene expression), 1) age prediction accuracy can be improved because the noise inherent in each type of omics data is partly canceled, and 2) the biological age (degree of aging) of the subject can be analyzed separately for each omics area.

The above description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains can understand that it can be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, it should be understood that the embodiments described above are illustrative in all respects and not restrictive. For example, each component described as a single type may be implemented in a dispersed form, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the following claims rather than the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included in the scope of the present invention.

Claims

1. A system for predicting biological age based on various omics data analysis, the system comprising: a test sample collection unit for collecting a plurality of genetic test samples including at least one of DNA and RNA of a subject; a test sample analysis unit for analyzing a plurality of types of omics data from each of the plurality of genetic test samples; a preprocessing unit for preprocessing the omics data analyzed through the test sample analysis unit; an association analysis unit for performing an association analysis based on the omics type of data for each omics area converted through the preprocessing unit; and an age prediction unit for predicting the age of the subject based on the analyzed result of the association analysis unit and the data for each omics area.
2. The system of claim 1, wherein the plurality of types of omics data comprises at least one of telomere length, methylation, and gene expression.
3. The system of claim 1, wherein the preprocessing unit converts each marker value of the plurality of types of omics data into a percentile value in the range of 0 to 1.
4. The system of claim 1, wherein the association analysis unit uses any one of multiple linear regression analysis and artificial neural network-based regression analysis to analyze at least one of the coefficient of determination (R_x²), significance (PVAL_x), and mean absolute error (MAE_x) of a plurality of omics regions.
5. The system of claim 1, wherein the age prediction unit predicts the age of the subject by integrating (summing) the analysis result data for each of a plurality of types of omics areas analyzed by the association analysis unit.
6. The system of claim 1, comprising: a weight allocation unit in which a weight is assigned to each type of omics data based on the coefficient of determination (R_x²) analyzed through the association analysis unit; and a weight correction unit for correcting weights assigned to each type of omics data.
7. The system of claim 1, further comprising: a weight allocation unit in which a weight is assigned to each type of omics data based on the significance (PVAL_x) analyzed through the association analysis unit; and a weight correction unit for correcting weights assigned to each type of omics data.
8. The system of claim 1, further comprising: a weight allocation unit in which a weight is assigned to each type of omics data based on the mean absolute error (MAE_x) analyzed through the association analysis unit; and a weight correction unit for correcting weights assigned to each type of omics data.
9. The system of claim 6, wherein the age prediction unit predicts the age of the subject using the following equation based on the weight corrected through the weight correction unit, the analysis result of the association analysis unit, and the data for each omics area: $\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} \end{matrix}$.
10. A method for predicting biological age based on various omics data analysis, the method comprising steps of: collecting a plurality of genetic test samples including at least one of DNA and RNA of a subject in a test sample collection unit; analyzing a plurality of types of omics data from each of the plurality of genetic test samples in a test sample analysis unit; preprocessing the omics data analyzed through the test sample analysis unit in a preprocessing unit; performing an association analysis based on each omics type of data for each omics area converted through the preprocessing unit in an association analysis unit; and predicting the age of a subject based on the analysis result of the association analysis unit and the data for each omics area in the age prediction unit.
11. The method of claim 10, wherein the plurality of types of omics data comprises at least one of telomere length, methylation, and gene expression.
12. The method of claim 10, wherein the step of preprocessing converts each marker value of the plurality of types of omics data into a percentile value in the range of 0 to 1.
13. The method of claim 10, wherein the step of association analysis uses any one of multiple linear regression analysis and artificial neural network-based regression analysis to analyze at least one of the coefficient of determination (R_x²), significance (PVAL_x), and mean absolute error (MAE_x) of a plurality of omics regions.
14. The method of claim 10, wherein the step of predicting an age predicts the age of the subject by integrating (summing) the analysis result data for each of a plurality of types of omics areas analyzed by the association analysis unit.
15. The method of claim 10, comprising: assigning a weight to each type of omics data based on the coefficient of determination (R_x²) analyzed through the association analysis unit in a weight allocation unit; and correcting weights assigned to each type of omics data in a weight correction unit.
16. The method of claim 10, further comprising: assigning a weight to each type of omics data based on the significance (PVAL_x) analyzed through the association analysis unit in a weight allocation unit; and correcting weights assigned to each type of omics data in a weight correction unit.
17. The method of claim 10, further comprising: assigning a weight to each type of omics data based on the mean absolute error (MAE_x) analyzed through the association analysis unit in a weight allocation unit; and correcting weights assigned to each type of omics data in a weight correction unit.
18. The method of claim 15, wherein the age of the subject is predicted in an age prediction unit using the following equation based on the weight corrected through the weight correction unit, the analysis result of the association analysis unit, and the data for each omics area: $\begin{matrix} {AGE}_{integ} = \frac{\sum ({AGE}_{x} * W_{x, rev})}{\sum W_{x, rev}} \end{matrix}$.

Priority Claims (1)

Number	Date	Country	Kind
10-2020-0045382	Apr 2020	KR	national

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a continuation of International Patent Application No. PCT/KR2021/004293, filed on Apr. 6, 2021, which claims priority to Korean Patent Application No. 10-2020-0045382 filed in the Korean Intellectual Property Office on Apr. 14, 2020, the disclosures of which are incorporated by reference herein in their entireties.

Continuations (1)

	Number	Date	Country
Parent	PCT/KR2021/004293	Apr 2021	US
Child	17965945		US
