We have invented a genotype-based method for predicting positive effects of exercise training on a clinical outcome, with the desired clinical outcome including, for example, increase in HDL-C at the expense of LDL-C in subjects. The predictive method is based on allelic variants of a set of marker biochemicals and is applicable to all humans, not only those with CVD (Thompson, P D et al., Metabolism 53:193 (2/2004)).
The following definitions will be used in the specification and claims:
It has surprisingly been found that physiogenomic methods can be employed to identify genetic markers associated with physiological response to exercise. Thus, a patient can be assayed for the presence of one or more genetic markers, and a personalized predicted response profile can be developed based on the presence or absence of each marker, the zygosity of the specific allele (i.e., heterozygous or homozygous), and the predictive ability of the marker.
The physiogenomics methods employed in the present invention are described generally in U.S. patent application Ser. No. 11/371,511 and U.S. patent application Ser. No. 11/010,716, both of which are hereby incorporated by reference. Briefly, the physiogenomics method for predicting whether a particular exercise regimen will produce a beneficial effect on a patient typically comprises (a) selecting a plurality of genetic markers based on an analysis of the entire human genome or a fraction thereof; (b) identifying significant covariates among demographic data and the other phenotypes preferably by linear regression methods (e.g., R2 analysis or principal component analysis); (c) performing for each selected genetic marker an unadjusted association test using genetic data; (d) optionally using permutation testing to obtain a non-parametric and marker complexity independent probability (“p”) value for identifying significant markers, wherein p denotes the probability of a false positive, and the significance is shown by p<0.10, more preferably p<0.05, and even more preferably p<0.01, and even more preferably p<0.001; (e) constructing a physiogenomic model by multivariate linear regression analyses and model parameterization for the dependence of the patient's response to exercise with respect to the markers, wherein the physiogenomic model has p<0.10, preferably p<0.05, and more preferably p<0.01, and even more preferably p<0.001; and (f) identifying one or more genes not associated with a particular outcome in the patient to serve as a physiogenomic control.
The physiogenomic method was used to identify an ensemble of markers which is predictive of a variety of physiological responses to exercise, including log of blood triglyceride level; blood LDL cholesterol level; blood HDL cholesterol level; ratio of total cholesterol to HDL cholesterol; LDL cholesterol, small fraction level; HDL cholesterol, large fraction level; blood glucose level; systolic blood pressure; diastolic blood pressure; body mass; body mass index; fat percentage; weight normalized maximum oxygen uptake; and maximum oxygen uptake.
The ensemble of marker genes will comprise one or more, preferably two or more, and more preferably a plurality of gene variants. Preferred variants in accordance with the invention are single nucleotide polymorphisms (SNPs), a SNP being a gene variant that differs from the normal gene in the identity of one nucleotide pair. A variant is considered a variant of a gene if it is within 100,000 base pairs of, preferably within 10,000 base pairs of, or more preferably contained in, the transcribed sequence of the gene.
In a preferred embodiment, the ensemble of markers may comprise at least one, preferably at least two, and more preferably at least three SNP gene variants selected from the group consisting of rs1041163 (VCAM1); rs1042718 (ADRB2); rs10460960 (CCK); rs10508244 (PFKP); rs10513055 (PIK3CB); rs10515070 (PIK3R1); rs107540 (CRHR2); rs10890819 (ACAT1); rs1131010 (PECAM1); rs1143634 (IL1B); rs11503016 (GABRA2); rs1171276 (LEPR); rs1255 (MDH1); rs1290443 (RARB); rs1322783 (DISC1); rs1356413 (PIK3CA); rs1396862 (CRHR1); rs1398176 (GABRA4); rs1440451 (HTR5A); rs167771 (DRD3); rs1799978 (DRD2); rs1800471 (TGFB1); rs1800871 (IL10); rs1801105 (HNMT); rs1801278 (IRS1); rs1801714 (ICAM1); rs1805002 (CCKBR); rs1891311 (HTR7); rs2005590 (APOL4); rs2067477 (CHRM1); rs2070424 (SOD1); rs2070586 (DAO); rs2076672 (APOL5); rs2162189 (SST); rs2229126 (ADRA1A); rs2240403 (CRHR2); rs2269935 (PFKM); rs2276307 (HTR3B); rs2278718 (MDH1); rs2296189 (FLT1); rs2298122 (DRD1IP); rs2514869 (ANGPT1); rs2515449 (MCPH1); rs322695 (RARB); rs324651 (CHRM2); rs334555 (GSK3B); rs3756007 (GABRA2); rs3760396 (CCL2); rs3822222 (CCKAR); rs3917550 (PON1); rs4121817 (PIK3C3); rs4149056 (SLCO1B1); rs4520 (APOC3); rs4531 (DBH); rs4675096 (IRS1); rs4726107 (PRKAG2); rs4792887 (CRHR1); rs4917348 (RXRA); rs4933200 (ANKRD1); rs5049 (AGT); rs5092 (APOA4); rs5361 (SELE); rs563895 (AVEN); rs5896 (F2); rs600728 (TEK); rs6078 (LIPC); rs6092 (SERPINE1); rs6131 (SELP); rs659734 (HTR2A); rs6700734 (TNFSF6); rs6967107 (WBSCR14); rs706713 (PIK3R1); rs707922 (APOM); rs7200210 (SLC12A4); rs722341 (ABCC8); rs7412 (APOE); rs7556371 (PIK3C2B); rs8178990 (CHAT); rs870995 (PIK3CA); rs885834 (CHAT); rs908867 (BDNF); rs936960 (LIPC); and combinations thereof.
In the foregoing list of SNPs, the abbreviation for the corresponding gene is provided in parentheses following each SNP. The specific variant will be selected from the foregoing SNPs or other variants of these or other genes determined to be associated with exercise response. Each individual gene variant is statistically associated with the respective physiological endpoint. The following table identifies exemplary SNPs, ranked based on the selection criteria of p≦0.05, for the physiological endpoints of change in blood LDL cholesterol level; change in blood HDL cholesterol level; change in log of blood triglyceride level; change in blood glucose level; change in LDL cholesterol, small fraction level; change in HDL cholesterol, large fraction level; change in systolic blood pressure; change in diastolic blood pressure; change in body mass; change in body mass index; change in waist size; change in fat percentage; change in weight normalized maximum oxygen uptake; and change in maximum oxygen uptake.
The SNPs and genes in Table 1 are provided in the nomenclature adopted by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health. The sequence data for the SNPs and genes listed in Table 1 is known in the art and is readily available from the NCBI dbSNP and GenBank databases. The sequence information for these and other representative SNPs is provided below in Table 2.
By combining the effects of several SNPs, the necessary sensitivity and specificity of prediction is achieved for the ensemble of alleles, since the association of an individual SNP with the outcome does not have sufficient predictive power. The physiogenomics method mathematically assigns to each SNP a coefficient according to pre-established rules and covariates. The generation of the coefficients is discussed in detail in the examples and in U.S. patent application Ser. No. 11/371,511 and U.S. patent application Ser. No. 11/010,716, both of which are incorporated by reference herein. The coefficient for each SNP may be either positive, indicating that the presence of that marker contributes to physiological response, or negative (i.e., a torpid marker). The most powerful predictions are achieved for a particular physiological endpoint by using both SNPs having positive coefficients and SNPs having negative coefficients.
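By way of illustration only, the combination of per-SNP coefficients into a single prediction can be sketched as a linear score: an intercept plus, for each SNP, the coefficient multiplied by the number of minor alleles carried. The function name, the intercept, and the coefficient values below are hypothetical placeholders and are not taken from the examples or from the referenced applications; only the rs identifiers come from the foregoing list.

```python
from typing import Dict


def physiotype_score(genotype: Dict[str, int],
                     coefficients: Dict[str, float],
                     intercept: float = 0.0) -> float:
    """Return intercept + sum(coefficient * minor-allele count) over the SNPs."""
    score = intercept
    for snp, coef in coefficients.items():
        # Assumption: a SNP missing from the genotype contributes nothing.
        score += coef * genotype.get(snp, 0)
    return score


# Hypothetical coefficients: one positive marker and one negative marker.
example_coefficients = {"rs334555": 2.0, "rs1041163": -1.5}
# Minor-allele counts (0, 1, or 2) for one hypothetical individual.
example_genotype = {"rs334555": 1, "rs1041163": 2}

example_score = physiotype_score(example_genotype, example_coefficients,
                                 intercept=0.5)
```

In this sketch a positive coefficient raises the predicted response and a negative coefficient lowers it, which is one way to see why an ensemble containing both kinds of marker can discriminate better than either kind alone.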
In accordance with this embodiment of the invention, the ensemble of marker genes comprises at least two SNPs, the presence of which in a human correlates with at least one physiological response to exercise; wherein the physiological response is selected from the group consisting of log of blood triglyceride level; blood LDL cholesterol level; blood HDL cholesterol level; ratio of total cholesterol to HDL cholesterol; LDL cholesterol, small fraction level; HDL cholesterol, large fraction level; blood glucose level; systolic blood pressure; diastolic blood pressure; body mass; body mass index; fat percentage; weight normalized maximum oxygen uptake; maximum oxygen uptake; and combinations thereof; and wherein the at least two SNP gene variants comprise at least one SNP gene variant having a positive coefficient and at least one SNP gene variant having a negative coefficient in the physiotype model, including:
(1) in the case where said physiological response is a change in blood LDL cholesterol level, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs334555, rs1799978, rs870995, rs1398176, and rs5092; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs3118536, rs2005590, rs1041163, rs1800471, and rs707922; and
(2) in the case where the physiological response is a change in blood HDL cholesterol level, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs660339, rs894251, rs3760396, rs10513055, rs10513055, rs1800871, rs3760396, and rs1891311; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs936960, rs1143634, rs5049, and rs1891311; and
(3) in the case where the physiological response is a change in log of blood triglyceride level, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs722341, rs7602, rs4121817, rs5880, rs908867, rs2278718, rs2240403, and rs1171276; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs563895, rs2070586, rs1800871, rs2070586, rs10460960, rs2276307, rs11503016, and rs563895; and
(4) in the case where the physiological response is a change in blood glucose level, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs737865, rs10082776, rs10508244, rs1322783, rs2070424, rs107540, rs1042718, rs5361, and rs322695; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs1398176, rs722341, rs3822222, and rs2229126; and
(5) in the case where the physiological response is a change in LDL cholesterol, small fraction level, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs2033447, rs1877394, rs4917348, rs1131010, rs706713, rs4675096, and rs4917348; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs1045642, rs6131, rs2076672, rs6092, rs6078, rs659734, and rs885834; and
(6) in the case where the physiological response is a change in HDL cholesterol, large fraction level, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs10513055, rs1800871, and rs3760396; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs1799978, rs8192708, rs521674, rs5049, rs1042718, and rs4520; and
(7) in the case where the physiological response is a change in systolic blood pressure, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs597316, rs10515070, rs4149056, rs2298122, and rs6967107; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs2070424, rs6586179, rs1064344, rs1100494, rs1800871, rs1801105, rs7200210, and rs4726107; and
(8) in the case where the physiological response is a change in diastolic blood pressure, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs722341, rs3762272, rs600728, rs7556371, rs4531, and rs2067477; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs660339, rs662, rs2162189, rs2702285, and rs324651; and
(9) in the case where the physiological response is a change in body mass, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs870995, rs600728, rs676643, rs2070424, rs1801278, rs6700734, and rs4792887; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs6541017, rs1041163, rs722341, rs2162189, rs1255, rs1440451, and rs3756007; and
(10) in the case where the physiological response is a change in body mass index, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs5880, rs600728, rs676643, rs2070424, rs1801278, and rs4792887; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs132642, rs2162189, rs1440451, rs936960, and rs167771; and
(11) in the case where the physiological response is a change in percentage fat, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs676643, rs2070424, rs885834, rs8178990, and rs600728; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs8192708, rs6312, rs722341, and rs1290443; and
(12) in the case where the physiological response is a change in weight normalized maximum oxygen uptake, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs8178990, rs5447, rs1800871, rs4149056, rs7412, and rs1901714; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs2298122, rs26312, rs563895, rs5896, rs3917550, rs2296189, and rs1356413; and
(13) in the case where the physiological response is a change in maximum oxygen uptake, the marker set comprises: (i) at least one SNP gene variant having a positive coefficient selected from the group consisting of rs11503016, rs2515449, rs334555, rs722341, rs4149056, rs7412, rs1396862, rs2515449, and rs1805002; and (ii) at least one SNP gene variant having a negative coefficient selected from the group consisting of rs597316, rs26312, rs2020933, rs563895, and rs5896.
The SNPs may be provided as an array on a solid support or the like. The array may be a micro or nano array. These SNPs may be used in a method of predicting an individual's physiological response to exercise. The method generally comprises (1) obtaining genetic material from the individual; and (2) assaying the genetic material for the presence of the at least two SNP gene variants of the foregoing ensemble.
In other interesting embodiments of the invention, the marker gene set correlated with physiological response to exercise comprises the plurality of SNP gene variants listed below (a)-(m), each being a distinct embodiment of the invention:
(a) The physiological response is a change in blood LDL cholesterol level and the plurality of SNP gene variants comprise at least one single SNP gene variant selected from the group consisting of rs334555, rs1799978, rs870995, rs1398176, rs5092, rs3118536, rs2005590, rs1041163, rs1800471, rs707922, and combinations thereof.
(b) The physiological response is a change in blood HDL cholesterol level and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs660339, rs894251, rs3760396, rs10513055, rs10513055, rs1800871, rs3760396, rs1891311, rs936960, rs1143634, rs5049, rs1891311, and combinations thereof.
(c) The physiological response is a change in log of blood triglyceride level and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs722341, rs7602, rs4121817, rs5880, rs908867, rs2278718, rs2240403, rs1171276, rs563895, rs2070586, rs1800871, rs2070586, rs10460960, rs2276307, rs11503016, rs563895, and combinations thereof.
(d) The physiological response is a change in blood glucose level and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs737865, rs10082776, rs10508244, rs1322783, rs2070424, rs107540, rs1042718, rs5361, rs322695, rs1398176, rs722341, rs3822222, rs2229126, and combinations thereof.
(e) The physiological response is a change in LDL cholesterol, small fraction level and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs2033447, rs1877394, rs4917348, rs1131010, rs706713, rs4675096, rs4917348, rs1045642, rs6131, rs2076672, rs6092, rs6078, rs659734, rs885834, and combinations thereof.
(f) The physiological response is a change in HDL cholesterol, large fraction level and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs10513055, rs1800871, rs3760396, rs1799978, rs8192708, rs521674, rs5049, rs1042718, rs4520, and combinations thereof.
(g) The physiological response is a change in systolic blood pressure and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs597316, rs10515070, rs4149056, rs2298122, rs6967107, rs2070424, rs6586179, rs1064344, rs1100494, rs1800871, rs1801105, rs7200210, rs4726107, and combinations thereof.
(h) The physiological response is a change in diastolic blood pressure and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs722341, rs3762272, rs600728, rs7556371, rs4531, rs2067477, rs660339, rs662, rs2162189, rs2702285, rs324651, and combinations thereof.
(i) The physiological response is a change in body mass and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs870995, rs600728, rs676643, rs2070424, rs1801278, rs6700734, rs4792887, rs6541017, rs1041163, rs722341, rs2162189, rs1255, rs1440451, rs3756007, and combinations thereof.
(j) The physiological response is a change in body mass index and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs5880, rs600728, rs676643, rs2070424, rs1801278, rs4792887, rs132642, rs2162189, rs1440451, rs936960, rs167771, and combinations thereof.
(k) The physiological response is a change in percentage fat and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs676643, rs2070424, rs885834, rs8178990, rs600728, rs8192708, rs6312, rs722341, rs1290443, and combinations thereof.
(l) The physiological response is a change in weight normalized maximum oxygen uptake and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs8178990, rs5447, rs1800871, rs4149056, rs7412, rs1901714, rs2298122, rs26312, rs563895, rs5896, rs3917550, rs2296189, rs1356413, and combinations thereof.
(m) The physiological response is a change in maximum oxygen uptake and the plurality of SNP gene variants comprise at least one SNP gene variant selected from the group consisting of rs11503016, rs2515449, rs334555, rs722341, rs4149056, rs7412, rs1396862, rs2515449, rs1805002, rs597316, rs26312, rs2020933, rs563895, rs5896, and combinations thereof.
One embodiment of the present invention involves obtaining nucleic acid, e.g. DNA, from a blood sample of a subject, and assaying the DNA to determine the individual's genotype for one or a combination of the marker genes associated with physiological response to exercise. Other sampling procedures include but are not limited to buccal swabs, saliva, or hair root. In a preferred embodiment, genotyping is performed using a gene array methodology, which can be readily and reliably employed in the screening and evaluation methods according to this invention. A number of gene arrays are commercially available for use by the practitioner, including, but not limited to, static (e.g. photolithographically set), suspended beads (e.g. soluble arrays), and self assembling bead arrays (e.g. matrix ordered and deconvoluted). More specifically, the nucleic acid array analysis allows the establishment of a pattern of genetic variability from multiple genes and facilitates an understanding of the complex interactions that are elicited in an individual in response to exercise.
In a specific embodiment, the array consists of several hundred genes and is capable of genotyping hundreds of DNA polymorphisms simultaneously. Candidate genes for use in the arrays of the present invention are identified by various means including, but not limited to, pre-existing clinical databases and DNA repositories, review of the literature, consultation with clinicians, differential gene expression models, physiological pathways in metabolism and in cholesterol and lipid homeostasis, and previously discovered genetic associations.
Another specific aspect of the method involves obtaining DNA from a subject, and assaying the genetic material to determine if any of the SNP gene variants belonging to the marker gene set are present, wherein the presence of the one or more SNP gene variants is predictive of physiological response to exercise. Micro- and nano-array analysis of the subject's DNA is preferred in this specific aspect of the invention.
In another aspect, the present invention provides methods for the identification of a population of individuals that will respond favorably to exercise based on the physiological responses of change in blood triglyceride level; blood LDL cholesterol level; blood HDL cholesterol level; ratio of total cholesterol to HDL cholesterol; LDL cholesterol, small fraction level; HDL cholesterol, large fraction level; blood glucose level; systolic blood pressure; diastolic blood pressure; body mass; body mass index; fat percentage; weight normalized maximum oxygen uptake; maximum oxygen uptake, or any combination of these responses. These individuals, who are identified through screening using the methods of the present invention, are especially likely to benefit from exercise.
In another aspect, the present invention further provides a method for the development of novel diagnostic systems, termed “physiotypes”, which are developed from combinations of gene polymorphisms and baseline characteristics, to provide practitioners with individualized patient response profiles for physiological response to exercise.
Yet another aspect of the present invention provides a system containing a support or support material, e.g. a micro- or nano-array, comprising a novel set of marker genes and/or gene variants associated with physiological response to exercise in a form suitable for the practitioner to employ in a screening assay for determining an individual's genotype. In addition to the marker genes and gene variants, the system comprises an algorithm for predicting the physiological response to exercise based on a predetermined set of mathematical equations providing specific coefficients to each of the components of the array.
The ensembles, arrays, methods, and systems of the invention are contemplated to be useful to practitioners as a tool to promote exercise compliance. Beyond the standard life modification advice of “exercise and be physically active”, the physician can now be precise and scientific in suggesting a fitness regimen and can provide additional motivational factors including improving cholesterol profiles prior to utilization of drugs, reducing body fat and lowering weight and having a general positive effect on several physiological outcomes. These capabilities point out the emergence of exercise as a medical fitness prescription. Further, there is contemplated to be utility in the management of metabolic syndrome and its individual components, dyslipidemias, obesity, diabetes, and hypertension. The possibility of a physiological treatment, as opposed to drugs, introduces an entire new dimension and scientific empowerment to “life style modification.” Conversely, for individuals where the exercise response tends more toward body weight and fat, exercise becomes a true complement to diet. Also, there are expected to be benefits in healthcare integration with the possibility of the doctor supporting the exercise prescription with a supervised fitness program or referring a patient to an exercise physiologist, physical therapist or fitness trainer.
The recruitment of subjects, exercise training protocol, and physiological measurements used in this study are generally described in Thompson P D et al, Metabolism Vol. 53, No. 2, pp. 193-202 (2004), the contents of which are hereby incorporated by reference. Subjects were recruited at eight locations. Subjects initiated exercise training and completed a six month program. Subjects were recruited if they were healthy and without orthopedic problems, non-smokers, physically inactive, aged 18 to 70 years, and consumed two or fewer alcoholic beverages daily. Subjects were considered physically inactive if they participated in vigorous activity four or fewer times per month for the prior 6 months. Individuals were not recruited if their body mass index (BMI) exceeded 31, in order to avoid subjects who might restrict their caloric intake during the lipid measurement period, as caloric restriction reduces HDL-C. Subjects underwent a medical history evaluation, physical exam, and a maximal exercise test to detect unreported abnormalities and occult coronary artery disease.
DNA was extracted from blood leukocytes for each subject. Genotyping was performed using the Illumina BeadArray™ platform and the GoldenGate™ assay (Oliphant et al, Biotechniques 32: S56-S61 (2002)). For serum lipid and lipoprotein measurements, serum samples (preferably in duplicate) were obtained after a 12 hour fast before the start and after six months of exercise training. Post-training samples were obtained within 24 hours of the penultimate and final exercise training sessions. Lipid levels in women before and after training were obtained within ten days of the onset of menses to avoid variations in lipoprotein values (Cullinane E M et al, Metabolism 44:565 (1995)). Serum was separated from plasma and frozen at −70 degrees Celsius until analyzed by the Lipid Research Laboratory, Lifespan Health System, Brown University, Providence (RI). All samples from an individual subject were analyzed in the same analysis run at the end of the study to minimize the effect of laboratory variation. Total cholesterol, TGs, LDL-C, HDL-C, and subfractions were determined using standard techniques (Thompson P D et al, Metabolism 46:217 (1997)).
For anthropometric measurements, body weight and height were measured using balance beam scales and wall mounted tape measures. Skinfold thickness was measured on the right side of the body using calipers to estimate percent body fat in men and women.
To determine maximal exercise capacity, subjects underwent two pre- and one post-training maximal treadmill exercise tests using the modified Astrand protocol (Pollock M L et al, Exercise in Health and Disease, Saunders, Philadelphia, Pa., 1984). The first pre-training test was designed to detect occult ischemia and to familiarize subjects with the measurement protocol, but was not used in data analysis. Blood pressure and 12-lead ECG, as well as expired oxygen, carbon dioxide, and ventilatory volume were measured. Maximal oxygen uptake was defined as the average of the two highest consecutive 30-second values at peak exercise.
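The definition of maximal oxygen uptake given above can be sketched as follows. This is a minimal illustration that assumes the definition means the adjacent pair of 30-second readings with the highest mean; the function name and the sample readings are hypothetical.

```python
from typing import Sequence


def max_oxygen_uptake(vo2_30s: Sequence[float]) -> float:
    """Mean of the adjacent pair of 30-second VO2 readings with the highest sum."""
    if len(vo2_30s) < 2:
        raise ValueError("need at least two 30-second values")
    # Slide a window of two consecutive readings over the series.
    return max((a + b) / 2.0 for a, b in zip(vo2_30s, vo2_30s[1:]))


# Hypothetical readings near peak exercise, in any consistent VO2 unit.
example_vo2max = max_oxygen_uptake([30.0, 34.0, 36.0, 35.0])
```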
Subjects were requested to maintain their usual dietary composition throughout the study. Dietary calories and composition were assessed by random, 24-hour dietary recalls. Trained dieticians called the subjects by telephone on one weekday and one weekend day before the start and during the last month of exercise training. Results from the two calls were averaged to estimate dietary intake.
Subjects underwent a progressive, supervised exercise training program. The duration of each exercise session was increased from 15 to 40 minutes during the first four weeks. Subjects exercised between 60 and 85% of their maximal exercise capacity based on their pre-determined maximal heart rate. Once subjects could perform 40 minutes of exercise, they continued this duration of exercise 4 days a week for an additional 5 months for a total of 6 months of participation. Subjects also participated in 5 minutes of warm-up and cool-down so that each workout required 50 minutes. Treadmill exercise was the primary mode of training but subjects were able to use a variety of training modalities including treadmills, stationary cycles, cross-country ski machines, stair steppers, and rowing machines for variety and to minimize orthopedic injury.
Weekly exercise energy expenditure expressed as kilocalories per week was estimated from the average heart rates recorded for exercise sessions of that week. From individual plots of VO2 vs. heart rate created from pre-training maximal exercise test data, we estimated the VO2 corresponding to the training exercise heart rate intensity and multiplied that VO2 by training session duration to obtain total oxygen consumption for each bout. Each liter of oxygen was assumed to represent 5 kilocalories of energy expenditure.
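The estimate described above can be sketched as follows. The sketch assumes that the individual VO2 versus heart rate plot is summarized by a least-squares straight line, that VO2 is expressed in liters per minute, and that session duration is in minutes; the function names and sample values are hypothetical. Only the factor of 5 kilocalories per liter of oxygen is taken from the text.

```python
from typing import Sequence, Tuple

KCAL_PER_LITER_O2 = 5.0  # conversion factor stated in the text


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of y on x (assumed linear relation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def session_kcal(mean_hr: float, duration_min: float,
                 slope: float, intercept: float) -> float:
    """Energy for one bout: VO2 (L/min) x duration (min) x 5 kcal/L."""
    vo2_l_per_min = slope * mean_hr + intercept
    return vo2_l_per_min * duration_min * KCAL_PER_LITER_O2


def weekly_kcal(sessions: Sequence[Tuple[float, float]],
                slope: float, intercept: float) -> float:
    """Sum the bouts of one week; each session is (mean heart rate, minutes)."""
    return sum(session_kcal(hr, mins, slope, intercept) for hr, mins in sessions)


# Hypothetical pre-training test data: heart rate (bpm) and VO2 (L/min).
slope, intercept = fit_line([100.0, 140.0, 180.0], [1.0, 2.0, 3.0])
# Four hypothetical 40-minute sessions at a mean heart rate of 140 bpm.
example_week = weekly_kcal([(140.0, 40.0)] * 4, slope, intercept)
```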
We tested the inventive method by examining the effects of exercise on blood triglyceride level (log transformed); blood LDL cholesterol level; blood HDL cholesterol level; ratio of total cholesterol to HDL cholesterol; LDL cholesterol, small fraction level; HDL cholesterol, large fraction level; blood glucose level; systolic blood pressure; diastolic blood pressure; body mass; body mass index; fat percentage; weight normalized maximum oxygen uptake; and maximum oxygen uptake, as a function of various SNP markers. We correlated the exercise responses as measured by various outcomes with the variability of selected candidate genes using physiogenomics. Physiogenomics was used as a technique to explore the variability in patient response to exercise. Physiogenomics is a medical application of sensitivity analysis [Ruaño, et al., Physiogenomics: Integrating systems engineering and nanotechnology for personalized health. In: Joseph. D. Bronzino, ed. The Biomedical Engineering Handbook, 3rd edition, 2006.]. Sensitivity analysis is the study of the relationship between the input and the output of a model and the analysis, utilizing systems theory, of how variation of the input leads to changes in output quantities. Physiogenomics utilizes as input the variability in genes, measured by single nucleotide polymorphisms (SNP) and determines how the SNP frequency among individuals relates to the variability in physiological characteristics, the output.
The goal of the investigation was to develop physiogenomic markers for predicting physiological response to exercise by using an informatics platform to analyze data from exercise studies.
Potential associations of marker genes to exercise. Various SNPs associated with, for example, the observation of lipid level and BMI changes in patients undergoing exercise treatment were screened. The endpoints analyzed were log of blood triglyceride level; blood LDL cholesterol level; blood HDL cholesterol level; ratio of total cholesterol to HDL cholesterol; LDL cholesterol, small fraction level; HDL cholesterol, large fraction level; blood glucose level; systolic blood pressure; diastolic blood pressure; body mass; body mass index; fat percentage; weight normalized maximum oxygen uptake; and maximum oxygen uptake. The physiogenomic model was developed using the following procedure: 1) Establish a baseline model using only the demographic and clinical variables, 2) Screen for associated genetic markers by testing each SNP against the unexplained residual of the baseline model, and 3) Establish a revised model incorporating the significant associations from the SNP screen. All models are simple linear regression models, but other well-known statistical methods are contemplated to be useful.
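The three-step procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually performed: the function names are hypothetical, the data are synthetic, and the screening rule (squared correlation between a SNP and the baseline residual, with an arbitrary cutoff `min_r2`) is a simple stand-in for a significance test.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def ols(columns: Sequence[Vector], y: Vector) -> List[float]:
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(columns: Sequence[Vector], y: Vector, beta: List[float]) -> List[float]:
    """Observed minus fitted values for the model described by beta."""
    return [y[i] - beta[0] - sum(b * col[i] for b, col in zip(beta[1:], columns))
            for i in range(len(y))]


def r_squared(x: Vector, y: Vector) -> float:
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def physiogenomic_model(covariates: Dict[str, Vector], snps: Dict[str, Vector],
                        y: Vector, min_r2: float = 0.5) -> Dict[str, float]:
    # Step 1: baseline model from demographic and clinical variables only.
    cov_cols = list(covariates.values())
    resid = residuals(cov_cols, y, ols(cov_cols, y))
    # Step 2: screen each SNP against the unexplained residual
    # (hypothetical cutoff standing in for a significance test).
    selected = [name for name, g in snps.items() if r_squared(g, resid) >= min_r2]
    # Step 3: revised model with the covariates plus the selected SNPs.
    cols = cov_cols + [snps[s] for s in selected]
    names = ["intercept"] + list(covariates) + selected
    return dict(zip(names, ols(cols, y)))


# Synthetic data: the response depends on age and on snpA but not on snpB.
age = [1, 2, 3, 4, 5, 6, 7, 8]
snp_a = [0, 1, 2, 0, 1, 2, 0, 1]
snp_b = [1, 1, 0, 2, 0, 0, 1, 2]
response = [1 + 2 * a + 3 * g for a, g in zip(age, snp_a)]
model = physiogenomic_model({"age": age}, {"snpA": snp_a, "snpB": snp_b}, response)
```

Screening against the residual, rather than against the raw response, keeps variation already explained by the baseline covariates from being attributed to a SNP.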
Tables 6-19 list the SNPs that have been found to be associated with each outcome; only SNPs significant at the 0.05 level are shown. The baseline variables (covariates), broken down by demographic factors, are shown in Tables 20 and 21, where the variables indicated as “pre” represent the initial value of the indicated response.
In the SNP screen (step 2), the p-values for each SNP were obtained by adding the SNP to the baseline model and comparing the resulting model improvement with up to 10,000 simulated model improvements using the same data set, but with the genotype data randomly permuted to remove any true association. This method produces a p-value that is a direct, unbiased, and model-free estimate of the probability of finding a model as good as the one tested when the null hypothesis of no association is true. All SNPs with a screening p-value of better than 0.003 were selected to be included in the physiogenomic model (step 3).
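By way of non-limiting illustration, the permutation screen described above may be sketched as follows. The sketch assumes an ordinary least-squares baseline model and measures "model improvement" as the reduction in residual sum of squares when the SNP is added; the function names (`rss`, `permutation_p_value`) and the choice of improvement statistic are assumptions made for illustration and are not taken from the specification.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def permutation_p_value(covariates, genotype, response, n_perm=10_000, seed=0):
    """Permutation p-value for adding one SNP to a baseline covariate model.

    covariates: array (n,) or (n, k) of baseline variables
    genotype:   array (n,) of minor-allele counts (0, 1, 2)
    response:   array (n,) of the outcome being modeled
    """
    rng = np.random.default_rng(seed)
    genotype = np.asarray(genotype, dtype=float)
    response = np.asarray(response, dtype=float)
    n = len(response)
    base = np.column_stack([np.ones(n), covariates])
    rss_base = rss(base, response)

    def improvement(g):
        # Reduction in residual sum of squares when genotype g joins the model.
        return rss_base - rss(np.column_stack([base, g]), response)

    observed = improvement(genotype)
    # Shuffling genotypes across subjects removes any true association.
    hits = sum(improvement(rng.permutation(genotype)) >= observed
               for _ in range(n_perm))
    # The +1 terms count the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

With `n_perm` permutations the smallest attainable p-value is 1/(`n_perm` + 1), so the number of permutations bounds the resolution of the screen.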
Data Analysis. Covariates were analyzed using multiple linear regression and the stepwise procedure. An extended linear model was constructed including the significant covariate and the SNP genotype. SNP genotype was coded quantitatively as a numerical variable indicating the number of minor alleles: 0 for major homozygotes, 1 for heterozygotes, and 2 for minor homozygotes. The F-statistic p-value for the SNP variable was used to evaluate the significance of association. Table 1 lists all SNPs that were tested and their association p-values. The validity of the p-values was tested by an independent calculation of the p-values using permutation testing. To account for the multiple testing of multiple SNPs, adjusted p-values were calculated using Benjamini and Hochberg's false discovery rate (FDR) procedure [Reiner A, Yekutieli D, Benjamini Y: Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19:368-375 (2003); Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57:289-300 (1995); Benjamini Y, Hochberg Y: On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics 25:60-83 (2000).]. In addition, the power for detecting an association based on the Bonferroni multiple comparison adjustment was evaluated. For each SNP, the effect size in standard deviations that was necessary for detection of an association at a power of 80% (20% false negative rate) was calculated using the formula:
$$\Delta = \frac{z_{\alpha/c} + z_{\beta}}{\sqrt{N f (1 - f)}}$$
where α was the desired false positive rate (α=0.05), β the false negative rate (β=1-Power=0.2), c the number of SNPs, z a standard normal deviate, N the number of subjects, f the carrier proportion, and Δ the difference in change in response between carriers and non-carriers expressed relative to the standard deviation [Rosner B: Fundamentals of Biostatistics. Belmont, Calif.: Wadsworth Publishing Co. (1995).].
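By way of non-limiting illustration, the two multiple-testing calculations described above may be sketched as follows. The effect-size expression used in the sketch, (z_{α/c} + z_β)/√(N f (1 − f)), is an assumed form chosen to be consistent with the symbols defined above. The one-sided treatment of α/c is likewise an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def detectable_effect_size(n_subjects, carrier_fraction, n_snps,
                           alpha=0.05, power=0.80):
    """Smallest carrier vs. non-carrier difference (in standard deviations)
    detectable at the given power after a Bonferroni adjustment.

    Assumed form: (z_{alpha/c} + z_beta) / sqrt(N * f * (1 - f)).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / n_snps)  # Bonferroni-adjusted, one-sided (assumption)
    z_beta = z(power)                  # same as z(1 - beta)
    f = carrier_fraction
    return (z_alpha + z_beta) / sqrt(n_subjects * f * (1.0 - f))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this assumed form, the detectable effect size is smallest when the carrier proportion is 0.5 and grows as the marker becomes rarer or as more SNPs are tested.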
LOESS representation. A locally smoothed function of the SNP frequency as it varies with each response was used to visually represent the nature of an association. LOESS (LOcally wEighted Scatter plot Smooth) is a method to smooth data using a locally weighted linear regression [Cleveland, W S: Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-836 (1979); Cleveland W S, Devlin S J: Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting. Journal of the American Statistical Association Vol. 83, pp. 596-610 (1988).]. At each point in the LOESS curve, a quadratic polynomial was fitted to the data in the vicinity of that point. The data were weighted such that they contributed less if they were further away, according to the following tricubic function:
$$w(x_i) = \left(1 - \left|\frac{x - x_i}{d(x)}\right|^{3}\right)^{3}$$
where x was the abscissa of the point to be estimated, the xi were the data points in the vicinity, and d(x) was the maximum distance of x to the xi.
The distribution of change in each parameter in the study population is approximately normal. The potential covariates of age, gender, and race are tested for association with each parameter using multiple linear regression. The LOESS curve will show the localized frequency of the least common allele for sectors of the distribution. For SNPs with a strong association, the marker frequency is significantly different between the high end and the low end of the distribution. Conversely, if a marker is neutral, the frequency is independent of the response and the LOESS curve is essentially flat.
If an allele is more common among patients with high response than among those with low response, the allele is likely to be associated with increased response. Similarly, when the allele is less common in those with high response, the allele is associated with decreased response. Thus, the slope of the curve is an indication of the degree of association.
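By way of non-limiting illustration, a LOESS curve of the kind described above may be sketched as follows. The sketch assumes the standard tricube weight, a local quadratic fit, and a nearest-neighbor window set by a `span` fraction; the function names and the default span of 0.5 are assumptions made for illustration. Here `x` would hold the response of each subject and `y` the minor-allele frequency of that subject (genotype count divided by 2).

```python
import numpy as np

def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| <= 1, and 0 beyond."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def loess(x, y, x_eval, span=0.5, degree=2):
    """Locally weighted polynomial smooth of y on x, evaluated at x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Window size: a fraction of the data, but enough points for the local fit.
    k = min(len(x), max(degree + 2, int(np.ceil(span * len(x)))))
    fitted = []
    for x0 in np.atleast_1d(x_eval):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]          # the k nearest data points
        d = dist[idx].max()                 # d(x): largest distance in the window
        w = tricube(dist[idx] / d) if d > 0 else np.ones(len(idx))
        # Weighted least squares: scale rows by sqrt(w), centered at x0.
        A = np.vander(x[idx] - x0, degree + 1, increasing=True)
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        fitted.append(coef[0])              # constant term is the fit at x0
    return np.array(fitted)
```

A neutral marker yields an essentially flat curve under this sketch, while an associated marker yields a curve whose slope reflects the direction of the association.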
a. Data analysis. The objective of the statistical analysis is to find a set of physiogenomic factors that together provide a way of predicting the outcome of interest. The association of an individual factor with the outcome may not have sufficient discrimination ability to provide the necessary sensitivity and specificity, but by combining the effect of several such factors the objective is reached. Increased sensitivity and specificity for the cumulative effect on prediction can be achieved through the use of common factors that are statistically independent. The assumptions on which these calculations are based are (a) the factors are independent of each other, (b) the association between each factor and the outcome can be summarized by a modest odds ratio of 1.7, and (c) the prevalence of each physiogenomic factor in the population is 50% and independent of the others. Clearly, the prediction becomes even stronger if the association with the response is stronger or one finds additional predictors. However, factors that are less useful for these types of prediction are those that are less common in the population, or collinear with factors that have already been identified in the prediction model.
b. Model Building. Discovery of markers affecting response to exercise. A multivariate model was developed for the purpose of predicting a given response (Y) to exercise. A linear model for subjects in a group of patients subjected to exercise was used in which the response of interest can be expressed as follows:
$$Y = R_0 + \sum_i \alpha_i M_i + \sum_j \beta_j D_j$$
where Mi are the dummy marker variables indicating the presence of specified genotypes and Dj are demographic and clinical covariates. The model parameters that are to be estimated from the data are R0, αi and βj. This model employs standard regression techniques that enable the systematic search for the best predictors. S-plus provides very good support for algorithms that provide these estimates for the initial linear regression models, as well as for other generalized linear models that may be used when the error distribution is not normal. For continuous variables, generalized additive models, including cubic splines, may also be considered in order to appropriately assess the form of the dose-response relationship [Hastie T, Tibshirani R. Generalized additive models. Stat. Sci. 1: 297-318 (1986); Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine 8:551-561 (1989).].
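By way of non-limiting illustration, estimation of R0, αi and βj by ordinary least squares may be sketched as follows. The specification names S-plus as the tool used; the NumPy-based function below, including its name and argument layout, is a hypothetical stand-in offered only to make the model structure concrete.

```python
import numpy as np

def fit_physiogenomic_model(markers, covariates, response):
    """Least-squares estimates for Y = R0 + sum_i alpha_i*M_i + sum_j beta_j*D_j.

    markers:    array (n_subjects, n_markers) of dummy marker variables M_i
    covariates: array (n_subjects, n_covariates) of demographic/clinical D_j
    response:   array (n_subjects,) of observed responses Y
    Returns (R0, alpha, beta).
    """
    M = np.asarray(markers, dtype=float)
    D = np.asarray(covariates, dtype=float)
    y = np.asarray(response, dtype=float)
    # Design matrix: intercept column, then markers, then covariates.
    X = np.column_stack([np.ones(len(y)), M, D])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    n_markers = M.shape[1]
    return coef[0], coef[1:1 + n_markers], coef[1 + n_markers:]
```

Both `markers` and `covariates` are expected as two-dimensional arrays, so that a single marker or covariate is passed as a one-column matrix.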
In addition to optimizing the parameters, model refinement is performed. The first phase of the regression analysis will consist of considering a set of simplified models by eliminating each variable in turn and re-optimizing the likelihood function. The ratio between the two maximum likelihoods of the original vs. the simplified model then provides a significance measure for the contribution of each variable to the model.
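By way of non-limiting illustration, the variable-elimination step described above may be sketched as follows for a linear model with normally distributed errors. In that case twice the log of the likelihood ratio reduces to n·ln(RSS_reduced/RSS_full). Referring that statistic to a chi-square distribution with one degree of freedom is an assumption of this sketch, as are the function names.

```python
import numpy as np
from math import erfc, log, sqrt

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def drop_one_likelihood_ratio(X, y, names):
    """Likelihood-ratio statistic and p-value for removing each variable in turn.

    X:     array (n, k) of predictors, without an intercept column
    y:     array (n,) of responses
    names: list of k variable names, one per column of X
    Returns {name: (statistic, p_value)}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    intercept = np.ones((n, 1))
    rss_full = rss(np.hstack([intercept, X]), y)
    results = {}
    for j, name in enumerate(names):
        # Refit the simplified model with variable j eliminated.
        rss_reduced = rss(np.hstack([intercept, np.delete(X, j, axis=1)]), y)
        # 2 * (logL_full - logL_reduced) under Gaussian errors.
        stat = max(0.0, n * log(rss_reduced / rss_full))
        # Upper tail of a chi-square distribution with 1 degree of freedom.
        p_value = erfc(sqrt(stat / 2.0))
        results[name] = (stat, p_value)
    return results
```

A variable whose removal barely changes the residual sum of squares yields a statistic near zero and therefore contributes little to the model.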
The association between each physiogenomic factor and the outcome is calculated using logistic regression models, controlling for the other factors that have been found to be relevant. The magnitude of these associations is measured with the odds ratio and the corresponding 95% confidence interval, and statistical significance is assessed using a likelihood ratio test. A multivariate analysis is used that includes all factors found to be important in the univariate analyses.
Because the number of possible comparisons can become very large in analyses that evaluate the combined effects of two or more genes, the results include a random permutation test for the null hypothesis of no effect for combinations of two through five genes. This is accomplished by randomly assigning the outcome to each individual in the study, which is implied by the null distribution of no genetic effect, and estimating the test statistic that corresponds to the null hypothesis of the gene combination effect. Repeating this process 1000 times will provide an empirical estimate of the distribution for the test statistic, and hence a p-value that takes into account the process that gave rise to the multiple comparisons. In addition, hierarchical regression analysis is considered to generate estimates incorporating prior information about the biological activity of the gene variants. In this type of analysis, multiple genotypes and other risk factors can be considered simultaneously as a set, and estimates will be adjusted based on prior information and the observed covariance, theoretically improving the accuracy and precision of effect estimates [Steenland K, Bray I, Greenland S, Boffetta P. Empirical Bayes adjustments for multiple results in hypothesis-generating or surveillance studies. Cancer Epidemiol Biomarkers Prev. 9:895-903 (2000).].
c. Power calculations. The power available for detecting an odds ratio (OR) of a specified size for a particular allele was determined on the basis of a significance test on the corresponding difference in proportions using a 5% level of significance. The approach for calculating power involved the adaptation of the method given by Rosner [Rosner B: Fundamentals of Biostatistics. Belmont, Calif.: Wadsworth Publishing Co. (1995).]. The SNPs that are explored in this research are not so common as to have prevalence of more than 35%, but rather in the range of 10-15%. Therefore, it is apparent that the study has at least 80% power to detect odds ratios in the range of 1.6-1.8, which are modest effects.
d. Model validation. A cross-validation approach is used to evaluate the performance of models by separating the data used for parameterization (training set) from the data used for testing (test set). The approach randomly divides the population into a training set, which comprises 80% of the subjects, and a test set, which comprises the remaining 20%. The model-building procedure is applied to the data in the training set to find a model that can be used for prediction of the exercise response that will occur in a subject. The resulting prediction equation is then applied to the test set to prepare an ROC curve that provides an independent estimate of the relationship between sensitivity and specificity for the prediction model.
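By way of non-limiting illustration, the 80%/20% division and the ROC curve described above may be sketched as follows. Because an ROC curve requires a binary outcome, the sketch assumes that each test-set subject has been labeled as a responder or non-responder by some threshold on the observed response. That labeling rule, and the function names, are assumptions and are not specified in the text.

```python
import numpy as np

def train_test_split(n_subjects, test_fraction=0.2, seed=0):
    """Randomly split subject indices into (training, test) index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_subjects)
    n_test = int(round(test_fraction * n_subjects))
    return idx[n_test:], idx[:n_test]

def roc_curve(scores, labels):
    """False and true positive rates as the decision threshold is lowered.

    scores: predicted responses for the test-set subjects
    labels: 1/True for responders, 0/False for non-responders (assumed labeling)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    n_neg = (~labels).sum()
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:       # thresholds from high to low
        predicted = scores >= t
        tpr.append((predicted & labels).sum() / n_pos)    # sensitivity
        fpr.append((predicted & ~labels).sum() / n_neg)   # 1 - specificity
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

The model is fitted using only the training indices, and `roc_curve` is then applied to the predictions for the test indices so that the sensitivity and specificity estimates are independent of the fitting data.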
e. Patient Physiotype. Tables 22 through 34 show a collection of physiotypes for the outcomes log of blood triglyceride level (logTG); blood LDL cholesterol level (LDL); blood HDL cholesterol level (HDL); LDL cholesterol, small fraction level (LDLSM); HDL cholesterol, large fraction level (HDLLG); blood glucose level (GLU); systolic blood pressure (SBP); diastolic blood pressure (DSP); body mass (BMS); body mass index (BMI); fat percentage (PFAT); weight normalized maximum oxygen uptake (VMAX); and maximum oxygen uptake (VMAXL). Each physiotype in this particular embodiment consists of a selection of markers, an intercept value (C), and a coefficient (ci) for each marker. For example, the LDL physiotype, in one embodiment, consists of the markers rs2005590, rs1041163, rs1800471, rs1799978, rs870995, rs707922, rs1398176, and rs5092, and the corresponding coefficients −0.53177, −0.29832, −0.69604, 0.92244, 0.28492, −0.25665, 0.26321, and 0.26693, respectively. The predicted LDL response for a given individual is then given by the formula:
$$\text{predicted LDL response} = C + \sum_i c_i g_i$$
where C is the intercept, the ci are the coefficients and the gi are the genotypes, coded 0 for the wild type allele homozygote, 1 for the heterozygote, and 2 for the variant allele homozygote.
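By way of non-limiting illustration, application of the LDL physiotype to one patient may be sketched as follows. The marker identifiers and coefficients are those recited above. The intercept C is not recited in this passage, so it is passed in as a parameter, and any value supplied for it in an example is a placeholder. The function name and the patient genotypes are hypothetical.

```python
# Markers and coefficients of the LDL physiotype as recited in the text.
LDL_PHYSIOTYPE = {
    "rs2005590": -0.53177,
    "rs1041163": -0.29832,
    "rs1800471": -0.69604,
    "rs1799978": 0.92244,
    "rs870995": 0.28492,
    "rs707922": -0.25665,
    "rs1398176": 0.26321,
    "rs5092": 0.26693,
}

def predict_response(genotypes, coefficients, intercept):
    """Predicted response: intercept plus the sum of coefficient * genotype.

    genotypes:    {marker: 0, 1, or 2}, the count of variant alleles
    coefficients: {marker: c_i}
    intercept:    C (not given in this passage; supplied by the caller)
    """
    total = intercept
    for marker, coefficient in coefficients.items():
        g = genotypes[marker]
        if g not in (0, 1, 2):
            raise ValueError(f"genotype for {marker} must be 0, 1, or 2")
        total += coefficient * g
    return total

# Hypothetical patient: heterozygous at rs2005590, variant homozygote at
# rs1799978, wild type elsewhere. The intercept of 0.0 is a placeholder.
patient = dict.fromkeys(LDL_PHYSIOTYPE, 0)
patient["rs2005590"] = 1
patient["rs1799978"] = 2
predicted_ldl = predict_response(patient, LDL_PHYSIOTYPE, intercept=0.0)
```

For this hypothetical patient the marker contribution is 2 × 0.92244 − 0.53177 = 1.31311, to which the actual intercept for the physiotype would be added.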
In this embodiment, the physiotype consists of a linear regression model with no interactions. In another embodiment, interaction terms of two or more variables may be added to the model. In other embodiments, the physiotype might consist of a generalized linear regression model, a structural equation model, a Bayesian probability network, or any other modeling tool known to the practitioner of the art of statistics.
For each physiological parameter, the patient's genotype (0, 1, or 2) is multiplied by the coefficient corresponding to the effect of the particular SNP on a particular response given in the tables above. For each response, the sum
$$\sum_i c_i g_i$$
is added to the intercept value C to determine the predicted response to exercise for the patient.
While the SNP ensembles provided in the tables above provide a marked improvement over individual SNPs for predicting the given clinical outcomes, it will be understood that the invention is not limited to these precise ensembles. Rather, each individual SNP and subcombinations of these SNPs are also considered to be within the scope of the invention. Preferably, the ensemble is predictive of two or more responses, more preferably three or more responses, and still more preferably four or more responses. In a preferred embodiment, the ensemble of SNPs is predictive of blood triglyceride level; blood LDL cholesterol level; blood HDL cholesterol level; ratio of total cholesterol to HDL cholesterol; LDL cholesterol, small fraction level; HDL cholesterol, large fraction level; blood glucose level; systolic blood pressure; diastolic blood pressure; body mass; body mass index; waist size; fat percentage; weight normalized maximum oxygen uptake; and maximum oxygen uptake; or any combination thereof.
In the preferred practice of the invention, the ensemble of markers for a particular physiological outcome will comprise at least one SNP having a positive (+) coefficient and at least one SNP having a negative (−) coefficient. In other embodiments, the ensemble will have at least two (or more than two) SNPs, predictive of the same physiological outcome, having a positive (+) coefficient and at least two (or more than two) SNPs, predictive of the same physiological outcome, having a negative (−) coefficient.
The separate physiotypes of Tables 22-34 can be consolidated into a collective physiotype table to provide an ensemble of SNPs predictive of a plurality of physiological responses to exercise. A representative physiotype table for one patient is provided in Table 35, wherein the coefficients, ci, have been omitted for brevity and only their relative contribution (+ or −) is indicated.
The patient's physiotype may be expressed in a convenient format for the practitioner's assessment of a patient's likely response to exercise, as shown in
The content of all patents, patent applications, published articles, abstracts, books, reference manuals, sequence accession numbers, as cited herein are hereby incorporated by reference in their entireties to more fully describe the state of the art to which the invention pertains.