The present invention is in the field of physiological genomics, hereafter referred to as “physiogenomics”. More specifically, the invention relates to the use of genetic variants of marker genes to predict an individual's responsiveness to diet. The invention also relates to methods for generating patient-specific physiotype models for expressing predicted effects of diet on Body Mass Index (BMI) and HDL cholesterol, LDL cholesterol, and triglyceride levels.
It has recently been estimated that the obesity rate among the adult population of the United States has doubled in the past two decades to a staggering 31% [1]. During the same time, the percentage of adults who are either overweight or obese rose to 65% [1]. This dramatic rise in obesity has led to a public health crisis owing to the increased prevalence of disease with increased body weight. For example, overweight and obesity increase the risk of developing cardiovascular disease, cancer, diabetes, high blood pressure, elevated cholesterol, and stroke, among other serious conditions.
Despite the public health imperative of obesity, the morbidity and mortality associated with obesity are largely preventable with lifestyle and dietary changes. For example, a recent report by the American Institute for Cancer Research and the World Cancer Research Fund estimated that 30-40% of cancer cases worldwide are preventable by diet [2]. Other studies report that progression to diabetes in pre-diabetics may be reduced by 40-58% through lifestyle intervention, including dietary modification [3].
Many well-known diets exist that emphasize restriction of various dietary components. For example, the latest recommendations call for a diet that is high in carbohydrates and low in total fat, saturated fat, and cholesterol [4]. However, it is an alarming reality that low-fat/high-carbohydrate diets actually exacerbate the co-morbidity of obesity, diabetes, and cardiovascular disease, a condition now recognized as Metabolic Syndrome (MetSyn), which is characterized by three or more of the following abnormalities: (1) large waist circumference (>102 cm in men, >88 cm in women), (2) elevated serum triglycerides (>150 mg/dL), (3) depressed high density lipoprotein (HDL, <40 mg/dL in men, <50 mg/dL in women), (4) elevated blood pressure (systolic >130 mmHg or diastolic ≥80 mmHg), and (5) elevated serum glucose (>110 mg/dL) [4]. A primary problem with low-fat/high-carbohydrate diets is that they contribute to carbohydrate-induced hypertriglyceridemia [5], a major problem underlying the metabolic disorders of MetSyn. Well-controlled feeding studies indicate that low-fat/high-carbohydrate diets exacerbate the dyslipidemia of MetSyn when not associated with significant weight loss or increased physical activity [6,7]. Low-fat/high-carbohydrate diets have unfavorable effects on fasting triglycerides [8], HDL-C [9], and size and composition of LDL-C [10,11]. Clearly, the standard low-fat recommendations are not suited for all individuals and may in fact be counterproductive to dieting goals.
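The MetSyn definition above is a counting rule over five criteria. The following Python sketch is offered only as an illustration of that rule, using the thresholds exactly as stated in the preceding paragraph. The class name, field names, and function names are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """Hypothetical container for the five measurements named in the text."""
    sex: str                    # "M" or "F"
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mmhg: float
    diastolic_mmhg: float
    glucose_mg_dl: float


def metsyn_abnormalities(m: Measurements) -> list:
    """Return the names of the criteria met, using the thresholds in the text."""
    male = m.sex.upper() == "M"
    flags = {
        "waist": m.waist_cm > (102 if male else 88),
        "triglycerides": m.triglycerides_mg_dl > 150,
        "hdl": m.hdl_mg_dl < (40 if male else 50),
        "blood_pressure": m.systolic_mmhg > 130 or m.diastolic_mmhg >= 80,
        "glucose": m.glucose_mg_dl > 110,
    }
    return [name for name, met in flags.items() if met]


def has_metsyn(m: Measurements) -> bool:
    """MetSyn is flagged when three or more of the five criteria are met."""
    return len(metsyn_abnormalities(m)) >= 3
```

For instance, a man with a 105 cm waist, triglycerides of 160 mg/dL, and HDL of 38 mg/dL meets three criteria and would be flagged even with normal blood pressure and glucose.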
Recently, carbohydrate-restricted diets have gained increased popularity. In this regard, very-low-carbohydrate “ketogenic” diets (VLCKDs) have proven effective in combating obesity for many individuals. VLCKDs differ dramatically from the standard recommendations by emphasizing a reduction in carbohydrates and thus are inherently high in total fat, saturated fat, and cholesterol. For this reason, VLCKDs have been criticized as having potential adverse effects on blood lipoproteins and other risk factors for cardiovascular disease and diabetes [12]. However, these criticisms are largely unsupported and based on a misunderstanding of the physiological adaptations to carbohydrate restriction. Recent studies have demonstrated that short-term VLCKDs consistently result in improvements in fat loss and a number of cardiovascular disease factors as compared to low-fat diets [12]. Nonetheless, substantial variability in cholesterol and blood lipid responses to VLCKDs has been reported [13].
It may therefore be said that it is impossible to create a single optimal diet for everyone. This is because of the complex interactions among nutrition, environment, and, importantly, genetics. A genetic explanation for variability in response to diet has been shown in studies that measured polymorphisms of selected candidate genes, usually apolipoproteins. APOE and APOA1 predict cholesterol responses to diet [14]. An increase in polyunsaturated fat increases HDL-C in individuals carrying the A allele at the −75 G/A genetic polymorphism of the APOA1 gene, whereas those with the more common G allele decrease HDL-C [15]. The response of HDL-C to increases in total fat, particularly animal fat, is explained in part by a polymorphism in the hepatic lipase gene [16]. While intriguing, genetic variations in single genes are not precise in their predictive power. More robust approaches that consider many genes and the multidimensional interactions with various phenotypes are needed. Thus, despite the recognition of the importance of genomics in personalized nutrition [17,18,19,20], there are currently no viable methods to guide personalized diet prescription.
The emerging field of physiogenomics offers an important approach for integrating genotype, phenotype, and population analysis of functional variability among individuals. In physiogenomics, genetic markers (e.g. single nucleotide polymorphisms or “SNPs”) are analyzed to discover statistical associations to physiological characteristics or outcomes in populations of individuals either at baseline or after they have been exposed to an environmental trigger such as dietary intervention.
It is therefore an object of the present invention to provide physiogenomic methods and tools for predicting the response of individuals to diet based on genetic factors, alone or in combination with demographic factors.
It is another object of the invention to provide genetic markers and arrays of genetic markers which are predictive of BMI, triglyceride, and blood lipid response to diet.
It is a further object of the invention to provide combinations of genetic markers which are more predictive of BMI, triglyceride, and blood lipid response to diet than the individual markers.
In accordance with the foregoing objectives and others, the present invention provides physiogenomic methods and tools for a priori determination of an individual's likely response to diet. The method utilizes physiogenomics to identify gene variants, in particular single nucleotide polymorphisms (SNPs), which correlate with changes in BMI, LDL, HDL, and triglyceride levels in response to diet.
In one aspect, the present invention provides marker gene sets comprising a plurality of single nucleotide polymorphic gene variants, wherein the presence of any one of said single nucleotide polymorphic gene variants in a human is predictive of physiological response to diet. The physiological response may be, for example, change in blood triglyceride level, change in blood LDL level, change in blood HDL level, change in body mass index, or any combination of these responses.
In one interesting implementation according to this aspect of the invention, the plurality of single nucleotide polymorphic gene variants will comprise at least one single nucleotide polymorphic gene variant of a gene selected from the group consisting of ABCB1, ACACB, ACAT1, ACHE, ADRB1, ADRB2, AKT1, AKT2, ANGPT1, APOB, APOH, APOL3, APOL4, AVEN, BDNF, CETP, CHAT, CHKB, CPT1A, CRHR2, CYP1A2, CYP2C19, CYP7A1, DBH, DRD3, DRD4, DRD5, DTNBP1, FLJ32252, FLT1, GABRA2, GAD1, GAD2, GAL, GNAO1, GYS2, HIF1A, HTR3A, ICAM1, IL10, IL1R1, INSR, IRS1, KDR, LDLR, LIPE, LIPF, LOC391530, OLR1, OXT, PIK3C2G, PIK3C3, PIK3R1, PPARG, PRKAA1, PRKAB1, RARB, RARG, RXRA, SCARB2, SELE, SSTR3, VEGF, and combinations thereof.
Particularly interesting single nucleotide polymorphisms of the foregoing genes are selected from the group consisting of rs10082776, rs1018381, rs1040410, rs10422283, rs1042713, rs1042718, rs1045642, rs10460960, rs1062688, rs1064344, rs107540, rs10841044, rs10890819, rs11212515, rs1128503, rs1150226, rs1190762, rs1290443, rs132642, rs132661, rs1478290, rs167771, rs1801278, rs1801701, rs1951795, rs2005590, rs2033447, rs2049045, rs2071710, rs2125489, rs2192752, rs2240403, rs2241220, rs2301108, rs2306179, rs2429511, rs2470890, rs2494746, rs2514869, rs2702285, rs2742115, rs2743867, rs2867383, rs3024492, rs322695, rs3750546, rs3756007, rs3757868, rs3791850, rs3791981, rs3792822, rs3808607, rs3813065, rs3853188, rs4135268, rs4244285, rs4531, rs461404, rs4802071, rs4804103, rs4986894, rs4987059, rs5030390, rs5361, rs563895, rs5880, rs5883, rs597316, rs619698, rs6265, rs694066, rs706713, rs748253, rs8110695, rs814628, rs8178847, rs8190586, rs833060, rs877172, and rs885834.
In another implementation of the invention, the marker gene set is predictive of change in blood LDL level and the single nucleotide polymorphic gene variants comprise one or more single nucleotide polymorphic gene variants selected from the group consisting of rs1018381, rs4804103, or both.
In another implementation of the invention, the marker gene set is predictive of change in blood HDL level and the single nucleotide polymorphic gene variants comprise one or more single nucleotide polymorphic gene variants selected from the group consisting of rs1064344, rs3756007, rs8110695, rs4244285, rs3024492, rs2192752, rs2514869, rs1190762, and combinations thereof.
In yet another implementation of the invention, the marker gene set is predictive of change in blood triglyceride level and the single nucleotide polymorphic gene variants comprise one or more single nucleotide polymorphic gene variants selected from the group consisting of rs132642, rs3757868, rs1951795, rs3791981, rs10460960, and combinations thereof.
In a further implementation of the invention, the marker gene set is predictive of change in body mass index and the single nucleotide polymorphic gene variants comprise one or more single nucleotide polymorphic gene variants selected from the group consisting of rs814628, rs4531, rs2306179, rs4987059, rs5883, rs5361, rs877172, and combinations thereof.
Another specific aspect of the method involves obtaining genetic material, e.g. DNA or RNA, from a subject, and assaying the genetic material to determine if any of the single nucleotide polymorphic gene variants belonging to the marker gene set are present, wherein the presence of the one or more single nucleotide polymorphic gene variants is predictive of physiological response to diet. Micro- and nano-array analysis of the subject's genetic material is preferred in this specific aspect of the invention.
In another aspect, the present invention further provides a method for the development of novel diagnostic systems, termed “physiotypes”, which are developed from combinations of gene polymorphisms and baseline characteristics, to provide physicians with individualized patient response profiles for physiological response to diet.
Yet another aspect of the present invention provides a system containing a support or support material, e.g. a micro- or nano-array, comprising a novel set of marker genes and/or gene variants associated with physiological response to diet in a form suitable for the practitioner to employ in a screening assay for determining an individual's genotype. In addition to the marker genes and gene variants, the system comprises an algorithm for predicting the physiological response to diet based on a predetermined set of mathematical equations providing specific coefficients to each of the components of the array.
In another aspect, the present invention provides methods for the identification of a population of individuals that will respond favorably to diet, including but not limited to carbohydrate-restricted diet, based on the physiological responses of change in blood triglyceride level, change in blood LDL level, change in blood HDL level, change in body mass index, or any combination of these responses. These individuals, who are identified through screening using the methods of the present invention, are especially amenable to carbohydrate-restricted diet to reduce weight and reduce the occurrence of obesity-related morbidity and mortality.
These and other aspects of the present invention will be better understood upon a reading of the following detailed description when considered in connection with the accompanying figures.
Very low-carbohydrate (VLC) diets have been reported to outperform low-fat diets on a number of metrics, including weight/fat loss and metabolic biomarkers of CVD and diabetes [21-29]. Table 1 summarizes published studies comparing low-fat (LF) and very low-carbohydrate (VLC) diets on percent changes in fasting blood lipids and postprandial lipemia.
*P ≤ 0.05 from corresponding change on LF diet. TC = total cholesterol; TAG = triglyceride; PP = postprandial; NW = normal-weight; OW = overweight.
Despite the reported efficacy of VLC diet intervention, the variability in response highlights the desirability of providing individualized dietary regimens to optimize response. This approach necessarily requires the consideration of patient genotype. However, the specific contribution of genetic influences to diet response is not well understood. Therefore, it has not previously been possible to provide accurate patient-specific dietary recommendations. It has surprisingly been found that physiogenomic methods can be employed to identify genetic markers associated with response to interventional VLC dieting. A patient can then be assayed for the presence of one or more of the genetic markers and a personalized predicted response profile developed based on the presence or absence of the marker, the specific allele (i.e., heterozygous or homozygous), and the predictive ability of the marker.
The physiogenomics methods employed in the present invention are described generally in U.S. patent application Ser. No. 11/010,716, the contents of which are hereby incorporated by reference. Briefly, the physiogenomics method for predicting whether a particular treatment regimen will produce a beneficial effect on a patient typically comprises (a) selecting a plurality of genetic markers based on an analysis of the entire human genome or a fraction thereof; (b) identifying significant covariates among demographic data and the other phenotypes preferably by linear regression methods (e.g., R² analysis followed by principal component analysis); (c) performing for each selected genetic marker an unadjusted association test using genetic data; (d) using permutation testing to obtain a non-parametric and marker complexity probability (“p”) value for identifying significant markers, wherein the significance is shown by p<0.10, more preferably p<0.05, and even more preferably p<0.01; (e) constructing a physiogenomic model by linear regression analyses and model parameterization for the dependence of said patient's response to treatment with respect to said markers, wherein said physiogenomic model has p<0.10, more preferably p<0.05, and even more preferably p<0.01; and (f) identifying one or more genes not associated with a particular outcome in said patient to serve as a physiogenomic control.
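Steps (c) and (d) above can be illustrated with a minimal Python sketch of a single-marker permutation test. This is a sketch under stated assumptions, not the procedure of the referenced application: genotypes are assumed to be coded as allele counts (0, 1, or 2), the association statistic is assumed to be the absolute Pearson correlation, and the function names and default parameters are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(genotypes, responses, n_perm=2000, seed=0):
    """Non-parametric p-value for one marker (assumed statistic: |r|).

    The responses are shuffled n_perm times; the p-value is the share of
    shuffles whose statistic is at least as large as the observed one,
    with the usual +1 correction so that p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs_correlation(genotypes, responses)
    shuffled = list(responses)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs_correlation(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A marker would then be retained if the returned value falls below the chosen threshold (for example 0.05). Testing many markers at once would normally call for an additional adjustment, which this sketch does not attempt.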
One embodiment of the present invention involves obtaining nucleic acid, e.g. DNA, from a blood sample of a subject, and assaying the DNA to determine the individual's genotype of one or a combination of the marker genes associated with interventional VLC diet response. Other sampling procedures include but are not limited to buccal swabs, saliva, or hair root. In a preferred embodiment, genotyping is performed using a gene array methodology, which can be readily and reliably employed in the screening and evaluation methods according to this invention. A number of gene arrays are commercially available for use by the practitioner, including, but not limited to, static (e.g. photolithographically set), suspended (e.g. soluble arrays), and self-assembling (e.g. matrix ordered and deconvoluted). More specifically, the nucleic acid array analysis allows the establishment of a pattern of gene expression variability from multiple genes and facilitates an understanding of the complex interactions that are elicited in an individual in response to diet.
In a specific embodiment, the array consists of several hundred genes and is capable of genotyping hundreds of DNA polymorphisms simultaneously. Candidate genes for use in the arrays of the present invention are identified by various means including, but not limited to, pre-existing clinical databases and DNA repositories, review of the literature, and consultation with clinicians, differential gene expression models, physiological pathways in metabolism, cholesterol and lipid homeostasis, and from previously discovered genetic associations. In a preferred embodiment, the candidate genes are selected from those shown in Table 2.
Each of the foregoing genes, and combinations thereof, are expected to provide useful markers in the practice of the invention. The gene array includes all of the novel marker genes, or a subset of the genes, or unique nucleic acid portions of these genes. The gene array of the invention is useful in discovering new genetic markers of dietary response.
The specific marker will be selected from variants of these genes, or other genes determined to be associated with dietary response. As used herein, the term “variant” refers to mutations, polymorphisms, and insertions and deletions in the nucleic acid sequence of the “wild type” or “normal” gene. Preferred variants in accordance with the invention are single nucleotide polymorphisms (SNPs), which refer to gene variants differing in the identity of one nucleotide pair from the normal gene. The following table identifies the most promising SNPs, ranked based on the selection criterion of p ≤ 0.05, for the physiological responses of total cholesterol change (TC), LDL change, HDL change, triglyceride change (TG), log(TG), change in the ratio TC/HDL, change in body mass (BM), change in fat mass (FMS), change in lean body mass (LBM), change in percent fat (PCT), and change in body mass index (BMI).
The SNPs and genes in Table 3 are provided in the nomenclature adopted by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health. The sequence data for the SNPs and genes listed in Table 3 are known in the art and are readily available from the NCBI dbSNP and OMIM databases. The coefficients are for the single SNPs and explain the residual change in the indicated response after covariates.
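To illustrate how single-SNP coefficients of the kind described above could be combined with a covariate-based prediction, the following Python sketch adds per-SNP terms to a baseline value. The additive coding by allele count (0, 1, or 2), the function name, and the coefficient values are assumptions made for illustration only; the numbers are placeholders and are not values from Table 3.

```python
def predict_response(covariate_prediction, allele_counts, snp_coefficients):
    """Add per-SNP contributions to a covariate-only prediction.

    covariate_prediction: response predicted from baseline covariates alone.
    allele_counts: dict mapping SNP id to the number of variant alleles (0-2).
    snp_coefficients: dict mapping SNP id to its single-SNP coefficient.
    SNPs missing from allele_counts are treated as having zero variant alleles.
    """
    total = covariate_prediction
    for snp, coefficient in snp_coefficients.items():
        total += coefficient * allele_counts.get(snp, 0)
    return total


# Placeholder coefficients for two SNP ids named in the text (hypothetical values).
example_coefficients = {"rs1018381": 5.0, "rs4804103": -3.0}
example_genotype = {"rs1018381": 1, "rs4804103": 2}
example_prediction = predict_response(2.0, example_genotype, example_coefficients)
```

With these placeholder inputs the prediction is 2.0 + 5.0 × 1 − 3.0 × 2 = 1.0. A dictionary keyed by SNP id is used so that untyped markers simply contribute nothing.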
In another embodiment, the present invention provides a screening method to allow the identification of subsets of individuals who have specific genotypes and physiological characteristics and are more or less likely to respond favorably to VLC dieting. For example, a screening method of this embodiment involves obtaining a sample from an individual undergoing testing, such as a blood sample, and employing an assay method, e.g. the array system and newly-identified marker genes and gene variants as described, to evaluate whether the individual has a genotype associated with response to VLC dieting, and in particular change in blood LDL, HDL and triglyceride levels as well as change in body mass index. These individuals, who are identified through screening using the methods of the present invention, would be especially responsive to VLC dieting.
In another embodiment, the present invention provides a diagnostic system containing a support or support material, such as, without limitation, a nylon or nitrocellulose membrane, bead, or plastic film, or glass, or micro- or nano-array, comprising the novel set of genes as described herein, in a form suitable for the practitioner to employ in screening individuals. The diagnostic system can contain the novel gene marker set, or a subset of these genes, on a suitable substrate or micro- or nano-array. In addition, the diagnostic system can optionally contain other materials necessary for carrying out the assay method, including, but not limited to, labeled or unlabeled nucleic acid probes, detection label, buffers, controls, and instructions for use.
The following example demonstrates preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the example which follows represent techniques discovered by the inventors to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
The content of all patents, patent applications, published articles, abstracts, books, reference manuals, and sequence accession numbers cited herein are hereby incorporated by reference in their entireties to more fully describe the state of the art to which the invention pertains.
Physiogenomics was used to explore the variability in patient response to a VLC diet. Physiogenomics is a medical application of sensitivity analysis [30]. Sensitivity analysis is the study of the relationship between the input and the output of a model and the analysis, using systems theory, of how variation in the input leads to changes in the output quantities. Physiogenomics takes as input the variability in genes, measured by single nucleotide polymorphisms (SNPs), and determines how the SNP frequency among individuals relates to the variability in physiological characteristics, the output.
The goal of the investigation was to develop physiogenomic markers for predicting anthropometric, lipid and endocrine effects of diets by using an informatics platform to analyze data from interventional dietary studies.
Materials and Methods
Patient enrollment. Overweight/obese men and women with a body mass index (BMI) of 25 to 35 kg/m² and age 20 to 60 years were recruited. Initially, prospective volunteers were screened to obtain height, body weight, blood pressure, and medical, nutrition, and activity information. Subjects with Type I or II diabetes, liver or other metabolic/endocrine dysfunction, use of medications/supplements that affect cholesterol (e.g., statins, fish oil, nicotinic acid) or glucose (e.g., metformin), weight reducing or VLC diets, and blood pressure >160/95 mmHg were excluded. Previously active subjects were required to maintain their current exercise routines during the entire experimental period (verified by activity records) and sedentary individuals were not allowed to start an exercise program in order to control for possible confounding effects on the dependent variables [31].
Low attrition was credited to meticulous attention to dietary protocols with group and individualized counseling for subjects. Prospective volunteers were recruited from the local area around the University of Connecticut (Storrs, Conn.) including faculty, staff, graduate and undergraduate students at the main campus. Volunteer screening was performed without restriction in regard to gender, race, and socioeconomic status.
Diet Interventions. The interventions were VLC diets similar to diets investigated in several prior studies [23-29]. VLC diets have been studied extensively in terms of their effects on weight loss, body composition, blood lipids, and hormones and were selected over low-fat diets based on clinical work comparing VLC and low-fat diets [21, 23-29, 32-39].
The diet was hypocaloric (−500 kcal/day) based on estimated caloric needs to maintain body weight. A daily multi-vitamin/mineral complex at levels ≤100% of the RDA was given to subjects to ensure adequate micronutrient status. Subjects were required to report to the laboratory each week for group meetings that involved education and assessment of weight, dietary compliance, and ketone monitoring.
The main goal of the VLC diet was to restrict carbohydrate to <10% of total energy. Customized diabetic exchange lists were used to ensure a constant energy and balance of protein (~25% of energy), fat (~65% of energy), and carbohydrate (~10% of energy) throughout the day. There were no restrictions on the type of fat from saturated and unsaturated sources or cholesterol levels. Foods commonly consumed included beef (e.g., hamburger, steak), poultry (e.g., chicken, turkey), fish, oils, nuts/seeds, peanut butter, moderate amounts of vegetables, salads with low-carbohydrate dressing, moderate amounts of cheese, eggs, protein powder, and water or low-carbohydrate diet drinks. Compliance with carbohydrate restriction was excellent, as indicated by the production of urinary and blood ketones. Subjects tested and recorded their urine ketones daily using reagent strips (Bayer Corporation, Elkhart, Ind.).
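The energy prescription described above reduces to simple arithmetic. As a worked illustration only, the following Python sketch converts an estimated maintenance energy into daily gram targets using the 500 kcal/day deficit and the approximate 25/65/10 energy split stated in the text. The conversion factors of 4, 9, and 4 kcal/g for protein, fat, and carbohydrate are the standard Atwater values, which the text does not state explicitly; the function name and the 2500 kcal example are hypothetical.

```python
# Standard Atwater energy factors (kcal per gram); assumed, not given in the text.
KCAL_PER_GRAM = {"protein": 4.0, "fat": 9.0, "carbohydrate": 4.0}

# Approximate share of total energy from each macronutrient, as stated in the text.
ENERGY_SHARE = {"protein": 0.25, "fat": 0.65, "carbohydrate": 0.10}


def daily_targets(maintenance_kcal, deficit_kcal=500.0):
    """Return (prescribed kcal, grams per macronutrient) for one day."""
    intake_kcal = maintenance_kcal - deficit_kcal
    grams = {
        name: intake_kcal * share / KCAL_PER_GRAM[name]
        for name, share in ENERGY_SHARE.items()
    }
    return intake_kcal, grams


# Hypothetical subject with an estimated maintenance need of 2500 kcal/day.
intake, grams = daily_targets(2500.0)
```

For this hypothetical subject the prescription is 2000 kcal/day, corresponding to 125 g of protein, about 144 g of fat, and 50 g of carbohydrate.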
Body Mass and Body Composition. Body mass and body composition were measured in the morning after an overnight fast. Body mass was recorded to the nearest 100 g on a calibrated digital scale with subjects either nude or wearing only underwear. Whole body and regional body composition were assessed using a state-of-the-art fan-beam DXA (Prodigy™, Lunar Corporation, Madison, Wis.). Analyses were performed by the same blinded technician. Regional analysis of the abdomen was assessed by placing a box between L1 and L4 using commercial software (enCORE version 6.00.270). This abdominal region of interest has been shown to be a highly reliable and accurate determinant of abdominal obesity compared to multi-slice computed tomography [40]. It has previously been found that, compared to a LF diet, a VLC diet results in preferential loss of fat in this abdominal region [25].
Fasting Blood Collection. Blood lipids, insulin and glucose were determined by standard methods reported by Volek et al. [26].
Clinical Database. As a result of the above measurements in the recruited subjects, a clinical database was created. This work has generally shown that VLC diets have a favorable effect on biomarkers for cardiovascular disease. A VLC diet reduces fasting triacylglycerols, increases HDL cholesterol, and promotes a more favorable LDL subclass pattern. Furthermore, a VLC diet results in significant reductions in the triacylglycerol response to a fat-rich meal (i.e., decreased postprandial lipemia).
A great deal of variability in response to diet is evident among subjects. For example, individual variability in LDL-C, HDL-C and triglyceride responses to a VLC diet was studied. For LDL-C, the response was evenly split (half increase, half decrease) such that the mean response was zero. For HDL-C, there was an average improvement, but with significant variability. For triglycerides, almost all subjects exhibited a decrease but there was large variability in the magnitude of the reduction. Similar responses showing variability exist for other parameters such as weight loss, fat loss, insulin, and LDL size in response to VLC and LF diets. These results show the variability in response to the same diet and provide strong evidence for the need to individualize the diet prescription. Of note is the fact that two components of metabolic syndrome, HDL and TG, respond favorably on average to carbohydrate-restricted diets.
A basic question pertains to which particular variables might explain this variability in response. To this end, the effects of various baseline covariates on dietary response were examined. Baseline LDL, HDL and TG, weight, percentage body fat and gender were determined. These covariates explain 30% of the variability in LDL response, and 50% of the variability in HDL and TG response. To improve the precision of the prediction, it was hypothesized that a unique combination of genetic, physiological, and demographic information (a PhysioType™) can precisely predict the response.
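The statement that covariates "explain" a given percentage of the variability corresponds to the coefficient of determination (R²) of a regression of the response on those covariates. The following Python sketch shows the calculation for a single covariate and returns the residuals that remain for genetic markers to explain. It is a simplified illustration: the study used several covariates, whereas this sketch fits only one, and the function names and the toy numbers are hypothetical rather than study data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Part of the response left unexplained by the covariate."""
    slope, intercept = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def r_squared(x, y):
    """Fraction of the response variance explained by the covariate."""
    my = mean(y)
    ss_res = sum(e ** 2 for e in residuals(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Toy numbers for illustration only (not study data).
baseline = [1.0, 2.0, 3.0, 4.0]
change = [1.0, 3.0, 2.0, 4.0]
explained = r_squared(baseline, change)
```

With the toy numbers above, the covariate explains 64% of the variance, and the remaining 36% is carried by the residuals against which marker associations would be tested.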
Sample procurement and DNA isolation. DNA samples were quantified by the fluorescent PicoGreen assay and diluted or concentrated to a standard 50 ng/μL. Each DNA sample was examined according to a standard PCR analytical panel to determine its suitability for the Illumina BeadStation. It has been observed that in many clinical studies the range of DNA quality, concentration and suitability for amplification is quite variable and reflects legacy issues related to whole blood extraction and banking over several studies. For example, depending on the extraction technique, hemoglobin remnants and heme itself may interfere with DNA amplification by inhibition of the DNA polymerase. The Whole Genome Amplification (WGA) technology is particularly useful in this study. WGA allows trace DNA remnants in blood serum to be amplified and provide DNA template for subsequent PCR analysis via multiple displacement amplification [41]. WGA has successfully been used to amplify microgram amounts of genomic DNA (gDNA) from the low number of gDNA copies present in serum. Genomic DNA was isolated from whole blood using standard DNA extraction and purification methods and reagents (Qiagen, San Diego, Calif.). WGA was performed on stored and cryo-preserved plasma samples where whole blood could not be obtained. The quality of DNA was determined by amplification of two loci highly sensitive to degradation using TaqMan (Applied Biosystems) [42]. The concentration of double stranded DNA was determined using the PicoGreen assay (Molecular Probes, Eugene, Oreg.). gDNA was adjusted to 50 ng/μL, aliquoted into 96-well plates and stored at −86° C.
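Adjusting a quantified DNA stock to the standard 50 ng/μL follows the usual dilution relation C1·V1 = C2·V2. The Python sketch below illustrates that arithmetic only; the function name and the example concentrations are hypothetical, and a stock below the target is reported as needing concentration rather than dilution.

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=50.0):
    """Return (stock volume, diluent volume) in microliters.

    Uses C1 * V1 = C2 * V2, where C1 is the measured stock concentration,
    C2 the target concentration, and V2 the desired final volume.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is below the target; it must be concentrated instead")
    stock_volume_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume_ul, final_volume_ul - stock_volume_ul


# Hypothetical sample measured at 200 ng/uL, prepared as 100 uL at 50 ng/uL.
stock_ul, diluent_ul = dilution_volumes(200.0, 100.0)
```

In this example, 25 μL of stock is combined with 75 μL of diluent.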
Development of Nutrition Gene Array
In order to obtain multiple genotypes for the recruited subjects in the diet study, a Nutrition Gene Array was developed, consisting of nutritionally-relevant genetic markers associated with lipid metabolism, metabolic/endocrine function, vascular inflammation, endothelial dysfunction and obesity.
Selection of Gene Markers. To determine the influence of genetic factors on VLC dieting, genes associated with dyslipidemia, metabolic and endocrine function, endothelial dysfunction, and chronic vascular inflammation provided the initial focus. Additional genes relevant to nutrition were also included. The physiogenomic rationale for the selection of these genes is as follows. First, these genes are representatives of the various physiological pathways and networks. The use of these genes in physiogenomics is in sharp contrast to gene discovery efforts based on gene expression profiling or disease mapping. It will be recognized that this list of genes will miss some known key genes, and will certainly lack those genes not discovered so far or not identified yet as relevant. Second, as many of the physiological networks have built-in redundancy, feedback, and amplification, it is assumed that the elucidation of every single gene in a pathway may be unnecessary for physiogenomics as long as a representative gene of a given circuit is included.
Physiogenomics posits the gene representation question in the following logic: representative genes are selected based on various functional criteria, the genes are assembled in a panel, and the panel is then used as a substrate to draw inferences on physiological association based on genetics. Through clinical research the predictive power of the panel is ascertained. The underlying hypothesis is that the genes in the panel together explain a useful fraction of the variability in response among individuals. If the answer is affirmative, the hypothesis is accepted, and the panel is used. If the answer is inconclusive, the roster of genes is modified until the panel's predictive level is clinically useful.
The combination of several disease factors will lead to insulin resistance, MetSyn, atherosclerosis, hypertension, thrombosis and eventually cardiovascular events. Many of the selected genes have functions in more than just one category. The selection criteria considered each gene's known physiological and pathophysiological importance in respect to the MetSyn and CVD [43]. The existence of functionally active mutations (OMIM Database, 2000) [44] was considered but not required for selection. A subset of the selected genes has functional genotypes that represent risk factors for the development of atherosclerosis (e.g., apoE), hypertension (e.g., angiotensinogen), obesity (e.g., leptin), or insulin resistance (e.g., insulin receptor). It was expected that a stronger correlation between the risk of an individual to develop CVD and the benefits of diet would be obtained when using an array of genes as markers compared to a small set of clinical markers, such as lipid profiles. The genes of the Nutrition Array are discussed below.
a. Lipid Metabolism and Dyslipidemia. Along with their essential role in energy homeostasis, organ physiology, and cellular biology, lipids are linked to many pathological processes. Lipoproteins function as carrier molecules for lipids, cholesterol, and cholesterol esters. Apolipoproteins, the structural components of these lipoprotein transport molecules, are being studied in CVD for their role in atherosclerotic plaque development [45]. They assist in the transport of cholesterol from bodily tissues to the liver for excretion (APOA1) and in the transport and conversion of TGs (APOB). Apolipoproteins are also involved in the metabolism of TG-rich lipoproteins (APOE) and represent cofactors for lipid modifying proteins (APOA1 for lecithin:cholesterol acyltransferase {LCAT}, APOC for lipoprotein lipase {LPL}). Enzymes participating in cholesterol synthesis, uptake and modification represent valuable targets for an anti-atherosclerotic approach, documented by the current development of new lipid-lowering agents, such as inhibitors of CETP and ACAT1 and PPAR agonists [46]. LPL plays an important role in VLDL fatty acid release and its subsequent conversion to LDL. Hormone-sensitive lipase (HSL) is a major determinant of fatty acid mobilization. It plays a pivotal role in lipid metabolism, overall energy homeostasis, and fatty acid signaling. Two proteins have been selected as part of free fatty acid metabolism [47]. Carnitine palmitoyltransferase (CPT) facilitates mitochondrial fatty acid oxidation and deficiencies in CPT are common disorders. The intestinal fatty acid binding protein (FABP2) gene is of interest since it has been proposed as a candidate gene for diabetes. Cells are endowed with 2 acetyl-coenzyme A carboxylase (ACC) systems to control fatty acid amounts and ACCB is believed to control mitochondrial fatty acid oxidation.
b. Metabolic or Endocrine Function. Human evolution selected genes that mediate the efficient conversion of nutrients into fat as an effective storage form. Ingestion of a high carbohydrate diet results in increased insulin levels. Among the regulatory enzymes of glycolysis and lipogenesis that become activated [48], pyruvate kinase (PK), phosphofructokinase (PFK), acetyl CoA carboxylase (ACC), and fatty acid synthase (FAS) were selected for genomic analysis. The transcriptional regulation of those genes is facilitated by the carbohydrate response element-binding protein (CHREBP) [49]. Similarly, insulin-stimulated lipogenesis is mediated through a transcription factor called sterol regulatory element-binding protein (SREBP). It controls genes involved in cholesterol uptake and biosynthesis. ATP-binding cassette (ABC) transporters modulate cholesterol and lipoprotein metabolism [45]. ABCG5 and ABCC8 play an important role in limiting intestinal absorption and promoting biliary excretion of neutral sterols. PPARs are members of the nuclear receptor superfamily [50]. Two members, PPARA and PPARG, regulate fatty acid catabolism, adipocyte differentiation, lipid storage, and glucose homeostasis. PPAR agonists have all been reported to exhibit anti-inflammatory activity in macrophages and endothelial cells. Glycogen synthase (GYS) activity is thought to be rate-limiting in the disposal of glucose as muscle glycogen [51]. Phosphoenolpyruvate carboxykinase (PEPCK) is considered to catalyze the first step in gluconeogenesis. The synthesis of the soluble isoform (PEPCK1) is regulated by gene transcription and the rate of mRNA turnover can be induced by starvation and reduced through a high carbohydrate diet. Adiponectin and resistin are secretory products of adipose tissue [52]. Plasma adiponectin is reduced in MetSyn and in patients with ischemic heart disease. Hypoadiponectinemia may contribute to insulin resistance and accelerated atherogenesis in obesity.
UCP2 and UCP3 play a role in reducing reactive oxygen species formation. UCP3 could also facilitate lipid oxidation by acting as a free fatty acid anion transporter in a variety of physiological states. UCP3 represents an interesting target shifting energy expenditure towards heat dissipation.
c. Vascular Inflammation. In recent years, it has become apparent that low-grade vascular inflammation plays a key role in all stages of the atherosclerotic process [53]. Several blood markers indicative of endothelial dysfunction and vascular inflammation have been found to be associated with future cardiovascular risk including proinflammatory cytokines, such as IL-6 and TNF-α and the acute phase reactant CRP [54]. During early atherosclerotic lesion development the activated endothelium releases cellular adhesion molecules [53] that lead to the adherence and transendothelial migration of monocytes (through P-selectin and E-selectin) and leukocytes (through ICAM1 and VCAM1). Inflammatory cytokines such as IL-1, TNF-α, interferon-γ, or oxidized LDL receptor modulate the expression of E-selectin, ICAM1, and VCAM1. Cytokine stimulated endothelial cells also produce MCP-1 and IL-6, which further amplify the inflammatory cascade. TNF-α is produced by a variety of cells. TNF-α stimulates, along with interferon-γ and IL-1, the production of IL-6 by smooth muscle cells. IL-6 gene transcripts are expressed in human atheromatous lesions, and IL-6 is the main hepatic stimulus for CRP production. CRP may contribute directly to the proinflammatory state by stimulating the release of inflammatory cytokines such as IL-1β, IL-6, and TNF-α by endothelial cells.
d. Endothelial Dysfunction. The endothelium regulates vascular tone through the release of vasoactive substances [55]. The two most important vasoconstrictors are endothelin and angiotensin II. Angiotensin II stimulates a variety of pro-atherogenic responses, such as expression of adhesion molecules (e.g., ICAM1, VCAM1), platelet aggregation, thrombosis (through PAI1 expression), cell migration, and expression of TGF-β1. We are therefore interested in genetic modifications of AGT, the precursor of angiotensin II. AGT is processed by the renin-angiotensin (ACE) system. The most important vasodilator is NO, generated by the endothelial nitric oxide synthase (NOS3) [56]. NO is also vascular protective and inhibits inflammation, oxidation, vascular smooth muscle cell proliferation, and migration. Endothelial dysfunction is caused by a damaged endothelium with impaired NO release. Reduction in bioavailable NO can be a result of altered NOS3 expression or activity, but is often due to a decrease in NO half-life. The reaction of NO with superoxide is extremely fast and efficient. Superoxide dismutase protects NO and endothelial function [57]. The link between endothelial dysfunction and traditional risk factors for CVD, including diabetes, hypercholesterolemia, and hypertension, supports the effort to include related genes [58].
e. Obesity. Genes involved in the regulation of energy metabolism, appetite control or autocrine-paracrine signalling by adipocytes are all plausible candidates for genes that are involved in common obesity. Adiponectin is a hormone that regulates energy homeostasis and glucose and lipid metabolism. It is expressed by differentiated adipocytes as a 33-kD protein that is also detectable in serum [59]. Leptin is a protein that plays a critical role in the regulation of body weight by inhibiting food intake and stimulating energy expenditure. Multiple regression analysis has shown that adiposity, gender, and insulinemia were significant determinants of leptin concentration, explaining 42%, 28%, and 2% of its variance, respectively [60]. Uncoupling protein-1 (UCP-1) diverts energy from ATP synthesis to thermogenesis in the mitochondria of brown adipose tissue by catalyzing a regulated leak of protons across the inner membrane. Manipulation of thermogenesis could be an effective strategy against obesity [61]. The solute carrier family 6 member 14 (SLC6A14) gene encodes a sodium- and chloride-dependent transporter of neutral and cationic amino acids [62] that has a high affinity for the non-polar amino acid tryptophan. In the brain, the enzyme tryptophan hydroxylase converts tryptophan into serotonin. This neurotransmitter is known to be strongly involved with the central signaling of satiety by mechanisms that include effects on downstream effector neurons in the hypothalamus [63]. Therefore, a possible hypothesis is that lower SLC6A14 activity reduces tryptophan transport and hence the concentration of serotonin, thereby increasing susceptibility to obesity by reducing satiety [64].
Potential associations to diet using the Nutrition Gene array. Various SNPs associated with the observation of lipid level and BMI changes in patients undergoing a low-carbohydrate diet were screened. The endpoints analyzed were the blood levels of LDL, HDL and triglycerides (TG) and BMI. The physiogenomic model was developed using the following procedure: 1) Establish a baseline model using only the demographic and clinical variables, 2) Screen for associated genetic markers by testing each SNP against the unexplained residual of the baseline model, and 3) Establish a revised model incorporating the significant associations from the SNP screen. All models are simple linear regression models, but other well-known statistical methods are contemplated to be useful.
The means of the baseline variables broken down by demographic factors are shown in Table 4.
*Gender (Male, female); Age; Ethnicity (Self described); Order (First, second. If second, patient was on a low fat diet previously); Length (Number of weeks of diet); Tc.pre (Total cholesterol (mg/dL)); Ldl.pre (LDL cholesterol (mg/dL)); Hdl.pre (HDL cholesterol (mg/dL));
In the SNP screen (step 2), the p-values for each SNP were obtained by adding the SNP to the baseline model and comparing the resulting model improvement with up to 10,000 simulated model improvements using the same data set, but with the genotype data randomly permuted to remove any true association. This method produces a p-value that is a direct, unbiased, and model-free estimate of the probability of finding a model as good as the one tested when the null hypothesis of no association is true. All SNPs with a screening p-value of better than 0.003 were selected to be included in the physiogenomic model (step 3).
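The permutation screen described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes the baseline-model residuals are already available as a list, that genotypes are coded 0, 1, or 2, and that "model improvement" is measured as the reduction in residual sum of squares from a least-squares line on the genotype. The function names and the add-one correction in the returned p-value are assumptions introduced here.

```python
import random


def improvement(residuals, genotypes):
    """Reduction in residual sum of squares when the baseline residuals
    are regressed on the genotype with a simple least-squares line."""
    n = len(residuals)
    g_mean = sum(genotypes) / n
    r_mean = sum(residuals) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    if sxx == 0:  # monomorphic SNP: the genotype cannot explain anything
        return 0.0
    sxy = sum((g - g_mean) * (r - r_mean)
              for g, r in zip(genotypes, residuals))
    return sxy ** 2 / sxx  # explained sum of squares


def permutation_p_value(residuals, genotypes, n_perm=10_000, seed=0):
    """Fraction of genotype permutations whose improvement is at least
    as large as the observed one (add-one correction is an assumption)."""
    rng = random.Random(seed)
    observed = improvement(residuals, genotypes)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any true genotype-response link
        # small tolerance guards against floating-point summation order
        if improvement(residuals, shuffled) >= observed * (1 - 1e-9):
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A SNP would then be carried forward to step 3 when the returned value falls below the screening threshold of 0.003 stated above.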
Data Analysis. Covariates were analyzed using multiple linear regression and the stepwise procedure. An extended linear model was constructed including the significant covariate and the SNP genotype. SNP genotype was coded quantitatively as a numerical variable indicating the number of minor alleles: 0 for major homozygotes, 1 for heterozygotes, and 2 for minor homozygotes. The F-statistic p-value for the SNP variable was used to evaluate the significance of association. Table 3 lists all SNPs that were tested and their association p-values. The validity of the p-values was tested by performing an independent calculation of the p-values using permutation testing. To account for the multiple testing of multiple SNPs, adjusted p-values were calculated using Benjamini and Hochberg's false discovery rate (FDR) procedure [65,66,67]. In addition, the power for detecting an association based on the Bonferroni multiple comparison adjustment was evaluated. For each SNP, the effect size in standard deviations that was necessary for detection of an association at a power of 80% (20% false negative rate) was calculated using the formula:
where α was the desired false positive rate (α=0.05), β the false negative rate (β=1-Power=0.2), c the number of SNPs, z a standard normal deviate, N the number of subjects, f the carrier proportion, and Δ the difference in change in response between carriers and non-carriers expressed relative to the standard deviation [68].
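The two multiple-testing steps in this paragraph can be illustrated with a short sketch. The formula itself is not reproduced in the text as received, so the effect-size function below is an assumption: it uses the standard normal-approximation result for comparing two groups of sizes Nf and N(1-f), with a one-sided Bonferroni-adjusted critical value at α/c, which is consistent with the symbols defined above but may differ in detail from the original formula. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n_subjects, carrier_fraction, n_snps,
                           alpha=0.05, beta=0.2):
    """Carrier vs. non-carrier difference, in standard deviations,
    detectable with power 1 - beta at Bonferroni level alpha / n_snps.
    Assumed form: (z_{alpha/c} + z_beta) / sqrt(N * f * (1 - f))."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / n_snps)  # one-sided critical value (assumption)
    z_beta = z(1 - beta)
    f = carrier_fraction
    return (z_alpha + z_beta) / sqrt(n_subjects * f * (1 - f))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this assumed form, rarer carriers (f far from 0.5) and larger SNP panels (larger c) both raise the effect size needed for detection, which matches the qualitative behavior expected of a Bonferroni-adjusted power calculation.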
LOESS representation. A locally smoothed function of the SNP frequency as it varies with each response was used to visually represent the nature of an association. LOESS (LOcally wEighted Scatter plot Smooth) is a method to smooth data using a locally weighted linear regression [69, 70]. At each point in the LOESS curve, a quadratic polynomial was fitted to the data in the vicinity of that point. The data were weighted such that they contributed less if they were further away, according to the following tricube function:
$$w(x_i) = \left(1 - \left|\frac{x - x_i}{d(x)}\right|^{3}\right)^{3}$$
where x was the abscissa of the point to be estimated, the xi were the data points in the vicinity, and d(x) was the maximum distance of x to the xi.
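The tricube weighting used in LOESS can be sketched as below. This is an illustration of the weights only, under the standard definition of the tricube kernel; the function name is hypothetical, and the subsequent weighted polynomial fit is not shown.

```python
def tricube_weights(x, xs):
    """Tricube weights of the neighbouring data points xs for a local
    fit at abscissa x; d is the distance to the farthest neighbour."""
    d = max(abs(x - xi) for xi in xs)
    if d == 0:  # every neighbour coincides with x
        return [1.0] * len(xs)
    return [(1 - (abs(x - xi) / d) ** 3) ** 3 for xi in xs]


# Points closer to x receive weights near 1; the farthest point gets 0.
weights = tricube_weights(0.0, [0.0, 1.0, 2.0])
```

These weights would then be passed to a weighted least-squares fit of the local polynomial at each point of the curve.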
The distribution of change in HDL values in the study population was approximately normal (
The overall distribution of change in HDL values is shown in
a. Data analysis. The objective of the statistical analysis is to find a set of physiogenomic factors that together provide a way of predicting the outcome of interest. The association of an individual factor with the outcome may not have sufficient discrimination ability to provide the necessary sensitivity and specificity, but by combining the effect of several such factors the objective is reached. Increased sensitivity and specificity for the cumulative effect on prediction can be achieved through the use of common factors that are statistically independent. The assumptions on which these calculations are based are (a) the factors are independent of each other, (b) the association between each factor and the outcome can be summarized by a modest odds ratio of 1.7, and (c) the prevalence of each physiogenomic factor in the population is 50% and independent of the others. Clearly, the prediction becomes even stronger if the association with the response is stronger or one finds additional predictors. However, factors that are less useful for these types of prediction are those that are less common in the population, or collinear with factors that have already been identified in the prediction model.
b. Model Building. Discovery of markers affecting response to diet. A model was developed for the purpose of predicting a given response (Y) to a diet, including changes in anthropometric, lipid, inflammatory, endothelial and endocrine effects. A linear model for subjects in the diet group was used in which the response of interest can be expressed as follows:
$$Y = R_0 + \sum_i \alpha_i M_i + \sum_j \beta_j D_j$$
where Mi are the dummy marker variables indicating the presence of specified genotypes and Dj are demographic and clinical covariates. The model parameters that are to be estimated from the data are R0, αi and βj. This model employs standard regression techniques that enable the systematic search for the best predictors. S-PLUS provides very good support for algorithms that provide these estimates for the initial linear regression models, as well as other generalized linear models that may be used when the error distribution is not normal. For continuous variables, generalized additive models, including cubic splines, may also be considered in order to appropriately assess the form of the dose-response relationship [71,72].
In addition to optimizing the parameters, model refinement is performed. The first phase of the regression analysis will consist of considering a set of simplified models by eliminating each variable in turn and re-optimizing the likelihood function. The ratio between the two maximum likelihoods of the original vs. the simplified model then provides a significance measure for the contribution of each variable to the model.
The association between each physiogenomic factor and the outcome is calculated using logistic regression models, controlling for the other factors that have been found to be relevant. The magnitude of these associations is measured with the odds ratio and the corresponding 95% confidence interval, and statistical significance is assessed using a likelihood ratio test. Multivariate analysis is used, which includes all factors that have been found to be important based on univariate analyses.
Because the number of possible comparisons can become very large in analyses that evaluate the combined effects of two or more genes, the results include a random permutation test for the null hypothesis of no effect for combinations of two through five genes. This is accomplished by randomly assigning the outcome to each individual in the study, which is implied by the null distribution of no genetic effect, and estimating the test statistic that corresponds to the null hypothesis of the gene combination effect. Repeating this process 1000 times will provide an empirical estimate of the distribution for the test statistic, and hence a p-value that takes into account the process that gave rise to the multiple comparisons. In addition, hierarchical regression analysis is considered to generate estimates incorporating prior information about the biological activity of the gene variants. In this type of analysis, multiple genotypes and other risk factors can be considered simultaneously as a set, and estimates will be adjusted based on prior information and the observed covariance, theoretically improving the accuracy and precision of effect estimates [73].
c. Power calculations. The data available for study in this project are for 86 subjects. The power available for detecting an odds ratio (OR) of a specified size for a particular allele was determined on the basis of a significance test on the corresponding difference in proportions using a 5% level of significance. The approach for calculating power involved the adaptation of the method given by Rosner [68]. The SNPs that are explored in this research are not so common as to have prevalence of more than 35%, but rather in the range of 10-15%. Therefore, it is apparent that the study has at least 80% power to detect odds ratios in the range of 1.6-1.8, which are modest effects.
d. Model validation. A cross-validation approach is used to evaluate the performance of models by separating the data used for parameterization (training set) from the data used for testing (test set). The approach randomly divides the population into the training set, which comprises 80% of the subjects, and the test set, which comprises the remaining 20%. The algorithmic approach is then applied to the data in the training set to find a model that can be used to predict whether a dietary response will occur in a subject. This prediction equation is then used to prepare an ROC curve that provides an independent estimate of the relationship between sensitivity and specificity for the prediction model.
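The validation scheme above can be sketched in two small steps: a random 80/20 split of the subjects and the construction of ROC points from predicted scores. This is an illustrative sketch only. It assumes the outcome has been dichotomized (1 for responders, 0 for non-responders) and that the fitted model yields a numeric score per test subject; the function names and the fixed random seed are hypothetical.

```python
import random


def train_test_split(subject_ids, train_fraction=0.8, seed=0):
    """Randomly divide subjects into a training set and a test set."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs, one per score threshold.
    labels: 1 for responders, 0 for non-responders (an assumption)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are needed for an ROC curve")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels)
                 if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels)
                 if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

In use, the model would be fitted on the training subjects only, scores would be computed for the test subjects, and `roc_points` would be called on those test-set scores so that the resulting curve is independent of the fitting data.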
e. Patient Physiotype. Table 5 shows a collection of physiotypes for the outcomes LDL, HDL, TG, and BMI. Each physiotype in this particular embodiment consists of a selection of markers, an intercept value, and a coefficient for each marker. For example, the LDL physiotype consists of the markers rs1018381 and rs4804103, and the coefficients −1.28 and −0.83, respectively. The predicted LDL response for a given individual is then given by the formula
$$\text{predicted response} = C + \sum_i c_i g_i$$
where C is the intercept, the ci are the coefficients and the gi are the genotypes, coded 0 for the wild type allele homozygote, 1 for the heterozygote, and 2 for the variant allele homozygote as listed for an example individual in the DNA type column. For example, the LDL response for the individual specified in Table 5 would be predicted as −7.48, since the genotypes for both markers are zero and the intercept is −7.48.
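The LDL example above can be reproduced with a few lines of code. The marker identifiers, coefficients, and intercept are those quoted in the text; the function name and the dictionary layout are hypothetical choices made for illustration.

```python
def predict_response(intercept, coefficients, genotypes):
    """Intercept plus the sum of coefficient * genotype over all markers.
    Genotypes are coded 0, 1, or 2 (count of variant alleles)."""
    return intercept + sum(coefficients[marker] * genotypes[marker]
                           for marker in coefficients)


# LDL physiotype as quoted in the text
ldl_intercept = -7.48
ldl_coefficients = {"rs1018381": -1.28, "rs4804103": -0.83}

# Example individual: wild-type homozygote at both markers
example_genotypes = {"rs1018381": 0, "rs4804103": 0}

print(predict_response(ldl_intercept, ldl_coefficients, example_genotypes))
# prints -7.48
```

For an individual carrying variant alleles, each marker shifts the prediction by its coefficient times the allele count, so a heterozygote at rs1018381 alone would be predicted at the intercept plus −1.28.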
In this embodiment, the physiotype consists of a linear regression model. In other embodiments, the physiotype might consist of a generalized linear regression model, a structural equation model, a Bayesian probability network, or any other modeling tool known to the practitioner of the art of statistics.
The patient's physiotype may be expressed in a convenient format for the practitioner's assessment of a patient's likely response to diet. The patient's physiotype corresponding to the genotype of Table 5 is shown in
Number | Date | Country | |
---|---|---|---|
60760652 | Jan 2006 | US |