1. Technical Field
This document relates to methods and materials involved in predicting Parkinson's disease.
2. Background Information
Complex diseases occur commonly in the population and are a major source of disability and death in societies worldwide. They are thought to arise from multiple predisposing factors, both genetic and non-genetic, and joint effects of those factors are thought to be of key importance. Parkinson's disease (PD) serves as an example of a complex disease. Other examples include Alzheimer's disease, diabetes mellitus, nicotine and alcohol dependence, and many cancers. While major inroads have been made in identifying the genetic causes of rare Mendelian disorders, little progress has been made in the discovery of gene variations that predispose to complex diseases. The single gene variants that have been shown to associate reproducibly with complex diseases typically have small effect sizes.
This document relates to methods and materials involved in predicting PD. For example, this document provides methods for assessing the genotype of a human to determine whether or not the human has an increased susceptibility to developing PD. In addition, this document provides diagnostic devices containing probe or primer collections designed to detect the genotype of a human, thereby providing the ability to assess the human for increased susceptibility to develop PD.
As described herein, common variations (e.g., polymorphisms) within genes that encode polypeptides within an axon guidance pathway can be strong predictors of phenotypes such as PD susceptibility and age at onset of PD. For example, the models provided herein can be used to predict PD susceptibility (P=4.64×10^-38), survival free of PD (P=5.43×10^-48), and age at PD onset (P=1.68×10^-51). As demonstrated herein, polymorphisms in axon guidance pathway genes accounted for nearly 70% of the age at onset variance of PD, and people with high versus low model scores differed in survival free of PD by more than 20 years. The results provided herein help demonstrate that complex genetics can be a paradigm for common, late onset diseases in man.
In general, one aspect of this document features a method for assessing Parkinson's disease susceptibility. The method comprises, or consists essentially of, determining whether or not a human contains an axon guidance pathway genotype predisposing the human to develop Parkinson's disease. The human can be any age (e.g., 40 years of age or older). The determining step can comprise sequencing nucleic acid from the human. The determining step can comprise using a probe to determine the presence or absence of a polymorphism in the human. The genotype can comprise at least two of the polymorphisms of Table 1.
In another aspect, this document features a method for predicting age of Parkinson's disease onset in a human. The method comprises, or consists essentially of:
(a) determining the presence or absence of polymorphisms in a set of axon guidance pathway genes of the human to obtain information about the axon guidance pathway genotype of the human, and
(b) calculating an age of Parkinson's disease onset for the human based on the information. The human can be any age (e.g., 20 years of age or older). The determining step can comprise sequencing nucleic acid from the human. The determining step can comprise using a probe to determine the presence or absence of a polymorphism in the human. The genotype can comprise at least two of the polymorphisms of Table 3.
In another aspect, this document features a method for assessing a human for the likelihood of survival free of Parkinson's disease. The method comprises, or consists essentially of, determining whether or not the human contains an axon guidance pathway genotype indicative of survival free of Parkinson's disease. The human can be any age (e.g., 40 years of age or older). The determining step can comprise sequencing nucleic acid from the human. The determining step can comprise using a probe to determine the presence or absence of a polymorphism in the human. The genotype can comprise at least two of the polymorphisms of Table 2.
In another aspect, this document features a diagnostic device for assessing a human's predisposition to develop Parkinson's disease. The device comprises, or consists essentially of, an array of nucleic acid probes, wherein each nucleic acid probe of the array is located at an identifiable location of the device, wherein at least 50 percent of the probes of the device are selected from the group of probes consisting of probes having a sequence capable of hybridizing to a polymorphic nucleic acid set forth in Table 1, 2, or 3. The array can comprise at least 25 nucleic acid probes. The array can comprise at least 50 nucleic acid probes. The array can comprise at least 100 nucleic acid probes. At least 75 percent of the probes of the device can be selected from the group. At least 90 percent of the probes of the device can be selected from the group. At least 50 percent of the probes of the device can be selected from the group of probes consisting of probes having a sequence capable of hybridizing to a polymorphic nucleic acid set forth in Table 1. At least 50 percent of the probes of the device can be selected from the group of probes consisting of probes having a sequence capable of hybridizing to a polymorphic nucleic acid set forth in Table 2. At least 50 percent of the probes of the device can be selected from the group of probes consisting of probes having a sequence capable of hybridizing to a polymorphic nucleic acid set forth in Table 3.
In another aspect, this document features a method for assessing Parkinson's disease susceptibility. The method comprises, or consists essentially of, determining whether or not a human contains a polymorphism in PPP3CA, MRAS, PLXNA2, RAC2, SEMA5A, NFATC2, ROBO1, ROBO2, EPHB1, NFATC4, NTNG1, or EFNA5 nucleic acid (or a polymorphism in any of the brain expressed axon guidance pathway genes of Table 20 or as annotated in KEGG (World Wide Web at “genome.jp/kegg/pathway/hsa/hsa04360.html”)), wherein the presence of the polymorphism indicates that the human is susceptible to developing Parkinson's disease. The method can comprise diagnosing the human as being susceptible to develop Parkinson's disease if the human contains the polymorphism, and diagnosing the human as not being susceptible to develop Parkinson's disease if the human lacks the polymorphism. The method can comprise recording the human as being susceptible to develop Parkinson's disease if the human contains the polymorphism, and recording the human as not being susceptible to develop Parkinson's disease if the human lacks the polymorphism.
In another aspect, this document features a method for assessing Parkinson's disease susceptibility. The method comprises using a model to determine whether or not a human contains an axon guidance pathway genotype predisposing the human to develop Parkinson's disease, wherein the model comprises a model concordance of at least 0.6 (e.g., greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, between 0.6 and 0.99, between 0.65 and 0.99, or between 0.65 and 0.95), an r-squared of at least 0.6 (e.g., greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, between 0.6 and 0.99, between 0.65 and 0.99, or between 0.65 and 0.95), or a sensitivity and specificity of at least 0.6 (e.g., greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, between 0.6 and 0.99, between 0.65 and 0.99, or between 0.65 and 0.95).
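To make the performance floors above concrete, the following is a minimal sketch, assuming a model that outputs a predicted probability of PD for each subject. It computes a concordance (C) statistic for a binary outcome and the sensitivity and specificity at a probability cut-off, then checks whether either criterion reaches 0.6. The function names and the 0.5 cut-off are illustrative assumptions rather than part of the described method, and the r-squared criterion (which applies to a continuous outcome such as age at onset) is not computed here.

```python
from itertools import product


def concordance(scores, labels):
    """C statistic: share of case/non-case pairs in which the case scores higher (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    total = 0.0
    for c, k in product(cases, controls):
        if c > k:
            total += 1.0
        elif c == k:
            total += 0.5
    return total / (len(cases) * len(controls))


def sensitivity_specificity(scores, labels, cutoff=0.5):
    """Call a subject a predicted case when score >= cutoff (cutoff is an illustrative choice)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def meets_threshold(scores, labels, floor=0.6, cutoff=0.5):
    """True if concordance, or both sensitivity and specificity, reach the floor."""
    c = concordance(scores, labels)
    sens, spec = sensitivity_specificity(scores, labels, cutoff)
    return c >= floor or (sens >= floor and spec >= floor)


scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(concordance(scores, labels), meets_threshold(scores, labels))
```

Because the criteria are joined by "or", the check passes when any one of them is met.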
In another aspect, this document features a method for diagnosing a brain disorder or assessing brain disorder susceptibility. The method comprises using a model to determine whether or not a human contains an axon guidance pathway genotype diagnostic for the brain disorder or predisposing the human to develop the brain disorder, wherein the model comprises a model concordance of at least 0.6 (e.g., greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, between 0.6 and 0.99, between 0.65 and 0.99, or between 0.65 and 0.95), an r-squared of at least 0.6 (e.g., greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, between 0.6 and 0.99, between 0.65 and 0.99, or between 0.65 and 0.95), or a sensitivity and specificity of at least 0.6 (e.g., greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, between 0.6 and 0.99, between 0.65 and 0.99, or between 0.65 and 0.95). The brain disorder can be selected from the group consisting of Parkinson's disease, Alzheimer's disease, ALS, Tourette's syndrome, dyslexia, autism, mental retardation, epilepsy, stuttering, schizophrenia, addiction, anxiety, depression, obsessive-compulsive disorder, ADHD, MS, brain tumors, brain injury, and spinal cord injury.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used to practice the invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
This document provides methods and materials related to assessing PD. For example, this document provides methods for assessing the genotype of a human to determine whether or not the human has an increased susceptibility to developing PD. In addition, this document provides diagnostic devices containing probe or primer collections designed to detect the genotype of a human, thereby providing the ability to assess the human for increased susceptibility to develop PD.
In addition to PD, the methods and materials provided herein can be used to assess other brain disorders including, without limitation, Alzheimer's disease, ALS, Tourette's syndrome, dyslexia, autism, mental retardation, epilepsy, stuttering, schizophrenia, addiction, anxiety, depression, obsessive-compulsive disorder, ADHD, MS, brain tumors, brain injury, and spinal cord injury. For example, the methods and materials provided herein can be used to diagnose a brain disorder (e.g., PD) or to assess brain disorder susceptibility in a mammal.
Any type of sample containing a human's nucleic acid can be used to determine the human's genotype. For example, cells such as white blood cells or skin cells can be collected and assessed for the presence or absence of polymorphisms such as those described in Table 1, 2, 3, 5, 6, 7, 13, 14, 15, 17, 18, or 19. In some cases, a polymorphism located near (e.g., a polymorphism considered linked to another polymorphism) one or more of the polymorphisms set forth in Table 1, 2, 3, 5, 6, 7, 13, 14, 15, 17, 18, or 19 can be used to determine the human's genotype. In some cases, a biopsy (e.g., punch biopsy, aspiration biopsy, excision biopsy, needle biopsy, or shave biopsy), tissue section, lymph fluid sample, blood sample, or synovial fluid sample can be used.
Any method can be used to determine whether or not a particular polymorphism or collection of polymorphisms is present within a human's genome. For example, sequencing techniques or hybridization techniques (e.g., chip hybridization techniques) can be performed to detect polymorphisms set forth in Table 1, 2, 3, 5, 6, 7, 13, 14, 15, 17, 18, or 19. Methods for chip hybridization assays include, without limitation, methods that involve using nucleic acid arrays having probes designed to distinguish between different polymorphic sequences via hybridization. Such methods can be used to determine simultaneously the presence or absence of multiple polymorphisms.
The methods and materials provided herein can be used at any time during a human's life to determine whether or not the human is susceptible to developing a brain disorder (e.g., PD or ALS). In some cases, the methods and materials provided herein can be used to assess a human embryo, fetus, newborn, infant, child, or adult. For example, a sample can be obtained from a human at least 20, 30, 40, 50, 60, or more years of age. Once obtained, the sample can be evaluated to determine the human's genotype (e.g., the human's genotype regarding the SNPs listed in Table 1, 2, 3, 5, 6, 7, 13, 14, 15, 17, 18, or 19).
This description also provides nucleic acid arrays. The arrays provided herein can be two-dimensional arrays, and can contain at least 10 different nucleic acid molecules (e.g., at least 20, at least 30, at least 50, at least 100, or at least 200 different nucleic acid molecules). Each nucleic acid molecule can have any length. For example, each nucleic acid molecule can be between 10 and 250 nucleotides (e.g., between 12 and 200, 14 and 175, 15 and 150, 16 and 125, 18 and 100, 20 and 75, or 25 and 50 nucleotides) in length. In addition, each nucleic acid molecule can have any sequence. For example, the nucleic acid molecules of the arrays provided herein can contain sequences capable of detecting the polymorphisms set forth in Table 1, 2, 3, 5, 6, 7, 13, 14, 15, 17, 18, or 19. Typically, at least 25% (e.g., at least 30%, at least 40%, at least 50%, at least 60%, at least 75%, at least 80%, at least 90%, at least 95%, or 100%) of the nucleic acid molecules of an array provided herein can contain a sequence capable of detecting a polymorphism set forth in Table 1, 2, 3, 5, 6, 7, 13, 14, 15, 17, 18, or 19.
The nucleic acid arrays provided herein can contain nucleic acid molecules attached to any suitable surface (e.g., plastic or glass). In addition, any method can be used to make a nucleic acid array. For example, spotting techniques and in situ synthesis techniques can be used to make nucleic acid arrays. Further, the methods disclosed in U.S. Pat. Nos. 5,744,305 and 5,143,854 can be used to make nucleic acid arrays.
The invention will be further described in the following examples, which do not limit the scope of the invention described in the claims.
The following was performed to determine whether or not common variations in genes that encode polypeptides involved in the axon guidance pathway predispose humans to PD. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa, Trends Genet. 13:375 (1997); Kanehisa and Goto, Nucleic Acids Res. 28:27 (2000); and Kanehisa et al., Nucleic Acids Res. 34:D354 (2006)) was consulted. The KEGG pathway database is a bioinformatics resource that provides wiring diagrams of molecular interactions, reactions, and relations. There are several hundred pathways in KEGG related to Homo sapiens and diseases, including a detailed summary of the axon guidance pathway (World Wide Web at “genome.jp/dbget-bin/www_bget?path:hsa04360”). All of the genes that encoded polypeptides within the pathway were identified via Entrez Gene (World Wide Web at ncbi.nlm.nih.gov/entrez/query.fcgi?db=Gene). The UniGene database (World Wide Web at “ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene”) was used to determine which of the genes were expressed in human brain. The whole-genome association dataset for PD (Maraganore et al., Am. J. Hum. Genet. 77:685 (2005)) was used to identify those SNPs that were genotyped in brain-expressed, axon guidance pathway genes. Briefly, the whole-genome association dataset contains 198,345 genomic SNPs that were individually genotyped, uniformly spaced (1 per 12 kb average gap distance), and informative in 443 sibling pairs discordant for PD. This included 1,460 SNPs within 117 axon guidance pathway genes expressed in the brain.
All statistical tests were two-tailed, and considered significant at the conventional alpha level of 0.05. All statistical analyses were performed in SAS v. 9.1 (SAS Institute Inc., Cary, N.C.) or S-Plus v. 7 (Insightful Corp., Seattle, Wash.). Two phenotypes and three outcomes were considered of interest. For the first phenotype, PD susceptibility, the goal was to identify SNPs that predicted risk of PD (first outcome of interest). For the second phenotype, PD age at onset, the goal was to identify SNPs that predicted survival free of PD (second outcome of interest) or the reported age at onset of PD (third outcome of interest). A goal was to identify joint action models of SNPs from the axon guidance pathway that predicted each of the three outcomes.
For the first outcome, conditional logistic regressions stratified on sibship were used to examine associations of the SNPs with PD susceptibility while adjusting for age and gender (Breslow and Day, IARC Sci. Publ. 32:5 (1980)). For each SNP, odds ratios (ORs), 95% confidence intervals, and P values were calculated. Goodness-of-fit was assessed through measuring concordance and visually through histograms of predicted probabilities (Harrell et al., Stat. Med. 15:361 (1996)). The overall odds ratios were estimated by categorizing the predicted probability of PD from the model into four groups (<0.25, 0.25-0.50, 0.50-0.75, and >0.75), and then calculating the odds ratios for each group relative to the <0.25 group. A likelihood ratio test was used to assess the significance of the overall model, and a 95% bias-corrected bootstrap CI was calculated for the associated p value using 10,000 re-samples.
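The grouping step above can be sketched as follows. This is a minimal illustration, assuming each subject already has a predicted probability of PD from the fitted model; it computes crude odds ratios from a 2×2 table per group, which is a simplification of the described analysis (the study itself used conditional logistic regression stratified on sibship). The assignment of boundary values (e.g., exactly 0.25) to the upper group is also an assumption.

```python
from bisect import bisect_right

EDGES = (0.25, 0.50, 0.75)  # cut points for predicted P(PD)
LABELS = ("<0.25", "0.25-0.50", "0.50-0.75", ">0.75")


def risk_group(p):
    """Index of the predicted-probability group; boundary values go to the upper group (assumption)."""
    return bisect_right(EDGES, p)


def grouped_odds_ratios(pred_probs, is_case):
    """Crude odds ratio of being a case in each group, relative to the <0.25 group."""
    cases, controls = [0] * 4, [0] * 4
    for p, y in zip(pred_probs, is_case):
        counts = cases if y else controls
        counts[risk_group(p)] += 1
    result = {}
    for g, label in enumerate(LABELS):
        num = cases[g] * controls[0]
        den = controls[g] * cases[0]
        # A zero cell leaves the crude OR undefined; report None rather than guess.
        result[label] = num / den if den else None
    return result
```

Usage: `grouped_odds_ratios(probs, status)` returns a dictionary keyed by group label, with the reference group equal to 1.0 whenever both of its cells are non-empty.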
For the second outcome, Cox proportional hazards models were used to test for associations of the SNPs with survival free of PD (Cox, J. R. Stat. Soc. [Ser. B] 34:187 (1972)). For each SNP, hazard ratios (HRs), 95% confidence intervals, and P values were calculated. Concordance was again calculated for the proportional hazards models, and Kaplan-Meier plots of categorized scores predicting risk of PD were generated to provide visual gauges for goodness-of-fit (Kaplan and Meier, J. Am. Stat. Assoc. 53:457 (1958)). Hazard ratios were calculated for risk groups categorized at the quartiles, using the lowest risk group as reference. A likelihood ratio test was used to assess the significance of the overall model, and a 95% bias-corrected bootstrap CI was calculated for the associated p value using 10,000 re-samples.
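Concordance for a proportional hazards model can be illustrated with Harrell's C index for right-censored data. The sketch below assumes that `times` holds age at PD onset for cases and age at study (censoring) for unaffected siblings, that `events` is 1 for PD onset and 0 for censoring, and that `risk_scores` is a model output where larger values mean higher hazard (e.g., a Cox linear predictor). Pairs with tied times are skipped, a common simplification; this is not necessarily the exact estimator used by the software named above.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the earlier event has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the model orders every comparable pair correctly, and 0.5 corresponds to chance ordering.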
For the third outcome, multiple regression models were used to predict the reported age at onset of PD (Draper and Smith, Applied Regression Analysis (Wiley, New York, ed. 2, 1981)). Goodness-of-fit was described through the model R² values and plots of the predicted vs. observed ages at onset. An F test was used to assess the significance of the overall model, and a 95% bias-corrected bootstrap CI was calculated for the associated p value and the R² using 10,000 re-samples. Assumptions were tested throughout. Only the linear regression models required a data transformation; age at onset-squared was used as the outcome to meet the required normality assumption. Tests of linkage disequilibrium (LD) in unaffected siblings were performed for the SNPs in the final models for the three outcomes using LDSELECT v. 1.0 (© 2004 by Deborah A. Nickerson, Mark Rieder, Chris Carlson, Qian Yi, University of Washington) with a threshold R² of 0.80.
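The 95% bias-corrected bootstrap intervals mentioned for all three outcomes can be sketched generically. This is a minimal illustration, assuming subjects are resampled with replacement and that `statistic` recomputes the quantity of interest on each resample; in the study that would mean refitting the model and returning its p value or R², whereas the example below uses a plain mean on toy data. The nearest-rank percentile rule and the clipping of the bias proportion are implementation choices made here, not details taken from the text.

```python
import random
from statistics import NormalDist, mean


def bc_bootstrap_ci(data, statistic, n_resamples=10_000, alpha=0.05, seed=0):
    """Bias-corrected (BC) percentile bootstrap CI for statistic(data)."""
    rng = random.Random(seed)
    theta_hat = statistic(data)
    # Resample subjects with replacement and recompute the statistic each time.
    boot = sorted(statistic(rng.choices(data, k=len(data))) for _ in range(n_resamples))
    # z0 measures how far the bootstrap distribution is shifted from the estimate.
    below = sum(b < theta_hat for b in boot) + 0.5 * sum(b == theta_hat for b in boot)
    prop = min(max(below / n_resamples, 1e-6), 1 - 1e-6)  # keep inv_cdf finite
    norm = NormalDist()
    z0 = norm.inv_cdf(prop)
    q_lo = norm.cdf(2 * z0 + norm.inv_cdf(alpha / 2))
    q_hi = norm.cdf(2 * z0 + norm.inv_cdf(1 - alpha / 2))

    def percentile(q):
        idx = min(n_resamples - 1, max(0, int(q * n_resamples)))
        return boot[idx]

    return theta_hat, (percentile(q_lo), percentile(q_hi))


est, (lo, hi) = bc_bootstrap_ci(list(range(1, 21)), mean, n_resamples=2000)
print(est, lo, hi)
```

When the bootstrap distribution is centered on the estimate, z0 is close to 0 and the interval reduces to the ordinary percentile interval.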
The following scheme was used to develop models for each outcome (
While most SNPs had fairly complete data, others had missing values from substantial numbers of subjects: up to 18% of cases, and up to 32% of case-sib pairs. An approach was chosen where candidate models were constructed using sets of SNPs with fairly complete data (effective sample sizes close to the maximum 443) to explain as much of the outcomes as possible and then checked to see if adding other SNPs on top of the candidate models would contribute significantly. The candidate models for each set were constructed using standard automated procedures (Step 5). A final candidate model was selected for each outcome based on significance and goodness-of-fit (Step 6). Other SNPs, which were significant given the candidate models (Step 7) and had significant pair-wise interactions (Step 8), were then added.
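The numbered steps of the model-building scheme are not reproduced in this section, so the following is only a hedged sketch of two generic ideas from the paragraph above: restricting candidates to SNPs with fairly complete data, and building a candidate model with an automated forward selection procedure. The `max_missing` and `alpha_entry` thresholds and the `p_value_of(model, snp)` callback (for example, a likelihood ratio test of adding one SNP to the current model) are hypothetical.

```python
def complete_enough(genotypes, max_missing=0.05):
    """Keep SNPs whose share of missing calls (None) is at most max_missing (hypothetical threshold)."""
    keep = []
    for snp, calls in genotypes.items():
        missing = sum(c is None for c in calls) / len(calls)
        if missing <= max_missing:
            keep.append(snp)
    return keep


def forward_select(candidates, p_value_of, alpha_entry=0.05):
    """Greedy forward selection: repeatedly add the SNP with the smallest entry p value.

    p_value_of(model, snp) is a hypothetical callback returning the p value for
    adding `snp` to the SNPs already in `model`.
    """
    model = []
    remaining = list(candidates)
    while remaining:
        best_snp = min(remaining, key=lambda s: p_value_of(model, s))
        if p_value_of(model, best_snp) >= alpha_entry:
            break  # no remaining SNP improves the model significantly
        model.append(best_snp)
        remaining.remove(best_snp)
    return model
```

SNPs with more missing data could then be tested one at a time on top of the returned candidate model, mirroring the two-stage approach described above.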
To compare the significance of axon guidance pathway SNP models to the significance of randomly selected genomic SNP models, 4,000 models were constructed for each outcome by randomly selecting the appropriate number of SNPs from the entire available data set. The distributions of the test statistics from those models and the values of the test statistics from final models were then plotted.
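The random-SNP comparison can be sketched as follows, assuming a hypothetical `fit_statistic(snps)` that fits the outcome model on a given SNP set and returns its test statistic (for example, the likelihood ratio chi-square). The text describes plotting the null distribution against the final model's statistic; the empirical p value below, with the usual +1 correction, is one way to summarize such a plot numerically and is an addition rather than a reported analysis.

```python
import random


def random_model_null(all_snps, model_size, fit_statistic, n_models=4000, seed=0):
    """Test statistics from models built on randomly drawn SNP sets of the same size."""
    rng = random.Random(seed)
    return [fit_statistic(rng.sample(all_snps, model_size)) for _ in range(n_models)]


def empirical_p(observed, null_stats):
    """Share of random-SNP models at least as extreme as the observed statistic (+1 correction)."""
    exceed = sum(s >= observed for s in null_stats)
    return (exceed + 1) / (len(null_stats) + 1)
```

With 4,000 random models, the smallest attainable empirical p value is 1/4,001, so this comparison can only show that the pathway model outperforms every random model drawn, not reproduce P values as small as those reported for the final models.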
Validation with a Second Whole-Genome Association Dataset
A second whole-genome association study of PD is described elsewhere (Fung et al., Lancet Neurol., 5: 911-916 (2006)). That study included 276 patients with PD and 276 neurologically normal and unrelated controls. The samples used for that study were derived from the NINDS Neurogenetics repository hosted by the Coriell Institute for Medical Research. There were 408,803 SNPs individually genotyped, and call rates and Hardy-Weinberg equilibrium p-values were described elsewhere (Elbaz et al., Lancet Neurol., 5:917-923 (2006)). The individual level data for that study were downloaded from the Coriell Institute website, and SNPs were identified in that secondary dataset that were assigned to the genes represented by SNPs in each of the predictive genetic models for the primary dataset (same genes and outcomes, different SNPs and samples). The same statistical methods were then employed to construct predictive genetic models for the same outcomes in the secondary dataset, with two exceptions. First, because the secondary dataset employed unrelated controls that were not individually matched, unconditional logistic regression analyses were performed instead of conditional logistic regression analyses for the PD susceptibility outcome. Second, the age at onset of PD analyses in the secondary dataset was restricted to exclude subjects with ages younger than those reported in the primary dataset, which resulted in the exclusion of one subject whose age at onset was reportedly 13 years.
Validation with a Gene Expression Profiling Dataset
An available gene expression profiling dataset was explored to determine whether those functional data converged with the genetic association data and models (Papapetropoulos et al., Gene Expression, 13:205-215 (2006)). That study included multiregional gene expression data from postmortem brain specimens from 22 PD cases and 23 normal aged brain donors and represents the most comprehensive expression profiling study of PD to date (largest numbers of subjects, brain regions studied, and genes assayed). Very strict RNA quality control criteria were used. The analyzed data were derived from Affymetrix Human Genome U133 Plus 2.0 GeneChip arrays, which included probe set data for 126 of the 128 brain-expressed axon guidance pathway genes that were initially identified. Probe set data for the substantia nigra, putamen, and caudate regions were analyzed using methods and criteria similar to those described elsewhere (Papapetropoulos et al., Gene Expression, 13:205-215 (2006)).
This and the following three paragraphs contain information regarding the study design, expression values calculations, and array normalization methods for the published study. Briefly, postmortem brain tissue was obtained from University of Miami/NPF Brain Endowment Bank donors diagnosed with neuropathologically confirmed PD (Hughes et al., J. Neurol. Neurosurg. Psychiatry, 55(3):181-184 (1992)) (n=22) or aged individuals with no history or pathological diagnosis of neurologic or psychiatric disease (n=23). Detailed demographics of the cohort are described elsewhere (Papapetropoulos et al., Gene Expression, 13(3):205-215 (2006)). All patients were carefully characterized for clinical phenotype, treatment, and agonal state. Regional samples of postmortem brain were taken from frozen coronal blocks based on surface and cytoarchitectural landmarks after controlling for important determinants of RNA quality (postmortem interval, brain pH).
Whole tissue from the substantia nigra, putamen, and caudate was used to conduct gene microarray experiments. Total RNA isolation was performed using a TriZol method and RNeasy columns, and labeled cRNA was prepared according to the manufacturer's protocol (Affymetrix, Santa Clara, Calif.). Total RNA from each sample was used to prepare biotinylated target RNA, following the manufacturer's recommendations (See, e.g., World Wide Web at “affymetrix.com/support/technical/manual/expression_manual.affx”). The target cDNA generated from each sample was processed according to the manufacturer's recommendations using an Affymetrix GeneChip Instrument System (See, e.g., World Wide Web at “affymetrix.com/support/technical/manual/expression_manual.affx”). The samples were checked for degradation and integrity (A260/A280>1.9 and 28S/18S>1.6; 2100 Bioanalyzer; Agilent Technologies, Palo Alto, Calif.). The Human Genome U133 Plus 2.0 GeneChips were used (See, e.g., World Wide Web at “affymetrix.com/products/arrays/specific/hgu133plus.affx”). Microarray quality control parameters included the following: background, noise (RawQ), consistent number of genes detected as present across arrays, consistent scale factors, and consistent β-actin and glyceraldehyde-3-phosphate dehydrogenase 5′/3′ signal ratios.
Gene expression from three different brain regions in PD patients and their age-matched normal subjects was analyzed. After checking for microarray quality control, the following numbers of samples were used in the analysis of genes in the constructed models of axonal guidance: for substantia nigra 16 Parkinson's disease patients vs. 8 controls, for putamen 13 Parkinson's disease patients vs. 7 controls, and for caudate nucleus 14 Parkinson's disease patients vs. 9 controls (22 cases and 23 controls total).
Genes were selected on the basis of “present calls” by Microarray Analysis Suite 5.0. For a gene to be included, it had to be present (detectable) in at least 75% of the subjects in at least 1 of the 2 groups (Parkinson's disease patients and controls) to reduce the chances of false-positive findings. Expression data were analyzed using Genesis (GeneLogic, Gaithersburg, Md.) and AVADIS software (Strand Genomics, Redwood City, Calif.). Following normalization, one-way analysis of variance was performed for each gene to identify statistically significant gene expression changes. Two criteria were used to determine whether a gene was differentially expressed: p value ≦0.05 and fold-change (FC) of ±1.3.
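The inclusion filter and the two differential expression criteria above can be sketched per gene as follows. This is a minimal illustration, assuming per-subject boolean "present" calls and group mean expression values on a linear (not log) scale. The signed fold change convention used here (the ratio when expression is higher in cases, the negative reciprocal when it is lower) is a common one but is an assumption, as are the function names; the one-way ANOVA that produces the p value is not reproduced.

```python
def passes_presence_filter(case_calls, control_calls, min_fraction=0.75):
    """True if the gene is called 'present' in at least 75% of cases or of controls."""
    def frac(calls):
        return sum(calls) / len(calls) if calls else 0.0
    return frac(case_calls) >= min_fraction or frac(control_calls) >= min_fraction


def fold_change(mean_case, mean_control):
    """Signed linear fold change: ratio if up in cases, negative reciprocal if down (assumed convention)."""
    ratio = mean_case / mean_control
    return ratio if ratio >= 1 else -1 / ratio


def is_differentially_expressed(p_value, fc, p_max=0.05, fc_min=1.3):
    """Both criteria: p <= 0.05 and a fold change of at least 1.3 in either direction."""
    return p_value <= p_max and abs(fc) >= fc_min
```

Under this convention a twofold reduction is reported as -2.0, so a single absolute-value comparison covers both directions.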
For the differential expression analyses of the present study, probe sets were identified from within the original dataset that were assigned to the axon guidance pathway genes of interest (those represented by SNPs in any of the three predictive genetic models in the primary whole-genome association dataset). For each probe set, expression was compared between cases and controls in each of the three nigrostriatal regions. A probe set was considered informative in a given region if it was expressed in at least 75% of cases or 75% of controls. Although differential expression of a single probe set is usually sufficient to characterize a gene as differentially expressed (multiple polyadenylation sites are represented on Affymetrix gene chips to account for multiple gene transcripts), the present study employed a more conservative definition of differential gene expression than the original study. A gene was defined as differentially expressed in a given region if at least one accurate-type (at) probe set assigned to the gene had a t-test p value <0.05 and absolute value of the fold expression ≧1.3. Alternatively, a gene was defined as differentially expressed in a given region if at least 30% of accurate-type (at) or cross reacting-type (x_at, s_at) probe sets assigned to the gene had t-test p values <0.05 and absolute values of the fold expression ≧1.3. These definitions weight cross-reactive type probe sets lower than accurate-type probe sets because they have less specificity, and also account for the possibility that multiple probe sets for the same gene may not provide concordant gene expression measurements (Elbez et al., BMC Genomics, 7:136 (2006)).
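The two gene-level rules can be sketched as below, assuming the input lists only the informative probe sets for one gene in one region as `(probe_id, p_value, signed_fold_change)` tuples. Probe set type is inferred from the Affymetrix identifier suffix; treating any other `_at` suffix (such as `_a_at`) as accurate-type is an assumption made here, since the text names only at, x_at, and s_at.

```python
def probe_type(probe_id):
    """Classify a probe set by identifier suffix."""
    if probe_id.endswith(("_x_at", "_s_at")):
        return "cross"  # cross-reacting type
    if probe_id.endswith("_at"):
        return "at"  # accurate type; other _at suffixes also land here (assumption)
    return "other"


def gene_is_de(probe_sets, p_max=0.05, fc_min=1.3, min_share=0.30):
    """probe_sets: (probe_id, p_value, signed_fold_change) for informative probe sets of one gene."""
    def hit(p, fc):
        return p < p_max and abs(fc) >= fc_min

    typed = [(probe_type(pid), p, fc) for pid, p, fc in probe_sets]
    # Rule 1: any accurate-type probe set passes both criteria.
    if any(kind == "at" and hit(p, fc) for kind, p, fc in typed):
        return True
    # Rule 2: at least 30% of accurate- or cross-reacting-type probe sets pass.
    eligible = [(p, fc) for kind, p, fc in typed if kind in ("at", "cross")]
    if not eligible:
        return False
    share = sum(hit(p, fc) for p, fc in eligible) / len(eligible)
    return share >= min_share
```

Because a single cross-reacting probe set can satisfy only the 30% rule, it carries less weight than an accurate-type probe set, consistent with the rationale given above.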
A sensitivity analysis was also performed where a gene was defined as differentially expressed in a given region if at least one probe set of any type assigned to the gene had a t-test p value <0.05 and absolute value of the fold expression ≧1.3. This was consistent with the differential expression analyses performed for the original study. Finally, a second sensitivity analysis was performed whereby the same CEL file data as for the original gene expression profiling study was employed. The data were re-normalized and re-analyzed using a second standard software package (GeneSpring, Agilent Technologies). That normalization procedure obtained informative results for all probe sets and in all 45 genes. Briefly, gene chip raw Affymetrix data in the form of CEL files were uploaded in GeneSpring v7.2 (Agilent Technologies), normalized with GC-RMA, and re-analyzed using the same statistical tests and criteria.
For the primary differential expression analyses, possible bias due to the number of informative probe sets per gene was tested by comparing the distributions for differentially expressed versus normally expressed genes using the Wilcoxon rank sum test. For the primary and two sensitivity analyses, the differential expression of genes was coded as increased or reduced if all probe sets assigned to the gene had the same direction of effect, or ambiguous if the probe sets had opposite directions of effect.
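The direction coding rule can be written in a few lines. This sketch assumes the same signed fold change convention as above, in which positive values mean higher expression in cases and negative values mean lower expression; the Wilcoxon rank sum comparison is not reproduced.

```python
def direction_of_effect(fold_changes):
    """'increased' or 'reduced' if all probe sets agree in sign, otherwise 'ambiguous'."""
    if not fold_changes:
        raise ValueError("no probe sets assigned to the gene")
    signs = {fc > 0 for fc in fold_changes}
    if signs == {True}:
        return "increased"
    if signs == {False}:
        return "reduced"
    return "ambiguous"
```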
The available whole-genome association study dataset included 443 PD cases and 443 unaffected sibling controls (Tier 1 sample). Details regarding the demographic and clinical characteristics of these subjects are described elsewhere (Maraganore et al., Am. J. Hum. Genet. 77:685 (2005)). The median age at onset of PD among the cases was 61 years (range 31-94). Details regarding the SNP markers genotyped, including call rates, Hardy-Weinberg equilibrium estimations, and re-genotyping concordance rates, are described elsewhere (Maraganore et al., Am. J. Hum. Genet. 77:685 (2005)). A detailed listing of SNPs within axon guidance pathway genes expressed in the brain is provided in Table 4. Table 4 contains sequences that can be used to assess a human's nucleic acid for the presence of a polymorphism set forth in Table 1, 2, 3, or 4.
Of the 1,376 SNPs within brain-expressed genes of the KEGG axon-guidance pathway, 183 SNPs (13.3%) were individually associated with susceptibility to PD. Table 1 contains results for the final model produced by running SNPs through the multi-stage process to predict PD susceptibility. This model used data from 442 matched PD patients/sibling controls (1 pair was missing data on one or more SNPs). To determine the significance of the pathway, rather than individual SNPs, the P value for the overall model was assessed. In this case, the model had an overall P value of 4.64×10^-38.
†a = log additive, d = Mendelian dominant, r = Mendelian recessive.
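The genetic model codes in the footnote determine how each genotype enters a regression model. The sketch below assumes genotypes are recorded as the number of copies (0, 1, or 2) of the coded allele, commonly the minor or risk allele (an assumption here), and maps them to covariates: log additive uses the count itself, so each extra copy multiplies the odds or hazard by the same factor; Mendelian dominant compares carriers with non-carriers; and Mendelian recessive compares homozygotes with everyone else.

```python
def encode_genotype(allele_count, model):
    """Map 0/1/2 copies of the coded allele to a covariate for genetic model 'a', 'd', or 'r'."""
    if allele_count is None:
        return None  # missing genotypes stay missing
    if allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    if model == "a":  # log additive
        return allele_count
    if model == "d":  # Mendelian dominant: any copy
        return int(allele_count >= 1)
    if model == "r":  # Mendelian recessive: two copies
        return int(allele_count == 2)
    raise ValueError(f"unknown genetic model code: {model!r}")
```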
This model significantly predicted whether or not an individual was a case or unaffected sibling. The predicted probabilities of PD were high (towards 1) for most of the cases, and low (towards 0) for most of the unaffected siblings (
Of the 1,376 SNPs, 175 (12.7%) were individually associated with age at onset (hazard function) using Cox proportional hazards models. Table 2 contains results for the final proportional hazards model produced by running SNPs through the multi-stage process to predict PD age at onset. This model used data from 400 PD patients (43 patients were missing data on one or more SNPs). In this case, the model had an overall P value of 5.43×10^-48. The model was not significant at predicting age at study of the matched sibling controls (P=0.73). This last finding suggests that the model predicts age at onset (hazard function) of PD, not age in general, and that the model is specific for PD cases.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Of the 1,376 SNPs, 160 (11.6%) were individually associated with age at onset using linear regression models. Table 3 contains results for the final model produced by running SNPs through the multi-stage process to predict PD age at onset-squared. This model used data from 395 PD patients (48 patients were missing data on one or more SNPs). In this case, the model had an overall P value of 1.68×10−51. The set of SNPs was not significant at predicting age at study-squared of the matched sibling controls (P=0.34). This last finding suggests that the model predicts age at onset of PD, not age in general, and that the model is specific for PD cases.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Other combinations of SNPs from the axon guidance pathway also performed well in predicting PD susceptibility and age at onset. Although the models provided herein gave the best fit to the data, the results do not preclude other combinations of axon guidance pathway SNPs as significant predictors of PD. The SNPs in the final models selected exhibited no significant linkage disequilibrium (LD) in unaffected siblings.
Validation with a Second Whole-Genome Association Dataset
A second available whole-genome association dataset for PD was mined to determine whether the genes in each of the predictive genetic models in the primary whole-genome association dataset were also predictive of the same PD outcomes in the secondary dataset (same genes and outcomes, different SNPs and samples). The secondary whole-genome association dataset included 1,195 SNPs in the 22 genes from the model predicting PD susceptibility in the primary dataset. Of those SNPs, 127 (10.6%) were individually associated with susceptibility to PD.
Table 5 contains results for the final model produced by running SNPs through the multi-stage process to predict PD susceptibility. This model used data from 528 subjects (264 PD patients and 264 unrelated controls; 8 subjects were missing data on one or more SNPs). The model had an overall p value of 3.93×10−44. The odds ratios (95% CIs) for the groups defined by predicted PD probability of <0.25, 0.25-0.50, 0.50-0.75, and >0.75 were as follows: 1 (reference), 7.86 (3.94-15.71), 16.14 (8.13-32.05), and 121.14 (56.63-259.14), respectively. The predicted probabilities of PD were high (towards 1) for most of the cases and low (towards 0) for most of the unrelated controls.
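The grouped odds ratios above compare each predicted-probability band with the <0.25 band. A minimal sketch of that calculation follows. It assumes a boundary value falls into the higher band (e.g. 0.25 counts as "0.25-0.50") and uses a Woolf (log-scale Wald) 95% confidence interval; the text does not state which interval method was used, and `grouped_odds_ratios` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

CUTS = (0.25, 0.50, 0.75)  # band edges from the text; boundary handling is an assumption
LABELS = ("<0.25", "0.25-0.50", "0.50-0.75", ">0.75")


def band(p: float) -> int:
    """Index 0..3 of the predicted-probability band containing p."""
    return sum(p >= c for c in CUTS)


def grouped_odds_ratios(pred: Sequence[float], is_case: Sequence[bool],
                        z: float = 1.96) -> List[Tuple[str, float, float, float]]:
    """Odds of being a case in each band relative to the <0.25 reference band."""
    cases, controls = [0] * 4, [0] * 4
    for p, y in zip(pred, is_case):
        if y:
            cases[band(p)] += 1
        else:
            controls[band(p)] += 1
    a0, c0 = cases[0], controls[0]
    out = [(LABELS[0], 1.0, float("nan"), float("nan"))]
    for g in range(1, 4):
        a, c = cases[g], controls[g]
        if min(a, c, a0, c0) == 0:
            # An empty cell leaves the OR undefined; no continuity correction is applied here.
            out.append((LABELS[g], float("nan"), float("nan"), float("nan")))
            continue
        odds_ratio = (a / c) / (a0 / c0)
        se_log = math.sqrt(1 / a + 1 / c + 1 / a0 + 1 / c0)  # Woolf standard error of log(OR)
        out.append((LABELS[g], odds_ratio,
                    math.exp(math.log(odds_ratio) - z * se_log),
                    math.exp(math.log(odds_ratio) + z * se_log)))
    return out
```

With toy data of 10 subjects per band and 2, 5, 7 and 9 cases respectively, the bands give odds ratios of 4, about 9.3, and 36 relative to the reference band.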
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
The secondary whole-genome association dataset included 1,411 SNPs in the 26 genes from the model predicting age at onset of PD in the primary dataset. Of those SNPs, 142 (10.1%) were individually associated with survival free of PD (hazard function) using Cox proportional hazards models. Table 6 contains results for the final proportional hazards model produced by running SNPs through the multi-stage process to predict survival free of PD. This model used data from 263 PD patients (5 patients were missing data on one or more SNPs). In this case, the model had an overall p value of 6.30×10−35. However, the model was not significant at predicting survival (age at study) of the unrelated controls (p=0.14).
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
The secondary whole-genome association dataset included 1,605 SNPs in 28 of the 29 genes from the final model predicting age at onset of PD in the primary dataset. Of those SNPs, 157 (9.8%) were individually associated with age at onset of PD using linear regression models. Table 7 contains results for the final model produced by running SNPs through the multi-stage process to predict PD age at onset-squared. This model used data from 265 PD patients (3 patients were missing data on one or more SNPs). In this case, the model had an overall p value of 4.72×10−40. However, the set of SNPs was not significant at predicting age at study-squared of the unrelated controls (p=0.527).
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Validation with a Gene Expression Profiling Dataset
An available gene expression profiling dataset for PD that considered 21 different brain regions was mined. Details regarding the subjects, biological samples, and microarray experiments are described elsewhere (Papapetropoulos et al., Gene Expression, 13:205-215 (2006)). For this study, data analysis was limited to the substantia nigra and the striatum (putamen and caudate nuclei), since these are the brain regions contributing most significantly to the nigrostriatal dopamine deficiency that is characteristic of PD, and since the three PD outcomes were defined according to the corresponding motor phenotype. A total of 45 genes were represented by the SNPs listed for the three predictive genetic models (Tables 1-3), and the gene expression dataset had informative probe sets for 32 of those genes in the substantia nigra, 34 in the putamen, and 35 in the caudate. For the 45 axon guidance pathway genes represented by SNPs in the three predictive genetic models (as defined by the primary whole-genome association dataset), Table 8 provides detailed gene expression data for each of the three nigrostriatal regions considered, including probe set type, t-test p value, fold expression difference in cases and controls, and percent of cases and controls with expression present for the probe set. In each region, more differentially expressed genes were observed than expected by chance: substantia nigra, 7 observed (22%) vs. 1.6 expected (5%); putamen, 5 observed (15%) vs. 1.7 expected (5%); and caudate, 5 observed (14%) vs. 1.8 expected (5%). Overall, 36 genes had data in at least one of the three regions, and 14 (39%) of those were differentially expressed in at least one region.
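The expected counts above are 5% of the genes with informative probe sets in each region (for example, 0.05 × 32 = 1.6 in the substantia nigra). The sketch below reproduces that arithmetic and, as an added illustration not reported in the text, computes the binomial probability of seeing at least the observed number of differentially expressed genes by chance alone. The binomial step assumes genes behave independently, which co-regulated genes in one pathway may not.

```python
import math

ALPHA = 0.05  # per-gene significance level, so 5% of genes are expected to pass by chance


def expected_by_chance(n_genes: int, alpha: float = ALPHA) -> float:
    return n_genes * alpha


def binomial_upper_tail(k: int, n: int, p: float = ALPHA) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k of n genes significant by chance."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# (observed differentially expressed genes, genes with informative probe sets), from the text
regions = {"substantia nigra": (7, 32), "putamen": (5, 34), "caudate": (5, 35)}
for name, (observed, n) in regions.items():
    print(f"{name}: {observed} observed ({observed / n:.0%}) vs "
          f"{expected_by_chance(n):.1f} expected; "
          f"P(X >= {observed}) = {binomial_upper_tail(observed, n):.2g}")
```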
Very similar results were found from the sensitivity analyses treating accurate-type and cross-reacting-type fragments equally. Again, in each region, there were more differentially expressed genes than expected by chance: substantia nigra, 22 observed (24%) vs. 4.7 expected (5%); putamen, 11 observed (12%) vs. 4.7 expected (5%); and caudate, 16 observed (17%) vs. 4.7 expected (5%). Restricting to the 45 genes represented in the three predictive genetic models, the results were: substantia nigra, 7 observed (22%) vs. 1.6 expected (5%); putamen, 5 observed (15%) vs. 1.7 expected (5%); and caudate, 7 observed (20%) vs. 1.8 expected (5%). Overall, of the 36 genes with data in at least one of the three regions, 14 (39%) were still differentially expressed in at least one region.
The raw data from the original gene expression profiling study were re-normalized and re-analyzed using a different standard software package, in order to obtain results for all probe sets in all 45 genes and in all three regions. Differential expression of axon guidance pathway genes in PD was again observed, although the findings were more modest. Thirteen genes were differentially expressed in at least one region (8 in the substantia nigra, 1 in the putamen, 4 in the caudate).
For all analyses of differential gene expression, it was possible to code the differential expression unambiguously as increased or reduced. For genes with multiple informative and differentially expressed probe sets, the direction of effect was always the same.
The results provided herein provide compelling evidence that genetic variability in the axon guidance pathway predisposes to PD. The scope and magnitude of the effect were sizeable. For example, polymorphism in the axon guidance pathway accounted for nearly 70% of the age at onset variance of PD, and persons with high versus low model scores differed in age at onset by more than 20 years. The observed P values are likely amongst the smallest ever observed for a genetic analysis of PD or any complex disease, and they withstand traditional adjustments for multiple comparisons.
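To make the multiple-comparison point concrete, the arithmetic below applies a Bonferroni correction to the three overall model P values from the primary dataset. Counting every SNP-by-outcome screen (1,376 SNPs × 3 outcomes) as a separate test is an assumption chosen to be conservative; the text does not specify which adjustment was applied.

```python
ALPHA = 0.05
N_SNPS = 1376   # brain-expressed axon guidance pathway SNPs screened in the primary dataset
N_OUTCOMES = 3  # susceptibility and the two age-at-onset models

overall_p = {
    "Table 1 (susceptibility)": 4.64e-38,
    "Table 2 (age at onset, proportional hazards)": 5.43e-48,
    "Table 3 (age at onset, linear regression)": 1.68e-51,
}

n_tests = N_SNPS * N_OUTCOMES
threshold = ALPHA / n_tests  # Bonferroni per-test significance threshold
for model, p in overall_p.items():
    adjusted = min(1.0, p * n_tests)  # Bonferroni-adjusted P value
    print(f"{model}: P = {p:.3g}, adjusted P = {adjusted:.3g}, "
          f"below {threshold:.2e}: {p < threshold}")
```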
The SNPs that were independent predictors of susceptibility or age at onset of PD may be only indirect markers of the functional variants within the genes. Fine mapping of these gene loci in diverse populations and experimental studies in transfected cell lines or in transgenic animals may help to elucidate the pathogenic mechanism of the axon guidance pathway variability that was observed. It is postulated that polymorphism in the axon guidance pathway genes results in aberrant trajectory of the ascending dopaminergic pathway during embryonic brain development and thus renders persons congenitally deficient in nigrostriatal dopamine. In other words, some people are wired differently from birth and are thus at greater risk to develop PD and at an earlier age.
In summary, multiple SNPs in axon guidance pathway genes were strong predictors of susceptibility and age at onset of PD. These findings might also suggest a new focus on environmental exposures that occur during intrauterine life. This study demonstrates that the combination of bioinformatic characterization of pathways with analysis of available whole-genome datasets can make important contributions to the understanding of complex diseases. The results provide evidence that complex diseases can be due to the joint effects of many genes which, taken singly, would show only small effects (“additive effects model”, “epistasis”) (Fisher, Trans. R. Soc. Edin. 52:399 (1918); Burton et al., Lancet 366:941 (2005); W. Bateson, Mendel's Principles of Heredity (Cambridge Univ. Press, Cambridge, 1993); and Cordell and Clayton, Lancet 366:1121 (2005)). They also provide an example of how a complex disease that appears to be largely sporadic and non-genetic can in fact have a strong genetic component (McDonnell et al., Ann. Neurol. 59:788 (2006); Tanner et al., JAMA 281:341 (1999); and Rocca et al., Ann. Neurol. 56:495 (2004)).
An attempt was made to cripple the models by removing SNPs in the reverse order from which they were selected, which should generally remove the single most important SNP first. The results for the first 10 SNPs removed from each model are set forth in Tables 9, 10, and 11.
While the biggest reductions in the overall model p-values came from removing the first four SNPs in each table, even removing ten SNPs (plus their interactions) from each model left highly significant p-values. These results demonstrate that the models are remarkably robust.
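The removal experiment can be sketched for a linear age-at-onset model as a loop that drops one predictor at a time and refits. This is a simplified illustration under stated assumptions: it uses ordinary least squares with an overall F test, ignores the interaction terms that the study removed along with each SNP, and takes the removal order as an input because the selection order comes from the study's own procedure. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def overall_f_pvalue(y: np.ndarray, X: np.ndarray) -> float:
    """Overall F-test p value for an OLS regression of y on the columns of X (intercept added)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    df_resid = n - k - 1
    f_stat = ((tss - rss) / k) / (rss / df_resid)
    return float(stats.f.sf(f_stat, k, df_resid))


def ablation_curve(y: np.ndarray, X: np.ndarray, removal_order: list[int]) -> list[float]:
    """Drop predictor columns one at a time and record the overall p value after each drop.

    curve[0] is the full model; curve[i] is the model after the first i removals.
    """
    remaining = list(range(X.shape[1]))
    curve = [overall_f_pvalue(y, X[:, remaining])]
    for col in removal_order:
        remaining.remove(col)
        if not remaining:  # an empty model has no overall F test
            break
        curve.append(overall_f_pvalue(y, X[:, remaining]))
    return curve
```

A model is "robust" in the sense used above when the later entries of the curve stay far below the significance level even after the strongest predictors are gone.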
This study employed a genomic pathway approach to determine whether polymorphism in the axon guidance pathway predisposed to ALS. Specifically, bioinformatic methods were used to mine an available whole-genome association dataset for SNPs that were within brain-expressed, axon guidance pathway genes. Then, statistical methods were used to construct models of axon guidance pathway SNPs that predicted three outcomes: ALS susceptibility, survival free of ALS, and age at onset of ALS. The primary whole-genome association study dataset employed by this study included 275 ALS cases and 269 unrelated controls. The median age at onset of ALS among the cases was 54 years (range 26-87). Details regarding the SNP markers genotyped, including call rates, Hardy-Weinberg equilibrium estimations, and re-genotyping concordance rates, are described elsewhere (Schymick et al., Lancet Neurol., 6(4):322-8 (2007)). The bioinformatic methods identified 128 brain-expressed axon guidance pathway genes, and the SNP dataset included 4,133 SNPs within 124 of those genes.
All statistical tests were two-tailed, and considered significant at the conventional alpha level of 0.05. All statistical analyses were performed in SAS v. 9.1 (SAS Institute Inc., Cary, N.C.) or S-Plus v. 7 (Insightful Corp., Seattle, Wash.). Three outcomes of interest were considered: 1) ALS susceptibility, 2) survival free of ALS, and 3) age at onset of ALS. One goal was to identify joint action models of SNPs from the axon guidance pathway that predicted each of the three outcomes.
For the first outcome, unconditional logistic regressions were used to examine associations of the SNPs with ALS susceptibility while adjusting for age and gender. For each SNP, odds ratios (ORs), 95% confidence intervals (CIs), and p values were calculated. Goodness-of-fit was assessed through measuring concordance and visually through histograms of predicted probabilities. Overall odds ratios were estimated by categorizing the predicted probability of ALS from the model into four groups (<0.25, 0.25-0.50, 0.50-0.75, and >0.75), and then calculating the odds ratios for each group relative to the <0.25 group. A likelihood ratio test was used to assess the significance of the overall model. A 95% bias-corrected bootstrap CI was calculated for the associated p value using 10,000 re-samples.
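The overall-model test described above compares the maximized log-likelihood of the fitted model with that of a simpler nested model. The sketch below fits unconditional logistic regression by Newton-Raphson in NumPy and forms the likelihood ratio statistic 2(l_full - l_null), referred to a chi-square distribution with degrees of freedom equal to the number of added terms. Whether the study's null model was intercept-only or age-and-gender-only is not stated; the test data below use age and gender, an assumption. The function names are hypothetical, and the study itself used SAS or S-Plus.

```python
import numpy as np
from scipy import stats


def logistic_loglik(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10) -> float:
    """Maximized log-likelihood of a logistic regression of y (0/1) on X, intercept added.

    X may have zero columns, giving the intercept-only model.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (y - p)                            # score vector
        hess = design.T @ (design * (p * (1 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)                   # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = design @ beta
    # log L = sum(y * eta - log(1 + exp(eta))), using logaddexp for numerical stability
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def likelihood_ratio_test(X_full: np.ndarray, X_null: np.ndarray, y: np.ndarray):
    """LR statistic and chi-square p value for nested logistic models (X_null within X_full)."""
    stat = 2.0 * (logistic_loglik(X_full, y) - logistic_loglik(X_null, y))
    df = X_full.shape[1] - X_null.shape[1]
    return stat, float(stats.chi2.sf(stat, df))
```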
For the second outcome, Cox proportional hazards models were used to test for associations of the SNPs with survival free of ALS. For each SNP, hazard ratios (HRs), 95% confidence intervals, and p values were calculated. Concordance was again calculated for the proportional hazards models, and Kaplan-Meier plots of categorized scores predicting risk of ALS were generated to provide visual gauges for goodness-of-fit. Hazard ratios were calculated for risk groups categorized at the quartiles, using the lowest risk group as reference. A likelihood ratio test was used to assess the significance of the overall model. A 95% bias-corrected bootstrap CI was calculated for the associated p value using 10,000 re-samples.
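The text does not name the concordance statistic used for the proportional hazards models. A common choice for censored data is Harrell's C, sketched below under that assumption: a pair of subjects is usable when the one with the shorter time had the event, and it is concordant when that subject also has the higher predicted risk. Pairs with tied times are skipped, a simplification; `harrell_c` is a hypothetical name.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored time-to-event data.

    Tied risk scores count one half; pairs with tied times are not counted.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy example with age as the time scale: onset observed for three of five subjects.
ages = [52, 58, 61, 70, 75]
onset = [True, True, False, True, False]
scores = [2.1, 1.7, 1.9, 0.4, 0.2]
print(harrell_c(ages, onset, scores))
```

A value of 0.5 means the risk score orders onsets no better than chance, and 1.0 means perfect ordering.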
For the third outcome, the reported age at onset of ALS was predicted using multiple regression models. Goodness-of-fit was described through the model R² values and plots of the predicted vs. observed ages at onset. An F test was used to assess the significance of the overall model. A 95% bias-corrected bootstrap CI was calculated for the associated p value and the R² using 10,000 re-samples. Assumptions were tested throughout. Tests of linkage disequilibrium in unrelated controls for the SNPs in the final models for the three outcomes were performed using LDSELECT v. 1.0 (copyright 2004 by Deborah A. Nickerson, Mark Rieder, Chris Carlson, Qian Yi, University of Washington) with a threshold R² of 0.80.
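The R², overall F test, and bias-corrected bootstrap interval described above can be sketched together. The bootstrap below resamples subjects with replacement, computes the bias-correction constant z0 from the share of bootstrap estimates below the original estimate, and reads off the adjusted percentiles Φ(2·z0 ± z). Clipping that share away from 0 and 1 is an added safeguard, not something the text describes, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_r2_and_f(y: np.ndarray, X: np.ndarray):
    """R², overall F statistic, and F-test p value for OLS of y on X (intercept added)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, f_stat, float(stats.f.sf(f_stat, k, n - k - 1))


def bias_corrected_bootstrap_ci(y, X, statistic, n_boot=10_000, level=0.95, seed=0):
    """Bias-corrected (BC) percentile bootstrap CI for statistic(y, X)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    theta_hat = statistic(y, X)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        boot[b] = statistic(y[idx], X[idx])
    # Bias correction: how far the bootstrap distribution is shifted from the estimate.
    share_below = float(np.clip(np.mean(boot < theta_hat), 1.0 / n_boot, 1.0 - 1.0 / n_boot))
    z0 = stats.norm.ppf(share_below)
    z = stats.norm.ppf((1.0 - level) / 2.0)  # e.g. -1.96 for a 95% interval
    lower_q = stats.norm.cdf(2.0 * z0 + z)
    upper_q = stats.norm.cdf(2.0 * z0 - z)
    return float(np.quantile(boot, lower_q)), float(np.quantile(boot, upper_q))
```

For the R² interval, pass `statistic=lambda yy, XX: ols_r2_and_f(yy, XX)[0]`; the p value interval uses index 2 instead.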
While most SNPs had fairly complete data, others had missing values for a substantial proportion of subjects (up to 28%). We therefore chose an approach in which we constructed candidate models using sets of SNPs with fairly complete data (effective sample sizes close to the maximum) to explain as much of the outcomes as possible, and then checked whether adding other SNPs on top of the candidate models would contribute significantly. We constructed the candidate models for each set using standard automated procedures (Step 5), and selected a final candidate model for each outcome based on significance and goodness-of-fit (Step 6). We then added other SNPs that were significant given the candidate models (Step 7), and significant pair-wise interactions (Step 8).
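Step 7 can be sketched for a linear outcome as a nested-model screen: each additional SNP is tested against the candidate model on the subjects who have complete data for the candidate SNPs and that SNP, so a SNP with many missing values does not shrink the sample used for the others. This is a simplified illustration, assuming a partial F test at alpha = 0.05 and testing each extra SNP independently rather than adding them sequentially; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_rss(y: np.ndarray, X: np.ndarray):
    """Residual sum of squares and number of parameters for OLS of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2)), design.shape[1]


def screen_added_snps(y: np.ndarray, candidate: np.ndarray, extra: np.ndarray,
                      alpha: float = 0.05) -> list[int]:
    """Indices of columns in `extra` that are significant given the candidate model.

    Missing genotypes are NaN; each test uses its own complete-case subset.
    """
    keep = []
    base_ok = ~np.isnan(candidate).any(axis=1) & ~np.isnan(y)
    for j in range(extra.shape[1]):
        ok = base_ok & ~np.isnan(extra[:, j])
        yy, cand = y[ok], candidate[ok]
        rss_reduced, _ = fit_rss(yy, cand)
        rss_full, n_params = fit_rss(yy, np.column_stack([cand, extra[ok, j]]))
        df_resid = len(yy) - n_params
        f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)  # partial F, 1 numerator df
        if stats.f.sf(f_stat, 1, df_resid) < alpha:
            keep.append(j)
    return keep
```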
To compare the significance of our axon guidance pathway SNP models to the significance of randomly selected genomic SNP models, we constructed 5,000 models for each outcome by randomly selecting the appropriate number of SNPs from the entire available data set. We then plotted the distributions of the test statistics from those models and the values of the test statistics from the final models.
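The random-model comparison above can be sketched as drawing SNP sets of the same size at random and recording a test statistic for each. The text only plots the resulting distributions; the empirical p value below (with the usual +1 correction so it is never exactly zero) is an added summary, not a reported result. The `statistic` callback, which would fit a model on the chosen SNP indices, is hypothetical.

```python
from typing import Callable

import numpy as np


def random_model_null(n_available: int, n_selected: int,
                      statistic: Callable[[np.ndarray], float],
                      n_models: int = 5000, seed: int = 0) -> np.ndarray:
    """Test statistics from models built on randomly chosen SNP sets (no repeats within a set)."""
    rng = np.random.default_rng(seed)
    return np.array([statistic(rng.choice(n_available, size=n_selected, replace=False))
                     for _ in range(n_models)])


def empirical_p_value(observed_stat: float, null_stats) -> float:
    """Share of random models at least as extreme as the pathway model, with +1 correction."""
    null_stats = np.asarray(null_stats)
    return float((1 + np.sum(null_stats >= observed_stat)) / (1 + len(null_stats)))
```

With 5,000 random models, a pathway model that beats all of them gets an empirical p value of 1/5001, so the smallest attainable value is bounded by the number of random models drawn.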
Of the 4,133 SNPs within brain-expressed genes of the axon-guidance pathway, 442 SNPs (10.7%) were individually associated with susceptibility to ALS, as detailed in Table 12. Table 13 contains results for the final model produced by running SNPs through the multi-stage process to predict ALS susceptibility. This model used data from 542 unmatched ALS patients and unrelated controls (2 subjects were missing data on one or more SNPs). The odds ratios (95% CIs) for the groups defined by predicted ALS probability of <0.25, 0.25-0.50, 0.50-0.75, and >0.75 were as follows: 1 (reference), 17.60 (5.70-54.36), 112.00 (35.45-353.83), and 1739.73 (523.53-5781.32), respectively. Since the interest was in the significance of the pathway, rather than individual SNPs, the p value for the overall model was of primary importance. In this case, the model had an overall p value of 2.92×10−60 (95% CI 8.34×10−52-1.16×10−68). This model significantly predicted whether an individual was a case or an unrelated control. The predicted probabilities of ALS were very high (towards 1) for most of the cases and very low (towards 0) for most of the unrelated controls.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Of the 4,133 SNPs, 451 (10.9%) were individually associated with survival free of ALS (hazard function) using Cox proportional hazards models, as detailed in Table 12. Table 14 contains results for the final proportional hazards model produced by running SNPs through the multi-stage process to predict survival free of ALS. This model used data from 274 ALS patients (1 patient was missing data on one or more SNPs). In this case, the model had an overall p value of 1.25×10−74 (95% CI 2.22×10−60-1.24×10−89). By contrast, the model was not significant at predicting survival (age at study) of the unrelated controls (p=0.15). This last finding suggests that the model predicts survival free of ALS (hazard function), but not survival in general, and that the model is specific for ALS cases.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Of the 4,133 SNPs, 487 (11.8%) were individually associated with age at onset of ALS using linear regression models, as detailed in Table 12. Table 15 contains results for the final model produced by running SNPs through the multi-stage process to predict ALS age at onset. This model used data from 272 ALS patients (3 patients were missing data on one or more SNPs). In this case, the model had an overall p value of 9.14×10−76 (95% CI 8.99×10−58-3.19×10−92). By contrast, the set of SNPs was not significant at predicting age at study of the unrelated controls (p=0.55). This last finding suggests that the model predicts age at onset of ALS, not age at the time of the study, and that the model is specific for ALS cases.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Other combinations of SNPs from the axon guidance pathway also performed quite well in predicting ALS susceptibility, survival free of ALS, and age at onset of ALS. Although the models provided herein provided good fits to the data, the results do not preclude other combinations of axon guidance pathway SNPs as significant predictors of ALS. The SNPs in the final models selected showed no significant linkage disequilibrium in unrelated controls.
Examples 1 and 4 are similar but differ in how the secondary whole-genome association dataset is analyzed. In Example 1, the analysis of the secondary dataset is restricted to the 45 genes that were represented by SNPs in the final predictive models of PD in the primary dataset. In contrast, Example 4 analyzes the secondary whole-genome association dataset on its own, without restriction to the 45 genes of Example 1. In other words, Example 4 considers all 128 brain-expressed axon guidance pathway genes within the secondary whole-genome association dataset.
This study employed a genomic pathway approach to determine whether polymorphism in the axon guidance pathway predisposed to PD. Specifically, we employed bioinformatic methods to mine an available whole-genome association dataset for SNPs that were within brain-expressed, axon guidance pathway genes. We then employed statistical methods to construct models of axon guidance pathway SNPs that predicted three outcomes: PD susceptibility, survival free of PD, and age at onset of PD. The primary whole-genome association study dataset employed by this study included 269 PD cases and 267 unrelated controls. The median age at onset of PD among the cases was 64 years (range 13-84). The bioinformatic methods identified 128 brain-expressed axon guidance pathway genes, and the SNP dataset included 3,095 SNPs within 122 of those genes.
Of the 3,095 SNPs within brain-expressed genes of the axon-guidance pathway, 295 SNPs (9.5%) were individually associated with susceptibility to PD, as detailed in Table 16. Table 17 contains results for the final model produced by running SNPs through the multi-stage process to predict PD susceptibility. This model used data from 516 unmatched PD patients and unrelated controls (20 subjects were missing data on one or more SNPs). The odds ratios (95% CIs) for the groups defined by predicted PD probability of <0.25, 0.25-0.50, 0.50-0.75, and >0.75 were as follows: 1 (reference), 1.85 (0.58-5.92), 19.03 (8.56-42.30), and 391.82 (157.94-972.06), respectively. Since we were interested in the significance of the pathway, rather than individual SNPs, the p value for the overall model was of primary importance. In this case, the model had an overall p value of 8.10×10−71 (95% CI 2.34×10−64-1.67×10−76). This model significantly predicted whether an individual was a case or an unrelated control. The predicted probabilities of PD were very high (towards 1) for most of the cases and very low (towards 0) for most of the unrelated controls.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Of the 3,095 SNPs, 327 (10.6%) were individually associated with survival free of PD (hazard function) using Cox proportional hazards models, as detailed in Table 16. Table 18 contains results for the final proportional hazards model produced by running SNPs through the multi-stage process to predict survival free of PD. This model used data from 264 PD patients (4 patients were missing data on one or more SNPs). In this case, the model had an overall p value of 9.02×10−58 (95% CI 1.48×10−46-7.90×10−70). By contrast, the model was not significant at predicting survival (age at study) of the unrelated controls (p=0.80). This last finding suggests that the model predicts survival free of PD (hazard function), but not survival in general, and that the model is specific for PD cases.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Of the 3,095 SNPs, 326 (10.5%) were individually associated with age at onset of PD using linear regression models, as detailed in Table 16. Table 19 contains results for the final model produced by running SNPs through the multi-stage process to predict PD age at onset. This model used data from 261 PD patients (7 patients were missing data on one or more SNPs). In this case, the model had an overall p value of 2.12×10−71 (95% CI 1.06×10−528-6.47×10−81). By contrast, the set of SNPs was not significant at predicting age at study of the unrelated controls (p=0.98). This last finding suggests that the model predicts age at onset of PD, not age at the time of the study, and that the model is specific for PD cases.
†a = log-additive, d = Mendelian dominant, r = Mendelian recessive.
Other combinations of SNPs from the axon guidance pathway also performed quite well in predicting PD susceptibility, survival free of PD, and age at onset of PD. Although the models provided herein provided good fits to the data, the results do not preclude other combinations of axon guidance pathway SNPs as significant predictors of PD. The SNPs in the final models selected showed no significant linkage disequilibrium in unrelated controls.
The final model predicting ALS susceptibility contained 31 genes, and the final model predicting PD susceptibility contained 39 genes. In total, the models contained 46 genes, of which 24 (52.2%) were shared, and 22 (47.8%) were not. Of the 22, 7 were in the ALS model but not the PD model, and 15 were in the PD model but not in the ALS model.
The final model predicting survival free of ALS contained 34 genes, and the final model predicting survival free of PD contained 28 genes. In total, the models contained 45 genes, of which 17 (37.8%) were shared, and 28 (62.2%) were not. Of the 28, 17 were in the ALS model but not the PD model, and 11 were in the PD model but not in the ALS model.
The final model predicting age at onset of ALS contained 30 genes, and the final model predicting age at onset of PD contained 28 genes. In total, the models contained 43 genes, of which 15 (34.9%) were shared, and 28 (65.1%) were not. Of the 28, 15 were in the ALS model but not the PD model, and 13 were in the PD model but not in the ALS model.
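The shared and model-specific gene counts in the three comparisons above follow from basic set arithmetic (union = ALS genes + PD genes - shared genes). The sketch below shows that calculation; since the gene lists themselves are in the tables rather than in this passage, the demo uses placeholder gene names (G0, G1, ...) that are hypothetical and chosen only to reproduce the susceptibility-model counts.

```python
def model_overlap(genes_a, genes_b) -> dict:
    """Counts of genes shared by, and specific to, two predictive models."""
    a, b = set(genes_a), set(genes_b)
    union, shared = a | b, a & b
    return {
        "union": len(union),
        "shared": len(shared),
        "shared_pct": 100.0 * len(shared) / len(union),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


# Placeholder gene names (hypothetical): 31 genes in the ALS model, 39 in the PD model.
als_genes = [f"G{i}" for i in range(31)]     # G0..G30
pd_genes = [f"G{i}" for i in range(7, 46)]   # G7..G45
print(model_overlap(als_genes, pd_genes))
```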
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Application Ser. No. 60/842,054, filed Aug. 31, 2006; which is incorporated by reference in its entirety.
Funding for the work described herein was provided by the federal government under grant numbers ES10751 and NS33978 awarded by the National Institutes of Health. The federal government has certain rights in the invention.