This application contains a sequence listing submitted electronically via EFS-web, which serves as both the paper copy and the computer readable form (CRF) and consists of a file entitled “001881-8006US02_seqlist.txt”, which was created on Sep. 22, 2017, which is 274,432 bytes in size, and which is herein incorporated by reference in its entirety.
The field of the technology provided herein relates generally to pulmonary and related diseases and the diagnosis and prognosis thereof.
Chronic obstructive pulmonary disease (COPD) is a complex disease characterized clinically by airflow obstruction, with cigarette smoking considered its primary environmental risk factor.
COPD is currently the fourth leading cause of chronic morbidity and mortality in the United States (National Institutes of Health and National Heart Lung and Blood Institute 2007, Am. J. Respir. Crit. Care Med. 176:532-555; Mannino and Braman 2007, Proc. Am. Thorac. Soc. 4:502-506). It is a preventable and treatable disease characterized by airflow limitation that is not fully reversible (National Institutes of Health and National Heart Lung and Blood Institute 2007). The airflow limitation results from small airway disease (obstructive bronchiolitis) and parenchymal destruction (emphysema) caused by chronic inflammation and structural changes due to repeated injury and repair (National Institutes of Health and National Heart Lung and Blood Institute 2007).
Cigarette smoking is the most important environmental risk factor for COPD (Marsh et al. 2006, Eur. Respir. J. 28:883-886; National Institutes of Health and National Heart Lung and Blood Institute 2007; Mannino and Braman 2007). It is estimated that 25% to 50% of smokers may develop COPD as defined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) spirometric criteria (Lundbäck et al. 2003, Respir. Med. 97:115-122; Lokke et al. 2006, Thorax 61:935-939; Mannino and Braman 2007).
Lung function declines gradually across adult life, even in healthy non-smokers, and this decline accelerates with age (Camilli et al. 1987, Am. Rev. Respir. Dis. 135:794-799; Lange et al. 1989, Eur. Respir. J. 2:811-816; Lundbäck et al. 2003; Wise 2006, Am. J. Med. 119(10A):S4-S11). Factors associated with lung function decline in middle-aged and older adults have been identified, primarily in cross-sectional studies (Enright et al. 1994, Chest 106:827-834; Kerstjens et al. 1996, Am. J. Respir. Crit. Care Med. 154:S266-S272). However, predictions based on cross-sectional correlates may not adequately predict longitudinal change within individuals (Knudson et al. 1983, Am. Rev. Respir. Dis. 127:725-734; Griffith et al. 2001, Am. J. Respir. Crit. Care Med. 163:61-68), and the effect of cigarette smoking on trajectories of lung function decline throughout adult life has not been widely modeled using longitudinal statistical methods.
COPD is a heterogeneous disease of complex etiology, including genetic and environmental components. Lung function is determined by the interplay of multiple underlying factors and processes. Consequently, impaired lung function in any individual may have different causes (e.g., prenatal effects, poor baseline lung function, age, and exposure to occupational toxins and cigarette smoke). Given that these risk factors are likely to act through distinct biological mechanisms, methods for discovering biomarkers associated with impaired lung function must account for this likely etiological heterogeneity. Conventional outcome measures of lung function, such as clinically based COPD case-control status and spirometric measurements, are limited in this respect. Exposure is generally not considered quantitatively, and cross-sectional measures cannot assess the trajectory of lung function decline. Conversely, longitudinal data offer the possibility of deconvoluting the etiological factors affecting lung function. The advantage lies in the structure of the data: repeated measurements of lung function and various risk factors (e.g., age, smoking exposure) collected for the same individuals over time. That data structure allows quantification of differences in susceptibility to the various causes of lung function decline across individuals.
In view of the foregoing, longitudinal data, containing repeated measurements of lung function and various risk factors, were analyzed to quantify differences underlying the susceptibility to the various causes of lung function decline. The data included four outcome measures of lung function or decline in lung function, measured spirometrically as the forced expiratory volume in 1 second (FEV1) (Knudson et al., 1983) and were derived by fitting mixed models to longitudinal spirometric, smoking history, and demographic data obtained over the subjects' 17-year average participation period in the Lung Health Study (LHS) and General addiction Project (GAP). Conceptually, these measures represent different underlying biological processes driving lung function decline. The optimal model of the data was selected based on likelihood ratio tests, which were used to determine the significance of each fixed and random effect parameter as it was added to the model (Willett et al., 1998, Development and Psychopathology 10:395-426). After the optimal model was identified, the outcome variables were calculated as best linear unbiased predictors (BLUPs) of the random effects, focusing on age-related decline (Age decline), pack-years-related decline (Pack-years decline), and the intensifying effects of smoking, in terms of number of cigarettes per day (CPD), on decline with age (CPD×Age decline). These BLUPs together accounted for the vast majority of individual differences in lung function decline in these subjects. In addition, Baseline Lung function (BL) was measured at subjects' entry into the study as an outcome measure as it has also been shown to vary in magnitude across individuals (Griffith et al., 2001).
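By way of illustration only, the likelihood ratio comparison used for model selection can be sketched as follows. The sketch assumes two nested models whose maximized log-likelihoods are already known and that differ by a single parameter. The function name and the numeric values are hypothetical and are not taken from the LHS/GAP analysis. For a variance component tested at the boundary of its parameter space, the one-degree-of-freedom chi-square reference used here is conservative.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Compare two nested models that differ by one parameter.

    Returns the test statistic and a p-value from a chi-square
    distribution with 1 degree of freedom (an assumption of this sketch).
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods before and after adding one effect.
stat, p = likelihood_ratio_test(-1520.4, -1515.1)
keep_added_effect = p < 0.05
```

In such a sketch, an effect would be retained in the model only when adding it improves the log-likelihood by more than chance alone would be expected to.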
There is some evidence that immune system dysregulation may be involved in the pathophysiology of COPD and that genetic differences in regulation of cigarette smoking-related inflammatory changes may influence individual disease risk.
Work described herein relates to the discovery of associations between pulmonary disease such as COPD and variations in the nucleotide sequence of nineteen chromosomal regions. Embodiments described herein provide chromosomal regions and SNPs found therein having significant novel COPD associations. As described below, some of the SNPs are in or near genes that function in biological processes such as cilia function/lung clearance, neutrophil activation, and complement regulation. The genes, intragenic regions, and identified variations in the nucleotide sequence in those regions (e.g., SNPs) associated with COPD found in each of the nineteen chromosomal regions provided herein are listed in Tables 5a, 5b, 7, 8 and/or in
Based on the identification of those chromosomal regions including specific SNPs associated with pulmonary disease, such as COPD, methods are provided for detecting a predisposition to, or diagnosing the presence of, lung disease, such as COPD. Such methods comprise identifying one or more variations in a nucleotide sequence of one or more of those chromosomal regions. Variations in the nucleotide sequence of those regions, identified herein as chromosomal regions 1-19, can be correlated with a predisposition to, or the presence of, COPD in a subject.
Methods are provided for detecting a predisposition to, or diagnosing the presence of, lung disease in a subject described herein, including the use of a variety of genetic and molecular techniques to identify variations in the nucleotide sequence of chromosomal regions 1-19 in the subject. Evaluation of the nucleotide sequence to identify variation in those chromosomal regions may be conducted at the level of chromosomal DNA, or portions thereof (e.g., PCR amplified gene segments). Alternatively, evaluation of the nucleotide sequence to identify variation in those regions may be conducted at the level of molecules expressed or encoded by those chromosomal regions (e.g., mRNAs or protein coding regions thereof or polypeptide/proteins encoded by those chromosomal regions).
In one embodiment, a method of detecting a predisposition to, a diagnosis of, a prognosis of, the severity of, or the response to treatment for a pulmonary disease (e.g., COPD) in a subject comprises identifying variations in the nucleotide sequence of one or more chromosomal regions selected from regions 1-19 of said subject, where the presence of one or more variations in said chromosomal regions indicates a predisposition to, or the presence of, COPD in the subject; wherein said variations in nucleotide sequence have a q-value of less than 0.5 for their association with decline in lung function.
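As a non-limiting sketch of how a q-value threshold of this kind might be applied, the example below converts a list of association p-values into q-values and keeps the variations falling below the threshold. The passage above does not state how the q-values are computed; the Benjamini-Hochberg adjustment used here is an assumption, and the function name and p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical p-values for four variations.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
selected = [i for i, q in enumerate(qvals) if q < 0.5]
```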
Kits described herein can be used, for example, in performing one or more of the methods described herein. One embodiment provides for a kit comprising one or more nucleic acid probes for the identification of one or more variations in a nucleotide sequence of one or more chromosomal regions selected independently from regions 1-19. Such kits may further comprise one or more control nucleic acid molecules for said variations in said nucleotide sequence. In some embodiments, the kit comprises a means for identifying an amino acid sequence or a variation in an amino acid sequence encoded by a gene in a chromosomal region selected from regions 1-19. In one embodiment, the kit comprises an antibody that is capable of identifying an amino acid sequence encoded by a gene in a chromosomal region selected from regions 1-19. Such kits optionally comprise instructions describing the use of the kit.
In one embodiment, the present disclosure provides for compositions comprising two or more nucleic acid molecules that each comprise a nucleotide sequence complementary to different portions of chromosomal regions 1-19. In one aspect of such an embodiment, the two or more nucleic acid molecules comprise two, three, four, five, six, seven, eight, nine, ten, fifteen, nineteen or more nucleic acid molecules and said different portions of chromosomal regions 1-19 comprise portions of two, three, four, five, six, seven, eight, nine, ten, fifteen, nineteen or more different independently selected chromosomal regions.
Also provided for herein are compositions comprising two or more, three or more, four or more, five or more, or six or more nucleic acids that hybridize to different portions of chromosomal regions 1-19, each of the different portions comprising one or more variations (or at least a part of a variation) found in chromosomal regions 1-19. Also provided for herein are compositions comprising two or more, three or more, four or more, five or more, or six or more nucleic acids that hybridize to different portions of chromosomal regions 1-19.
Also described herein are pharmaceutical compositions comprising one or more gene products, active portions thereof, or variants thereof for use in the treatment of a pulmonary disease. Also provided herein are methods of using one or more nucleic acid molecules encoding one or more of the gene products, an active portion(s) thereof, or variant(s) thereof for use in the treatment of pulmonary diseases such as COPD. In some embodiments, the one or more gene(s) encoding the one or more gene products are selected from the group including CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, MYO5B, ENPP6, KBTBD9, MSRB3, and TSC2.
Compositions are provided comprising two or more pairs of nucleic acid molecules that may function, for instance, as primer sets for the amplification of various portions of chromosomal regions 1-19. In such embodiments, the two or more pairs of nucleic acid molecules comprise a first pair of nucleic acid molecules and a second pair of nucleic acid molecules. The first pair of nucleic acid molecules comprises (i) a first nucleic acid molecule comprising a nucleotide sequence complementary to a portion of a chromosomal region selected from chromosomal regions 1-19 and (ii) a second nucleic acid molecule comprising a nucleotide sequence complementary to the opposite strand of the chromosomal region to which said first nucleic acid is complementary. The second pair of nucleic acid molecules comprises (iii) a third nucleic acid molecule comprising a nucleotide sequence complementary to a portion of a chromosomal region selected from chromosomal regions 1-19 and (iv) a fourth nucleic acid molecule comprising a nucleotide sequence complementary to the opposite strand of the chromosomal region to which said third nucleic acid is complementary.
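The relationship between the two members of such a pair, each complementary to an opposite strand, can be illustrated with the short sketch below. It is offered under stated assumptions only: the target sequence is hypothetical and is not a sequence from chromosomal regions 1-19, the function names are invented for illustration, and the four-nucleotide primer length is chosen for readability rather than as a practical primer length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pair(top_strand, length):
    """Derive one primer pair flanking a target written 5'->3'.

    The first primer matches the start of the top strand and is therefore
    complementary to the bottom strand; the second primer is the reverse
    complement of the end of the top strand and is therefore
    complementary to the top strand itself.
    """
    if length <= 0 or len(top_strand) < 2 * length:
        raise ValueError("target too short for two primers of this length")
    top = top_strand.upper()
    forward = top[:length]
    reverse = reverse_complement(top[-length:])
    return forward, reverse


# Hypothetical 14-nucleotide target.
forward, reverse = primer_pair("ATGCGTACGTTAGC", 4)
```

A second pair directed to a different portion of the regions would be derived in the same way from a second target sequence.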
Also described herein are pharmaceutical compositions comprising one or more gene products, active portions thereof, or variants thereof for use in the treatment of a pulmonary disease. The genes encoding the one or more gene products can be selected from the group consisting of genes listed in Tables 5b, 6 and
The techniques provided herein permit the use of genetic variations, such as the SNPs identified as described herein, either singly or in combination with other variations in linkage disequilibrium (LD) with those SNPs, for the diagnosis, prediction of clinical course (prognosis), and/or assessment of treatment effect/patient response for pulmonary disease such as COPD. Additional uses include development of new treatments for pulmonary disease such as COPD, based upon comparison of the variant and normal versions of the gene or gene product, and development of cell culture-based and animal models for research and treatment of pulmonary disease such as COPD.
Another embodiment of the present technology provides a method of detecting a predisposition to, a diagnosis of, a prognosis of, the severity of, or the response to treatment for a pulmonary disease (e.g., COPD) in a mammal, comprising assaying the product of at least one gene selected from the group consisting of CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, MYO5B, ENPP6, KBTBD9, MSRB3, and TSC2.
Assaying a gene may be conducted by determining the expression of a nucleic acid product (e.g., an mRNA) produced by the gene. Where nucleic acid levels are to be determined, a variety of techniques including quantitative PCR, Southern blotting or Northern blotting may be employed. Alternatively, assaying a gene may be conducted either by assessing the level of the protein produced, or by examining the biological activity of the protein product. The level of protein present in a sample may be determined by methods including, but not limited to, immunological methods (e.g., ELISA or Western blot) and also by the activity of the protein in either biological or enzymatic assays. As SNPs within protein coding sequences may affect the biological activity or stability of proteins due to alterations in the protein sequence, assaying a combination of protein level and its biological activity, or the level of gene expression (e.g., mRNA production) and the protein's biological activity may be desirable when assaying a gene product involves assaying a protein.
In some embodiments, a method of predicting a predisposition to, a diagnosis of, a prognosis of, the severity of, or the response to treatment for a pulmonary disease in an individual (a subject) involves obtaining a sample from the individual, wherein the biological sample contains, or is expected to contain, all or a portion of the gene product of the genes listed in Tables 5b, 6 and/or
In one embodiment, the present disclosure provides nucleic acid molecules that can be inserted in an expression vector to produce a variant protein in a host cell. Thus, the present disclosure provides for vectors comprising a SNP-containing nucleic acid molecule(s) that can be functionally linked to a promoter, genetically engineered host cells containing the vector, and methods for expressing a recombinant variant protein including the use of host cells containing such vectors. The host cells, SNP-containing nucleic acid molecules and/or variant proteins can also be used as targets in a method for screening and identifying therapeutic agents or pharmaceutical compounds useful in the treatment of pulmonary disease and related pathologies.
Also provided herein are methods of using one or more nucleic acid molecules encoding one or more of the gene products, an active portion(s) thereof, or variant(s) thereof, for use in the treatment of pulmonary diseases such as COPD. In some embodiments, the one or more genes encoding the one or more gene products are selected from the group including CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, MYO5B, ENPP6, KBTBD9, MSRB3, and TSC2.
Another aspect of the technology described herein is kits, which can be used, for example, in performing one or more of the methods described herein. One embodiment provides for a kit comprising one or more nucleic acid probes, wherein the probes allow the identification of either a nucleic acid having a nucleotide sequence of a SNP associated with pulmonary disease (e.g., COPD) found in one of the nineteen chromosomal regions provided herein (see Tables 5a, 5b, 7, 8 and/or in
In some embodiments, the kit comprises a means suitable for identifying an amino acid sequence selected from the group consisting of amino acid sequences encoded by nucleic acids bearing a variation in LD with a SNP listed in Tables 5a, 5b, 7, 8 and/or in
In some embodiments of the kits provided herein, the control is an assay standard, such as a sample of the protein being assayed (e.g., a protein produced by a gene associated with an SNP such as CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, MYO5B, ENPP6, KBTBD9, MSRB3, and TSC2) or a nucleic acid (e.g., DNA or RNA) bearing one of the SNPs listed in Tables 5a, 5b, 7, 8 and/or in
In some embodiments, the kits provided herein comprise one or more chips or high-density arrays that contain many individual regions bearing a binding partner, such as a nucleic acid, for determining the presence or measuring the quantity of nucleic acid molecules present in a sample. Where assays are conducted using arrays of nucleic acids as molecular probes, the array can comprise a SNP listed in Tables 5a, 5b, 7, 8 and/or in
Other embodiments are directed to devices. In one embodiment, the device comprises a test surface having a plurality of locations, wherein one or more of said locations comprise an antibody that binds to the product of a gene associated with a SNP listed in Tables 5a, 5b, 7, and 8 and/or in
The various embodiments described herein can be complementary and can be combined or used together in a manner understood by the skilled person in view of the teachings contained herein.
As demonstrated herein, analysis of polymorphisms in the genes and regions identified herein leads to an ability to identify subjects that may have a predisposition to, or heightened risk of, developing a pulmonary disease, and to predict whether the subject may benefit from monitoring, prophylactic treatment, and/or treatment. Analysis of polymorphisms in the genes and regions identified herein also leads to an ability to diagnose a pulmonary disease, to predict the development of a pulmonary disease, to determine the probability of its development, and to predict its ultimate severity. Such predictions may be made based upon an analysis either of the polymorphisms alone, or in conjunction with other clinically relevant information, such as continued smoke exposure, or the presence of biochemical markers, such as nitrite levels, catalase activity and lipid peroxidation in plasma of an individual. See e.g., U.S. Application 20060177830. The SNPs disclosed herein may contribute to pulmonary disease and related pathologies in an individual in a variety of ways. Some SNPs occur within a protein coding sequence and thus, may directly contribute to disease phenotype. Other polymorphisms may occur in noncoding regions but may exert phenotypic effects indirectly, such as, for example, by influencing replication, transcription, translation, or other regulation of a gene. An individual SNP may also affect more than one phenotypic trait. Alternatively, a single phenotypic trait may be affected by multiple SNPs in the same or different genes.
COPD is predicted to become the third leading cause of death worldwide by 2020 (Mannino & Braman 2007), and cigarette smoking is widely recognized as its primary environmental causative factor. The pulmonary component of COPD is primarily characterized by airway inflammation with incompletely reversible, usually progressive, airflow obstruction (Rabe et al. 2007, Am J Respir. Crit Care Med., vol. 176, no. 6, pp. 532-555; Barnes et al. 2003, Eur Respir J, 22:672-688; Barnes 2003, Annu Rev Med 54:113-129). The identified pathophysiologic mechanisms of COPD include an imbalance between protease and anti-protease activity in the lung, dysregulation of anti-oxidant activity and chronic abnormal inflammatory response to long-term exposure to noxious gases or particles leading to the destruction of the lung alveoli and connective tissue (Rabe et al. 2007, Barnes et al. 2003, Barnes 2003). However, COPD may be best characterized as a syndrome associated with significant systemic effects that are attributed to low-grade, chronic systemic inflammation (Agusti et al. 2003, Euro. Resp. J. 21.2: 347-60; Rahman et al. 1996, Amer. J. of Resp. and Crit. Care Med. 154.4 Pt I (1996): 1055-60; Agusti & Soriano 2008, J. of Chronic Obstructive Pulmonary Disease 5: 133-38; Fabbri & Rabe 2007, Lancet, 370 (2007): 797-99). Although spirometric parameters are the traditional gold standard diagnostic and prognostic markers for COPD, it has become clear that they do not adequately represent all of its respiratory and systemic aspects (Marin et al. 2009, Respir Med 103:373-8; Celli 2006, Proceedings of the Amer. Thoracic Society 3:461-465). FEV1 correlates poorly with the degree of dyspnea, and the change in FEV1 does not reflect the rate of decline in health status (Celli et al. 2004, The New England J. of Med. 350:1005-1012; Celli 2006; Burge et al. 2000, British Medical J. 320:1297-1303). Other factors, such as emphysema and hyperinflation (Casanova et al. 2005, Amer. J. of Resp. and Crit. Care Med. 171:591-597), malnutrition (Schols et al. 1998, Amer. J. of Resp. and Crit. Care Med. 157:1791-1797), peripheral muscle dysfunction (Maltais et al. 2000, Clinics in Chest Med. 21:665-677), and dyspnea (Nishimura et al. 2002, Chest 121:1434-1440), are independent predictors of outcome. In fact, the multifactorial BODE index that includes body mass index (B), degree of airflow obstruction (O), dyspnea score (D), and exercise endurance (E), was a better predictor of mortality than FEV1 alone (Celli et al. 2004). The PBMC gene expression profile alone or in combination with clinical markers such as the BODE index components and/or lung parenchymal or airway changes on chest CT scans (Omori et al. 2006, Respirology 11:205-210) may be more predictive of the (early) presence, activity, and progression of the multi-component syndrome that is COPD compared to the clinical parameters alone.
The incompletely reversible airflow limitation observed in COPD results from small airway disease (obstructive bronchiolitis) and parenchymal destruction (emphysema). These pathologic changes are the result of an abnormal inflammatory response to long-term exposure to noxious gases or particles, with structural changes due to repeated injury and repair (Rabe et al. 2007). The mechanisms of the enhanced inflammation that characterizes COPD involve both innate and adaptive immunity in response initially to inhalation of particles and gases (MacNee 2001, Euro. J. of Pharmacology, vol. 429, pp. 195-207). Several studies have demonstrated differences in markers of inflammation and immune response, such as a correlation between the number of CD8 cytotoxic T lymphocytes and the degree of airflow limitation in COPD (Curtis, et al. 2007, Proc. of the Amer. Thoracic Soc., vol. 4, no. 7, pp. 512-521). The response to oxidative stress is considered an important factor in the pathogenesis of COPD (MacNee 2005, Proc. of the Amer. Thoracic Soc., vol. 2, no. 1, pp. 50-60), while protease-antiprotease imbalance is thought to be associated with emphysema (Baraldo et al. 2007, Chest, vol. 132, no. 6, pp. 1733-1740). However, while inflammation and other factors are clearly involved in the molecular pathogenesis of COPD, the precise etiological mechanisms remain to be fully characterized.
Novel genetic associations with lung functions that decline as a function of increasing cigarette smoking, after controlling for the effects of age and baseline lung function, are provided herein. As described herein, a genome-wide association study (GWAS) investigation of COPD was performed. Over 550,000 genetic markers were genotyped and tested for association in a sample of 192 adult cigarette smokers with COPD who were followed longitudinally over 17 years and in 197 age- and gender-matched control subjects (smokers and never-smokers without COPD). The outcomes for the association analyses were four spirometry-based indices that deconvoluted the major biological processes driving lung function decline, as well as the conventional dichotomous case-control categorization. The four spirometry-based outcome variables were calculated as best linear unbiased predictors (BLUPs) of lung function decline and focused on age-related decline (Age decline), pack-years-related decline (Pack-years decline), the intensifying effects of smoking, in terms of number of cigarettes per day (CPD), on decline with age (CPD×Age decline), and Baseline lung function (BL).
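For the dichotomous case-control outcome, a single-marker association test can be sketched as below. The passage above does not specify the test statistic that was used; the allelic chi-square test on a 2×2 table of allele counts is an assumption chosen for illustration, and the allele counts are hypothetical (they merely sum to the 384 and 394 alleles carried by 192 cases and 197 controls).

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 degree of freedom) on allele counts."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical minor/major allele counts in cases and controls.
chi2, p = allelic_chi_square(120, 264, 80, 314)
```

With more than 550,000 markers tested, a p-value from any single test of this kind would ordinarily be interpreted only after correction for multiple comparisons.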
The results from the GWAS were examined in two contexts. First, the results were examined to identify chromosomal regions where variations in the nucleotide sequence (e.g., the introduction of SNPs, deletions, insertions, etc.) were found to be associated with a decline in lung function. Second, the results were examined in the context of genes associated with the identified chromosome regions to identify biological/biochemical pathways whose impairment may be associated with lung disease and which are predictive of a predisposition to or the presence of pulmonary diseases like COPD. Such pathways may be identified by the presence of one or more genes in the identified chromosomal regions associated with recognized biological/biochemical pathways. Once identified, the pathways may be of further use in defining methods of diagnosis, prognosis, severity prediction, and treatment of pulmonary disease such as COPD.
The present disclosure identifies nineteen chromosomal regions having significant associations with pulmonary disease such as COPD. Those regions include one or more genes and identified polymorphisms (e.g., SNPs). As described below, some of the chromosomal regions include SNPs that are in, or that are near, genes that function in biological processes such as cilia function/lung clearance, neutrophil activation, and complement regulation. The genes, intragenic regions, and SNPs associated with COPD found in each of the nineteen chromosomal regions provided herein are listed in Tables 5a, 5b, 7, 8 and/or in
Based on the identification of those chromosomal regions, the present disclosure provides methods of detecting a predisposition to, a diagnosis of, a prognosis of, the severity of, or the response to treatment for a pulmonary disease (e.g., COPD), in a subject. In one embodiment, the methods comprise identifying in a subject's chromosomes one or more variations in a nucleotide sequence of one or more of the nineteen chromosomal regions identified herein. Variations in those nucleotide sequences can be correlated with a predisposition to, a diagnosis of, a prognosis of, the severity of, or the response to treatment for a pulmonary disease in a subject.
Biological processes identified as over-represented in the set of lung disease (e.g., COPD) predictor genes present in the nineteen identified chromosomal regions include: regulation of apoptosis, regulation of cell growth, macromolecule (protein and RNA) transport, post-translational protein modification, cellular defense response, inflammatory response and RNA processing. Major pathways identified include apoptosis, p38/MAPK signaling, focal adhesion, and leukocyte transendothelial migration. Changes in these biological processes and pathways may reflect the changes in activation, differentiation and cellular composition of the samples analyzed. The identification of leukocyte transendothelial migration seems to be an important change in this cell population due to the fact that COPD is characterized by leukocyte infiltration in the lung parenchyma (Panina et al. 2006). It is possible that differences in expression of these genes may result in a predisposition of leukocyte subpopulations to infiltrate the lung tissue, and perhaps other tissues. This observation is supported by previously reported changes in chemotaxis and extracellular proteolysis in neutrophils isolated from the blood of subjects with COPD (Burnett et al. 1987).
2.1 Variations and their Identification
As used herein, “variations” in a nucleotide sequence refer to differences in a nucleotide sequence in an individual relative to the sequence of nucleic acid molecules appearing in a control sequence (e.g., the sequence of chromosomal DNA for the dominant allele or of a control subject) or in the larger population (e.g., the difference(s) in the sequences of chromosomal DNA giving rise to different alleles in a population of control subjects). Variations include, but are not limited to: SNPs; deletions; insertions (e.g., di-, tri-, or tetra-nucleotide repeats); variable number tandem repeats (VNTR); short tandem repeats/microsatellites; copy number variants; amplifications (e.g., duplications); translocations; transversions (the substitution of a purine for a pyrimidine, or vice versa); and transitions (the exchange of one purine for another, i.e., A↔G, or of one pyrimidine for another, i.e., C↔T). The sequences at any given chromosomal location, including the prevalence of any particular base at any location, may be established by any means known in the art, including accessing databases (e.g., human genomic databases at the NCBI).
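The distinction between transitions and transversions among single-nucleotide substitutions can be illustrated with the following minimal sketch. The function name is hypothetical; the classification rule itself follows from whether the two bases belong to the same chemical class (purines A and G, pyrimidines C and T).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(reference, alternate):
    """Classify a single-base substitution in DNA."""
    ref, alt = reference.upper(), alternate.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "identical"
    # Same class (purine to purine, or pyrimidine to pyrimidine) is a
    # transition; a change of class is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


examples = {pair: classify_substitution(*pair) for pair in ["AG", "CT", "AC", "GT"]}
```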
Variations in the nucleotide sequences found in a subject's genome (e.g., the nineteen chromosomal regions described herein) can be identified by analysis of the chromosomal material or copies of that material (e.g., PCR amplified copies of one or more portions of a subject's chromosomal DNA) using any method known in the art, including but not limited to those described below.
As used herein, a Single Nucleotide Polymorphism (SNP) is a specific position within the reference human genome at which the nucleotide may differ between individuals, taking any of the four possible nucleotides. The different possible nucleotides are referred to as alleles.
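As a simple illustration of alleles at a single SNP position, the sketch below tallies allele frequencies from diploid genotype calls. The genotype encoding (two letters per individual), the function name, and the example calls are assumptions made for illustration and do not represent data from the study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the frequency of each allele observed at one SNP.

    Each genotype is a two-letter string such as "AG", giving the two
    alleles carried by one diploid individual.
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype.upper())
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotype calls for four individuals at one SNP.
frequencies = allele_frequencies(["AA", "AG", "GG", "AG"])
```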
In addition to the analysis of chromosomal material for the identification of variations in the nucleotide sequence of chromosomal regions, gene products expressed by genes located in the chromosomal regions can be analyzed (e.g. mRNA or cDNA copies thereof). It is also possible to examine proteins and polypeptides produced by genes within the chromosomal regions to identify variations in the nucleotide sequence of the chromosomal region.
Protein or nucleic acid sequence identifiers provided herein uniquely identify nucleic acid and/or protein sequence(s), (e.g., an NCBI accession number/version and/or NCBI “GI” Number). Those identifiers and the coinciding sequence(s) are publicly available, for example, at the United States National Center for Biotechnology Information (NCBI, U.S. National Library of Medicine, 800 Rockville Pike, Bethesda, Md., 20894 USA) or on the world wide web at www.ncbi.nlm.nih.gov. Where an NCBI accession number or GI number is provided for only one or two of the chromosomal sequence(s), protein sequence(s) or a nucleic acid sequence(s) encoding a protein produced by a gene indicated herein (e.g., a cDNA sequence), the sequence(s) for those nucleic acids and/or proteins not provided are also available in the NCBI database and considered part of this disclosure. Where any accession number does not recite a specific version, the version is taken to be the most recent version of the sequence associated with that accession number at the time the earliest priority document for the present application was filed.
2.2 Analysis of Nucleic Acids to Identify Variations in Chromosomal Regions
Any method known in the art may be used to identify variations in the nucleotide sequence of a subject's chromosomal DNA, including, but not limited to: sequencing, single stranded cleavage, hybridization (such as to arrays or individual nucleic acid probes), differential hybridization between the variant and a wild type sequence, single base extension, allele specific cleavage by restriction enzymes, oligonucleotide ligation assay (OLA), mass spectroscopy, and Polymerase Chain Reaction (PCR) based methods, such as amplification with allele specific primers. Nucleic acid probes used in any of those methods may be detectably labeled, such as with radioisotopes or fluorescent tags.
As used herein, a “primer” or “probe” is a nucleic acid molecule that typically comprises at least about 8, 10, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides complementary to the nucleic acid sequence it is targeted against (e.g., a portion of chromosomal regions 1-19). Primers and probes may also contain nucleotide sequences in addition to the region complementary to the target sequence, meaning their total length may be significantly longer than the region complementary to the target sequence. Depending on the type of assay in which it is employed, the complementary region of a probe will generally be less than 40, 50, 60, 65, 75, 100, 150, 200, or 250 nucleotides in length; however, the complementary portion of a probe may be as long as the target sequence to be detected. Primers, which are to be extended by the action of a polymerase, such as primers for nucleic acid amplification, typically comprise more than about 12 or 15 and less than about 30 nucleotides complementary to the target sequence. Like probes, primers can contain sequences in addition to the portion complementary to the target sequence, and thus may be longer than 30 nucleotides. In some embodiments, primers or probes comprise regions complementary to the target sequence that are in a range selected from: about 16 to about 32 nucleotides, about 18 to about 28, and about 18 to about 26 nucleotides. In other embodiments, such as where probes are affixed to a substrate in a nucleic acid array, the probes can be longer, such as about 30 to about 60, 50 to about 75, 70 to about 90, or about 100 or more nucleotides in length. In still other embodiments, primers can be as long as the length of the target sequence minus one nucleotide.
A number of considerations must be taken into account when designing probes and primers including, but not limited to, the length of the primer or probe, a GC content within a range suitable for hybridization, a lack of predicted secondary structure, and the stringency of the conditions under which the hybridization between the probe or primer and the target sequence is to be performed. A skilled artisan will recognize that other factors, including the nature of the sequences surrounding a variation where a probe or primer may need to hybridize, must also be taken into consideration.
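By way of illustration only, and not limitation, two of the design considerations named above (length and GC content) can be screened computationally. The following is a minimal sketch in Python. The function names are hypothetical, the length bounds echo the "about 16 to about 32 nucleotides" range recited above, and the 40% to 60% GC bounds are an assumption chosen for the example rather than values stated in this disclosure. Secondary structure and hybridization stringency are not modeled.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def passes_basic_checks(seq: str, min_len: int = 16, max_len: int = 32,
                        gc_min: float = 0.40, gc_max: float = 0.60) -> bool:
    """Screen a candidate primer or probe on length and GC content only.

    The GC bounds are illustrative assumptions, not values from the text.
    """
    length_ok = min_len <= len(seq) <= max_len
    return length_ok and gc_min <= gc_fraction(seq) <= gc_max
```

A candidate that passes this screen would still need to be evaluated for predicted secondary structure and for the sequences flanking the variation.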
Where hybridization is used, a nucleic acid probe typically hybridizes to a target nucleic acid containing the sequence variation (e.g., SNP) by complementary base-pairing in a sequence specific manner, and discriminates the target variant sequence from other nucleic acid sequences.
In one aspect, one or more probes are employed that can differentiate between nucleic acids having a specific variation (e.g., a specific allele such as SNP) and the wild type sequence at the location of the specific variation. In an embodiment, the specific variations are selected from two or more of the SNPs recited in
Variations may also be detected employing a nucleic acid amplification primer (e.g., a PCR primer) that acts as an initiation point for nucleotide extension at the point of or in the variation, so that amplification will only be effective where the primer matches the variant sequence (or wild type for the control).
Where variations in nucleic acid sequences are identified using allele specific primers or probes, the design of each allele-specific primer or probe depends on variables such as the precise composition of the nucleotide sequences flanking the variation, the length of the primer or probe, a GC content within a range suitable for hybridization, lack of predicted secondary structure and the stringency of the condition under which the hybridization between the probe or primer and the target sequence is performed.
Higher stringency conditions utilize buffers with lower ionic strength and/or a higher reaction temperature. Lower stringency conditions utilize buffers with higher ionic strength and/or a lower reaction temperature. By way of example, and not limitation, one set of conditions for high stringency hybridization of an allele-specific probe is: prehybridization with a solution containing 5× standard saline phosphate EDTA (5×SSPE: 50 mM NaH2PO4, pH 7.7, containing 0.9 M NaCl and 5 mM EDTA) and 0.5% SDS at 55° C., followed by incubation with the probe under the same conditions, followed by washing with a solution containing 2×SSPE and 0.1% SDS at 55° C. or room temperature (about 18-24° C.).
Moderate stringency hybridization conditions (e.g., for allele-specific primer extension reactions) may utilize a solution containing about 50 mM KCl at about 46° C. Alternatively, the incubation may be conducted at an elevated temperature, such as 60° C. In another embodiment, a moderately stringent hybridization condition suitable for oligonucleotide ligation assay (OLA) reactions, wherein two probes are ligated if they are completely complementary to the target sequence, may utilize a solution of about 100 mM KCl at a temperature of 46° C.
In hybridization-based assays, allele-specific probes can be designed that hybridize to a segment of target DNA having a wild-type sequence or the sequence of a variation (e.g., alternative SNP alleles/nucleotides). Hybridization conditions should be sufficiently stringent that there is a significant detectable difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles or significantly more strongly to one allele. While a probe may be designed to hybridize to a target sequence that contains a SNP so that the SNP site aligns anywhere along the sequence of the probe, the probe is preferably designed to hybridize to a segment of the target sequence such that the location of the SNP aligns with a central portion of the probe (e.g., a position within the probe that is at least three nucleotides from either end of the probe). Such a probe design generally achieves good discrimination in hybridization between different allelic forms.
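By way of illustration only, the preference that the SNP align with a central portion of the probe can be expressed as a simple positional check. The sketch below is in Python and uses a hypothetical function name. It assumes that "at least three nucleotides from either end" means at least three nucleotides lie between the SNP position and each end of the probe; other readings of that phrase are possible.

```python
def snp_is_central(probe_length: int, snp_index: int, margin: int = 3) -> bool:
    """Return True if the SNP has at least `margin` nucleotides on each side.

    `snp_index` is the 0-based position of the SNP within the probe.
    """
    if not 0 <= snp_index < probe_length:
        raise ValueError("snp_index must fall inside the probe")
    bases_5prime = snp_index                     # nucleotides 5' of the SNP
    bases_3prime = probe_length - 1 - snp_index  # nucleotides 3' of the SNP
    return bases_5prime >= margin and bases_3prime >= margin
```

For a 25-nucleotide probe, positions 3 through 21 (0-based) satisfy this check under the stated assumption.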
In an embodiment, a probe or primer may be designed to hybridize to a segment of target DNA such that the variation aligns with either the 5′ most end or the 3′ most end of the probe or primer. In an embodiment which is particularly suitable for use in an oligonucleotide ligation assay (see e.g., U.S. Pat. No. 4,988,617), the 3′ most nucleotide of the probe aligns with the SNP position in the target sequence.
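By way of illustration only, a set of allele-specific oligonucleotides whose 3′ most nucleotide aligns with the SNP position can be derived from a target sequence as sketched below in Python. The function name, the default length of 20 nucleotides, and the convention that the target strand is written 5′ to 3′ with a 0-based SNP index are assumptions made for the example.

```python
def allele_specific_primers(target: str, snp_index: int, alleles, length: int = 20) -> dict:
    """Build one forward oligonucleotide per allele, each ending (3') at the SNP.

    `target` is the strand whose sequence the oligonucleotide copies, 5'->3';
    the oligonucleotide therefore anneals to the complementary strand.
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this length")
    upstream = target[start:snp_index].upper()  # bases 5' of the SNP
    return {allele: upstream + allele.upper() for allele in alleles}
```

Each returned sequence differs from the others only at its 3′ most nucleotide, which is the arrangement described above for allele-specific extension and ligation formats.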
Synthetic nucleic acids (e.g., Peptide Nucleic Acids, PNA) may also be used to detect variation in a nucleic acid sequence. In one embodiment, a variation such as a SNP is detected with a reagent such as a PNA oligomer, or a combination of DNA, RNA and/or a PNA, that hybridizes to a segment of a target nucleic acid molecule containing a sequence variation. In an embodiment, those variations are the SNPs identified in Table 5a, 5b, 7, 8 and/or
In an embodiment, multiple detection reagents, such as probes and/or primers, may be prepared and/or employed in one or more formats. For example, multiple detection reagents may be affixed to a solid support (e.g., arrays or beads) or supplied in solution (e.g., probe/primer sets for PCR, RT-PCR, TaqMan assays, OLA assays, or primer-extension reactions). Multiple probes or primers (e.g., about 2, 3, 4, 5, 6, 8, 9, 10 or more probes and/or primers) in any of those formats may be prepared in the form of kits, which optionally contain instructions on their use in detecting sequence variations.
Those skilled in the art will understand that nucleic acid molecules may be double-stranded molecules and that reference to a particular site on one strand refers, as well, to the corresponding site on a complementary strand. In defining the position of a variation such as a SNP, a reference to an adenine, a thymine (uridine), a cytosine, or a guanine at a particular site on one strand of a nucleic acid molecule also defines the thymine (uridine), adenine, guanine, or cytosine (respectively) at the corresponding site on a complementary strand of nucleic acid molecule. Probes and primers may be designed to hybridize to either strand and the genotyping methods disclosed herein may generally target either strand. Primers may be designed to amplify any of chromosomal regions 1-19 identified herein or parts thereof.
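By way of illustration only, the correspondence between a site on one strand and the corresponding site on the complementary strand can be sketched in Python as follows. The function names are hypothetical, only DNA bases are handled (uridine is not), and positions are assumed to be 0-based with each strand read 5′ to 3′.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_base(base: str) -> str:
    """Return the Watson-Crick partner of a single DNA base."""
    return COMPLEMENT[base.upper()]


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(complement_base(b) for b in reversed(seq))


def snp_on_opposite_strand(position: int, allele: str, strand_length: int):
    """Map a 0-based position and allele to the complementary strand (5'->3')."""
    return strand_length - 1 - position, complement_base(allele)
```

For example, an adenine at the first position of a four-nucleotide strand corresponds to a thymine at the last position of the complementary strand when that strand is read 5′ to 3′.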
2.3 Analysis of Polypeptides and/or Proteins to Identify Variations in Chromosomal Regions
Variations in the nucleotide sequence of one or more of a subject's chromosomal regions can be identified by examining the protein or polypeptide gene products encoded by the chromosomal regions. In one embodiment, variant polypeptides or variant proteins that differ from the “wild type” proteins encoded by the genes of the nineteen chromosomal regions associated with COPD and other lung disease may be used to identify the presence of variations in the nucleotide sequence of a subject's chromosomal DNA. Variant polypeptides and proteins include, but are not limited to, proteins or polypeptides having: a single or multiple amino acid difference, truncations, additions, insertions, or deletions, arising from the variations in the nucleotide sequences encoding them relative to the wild type polypeptide/protein (e.g., SNPs may introduce missense mutations, nonsense mutations, or read-through mutations that remove a stop codon). For the purpose of this disclosure the wild type proteins/polypeptides are considered to be the polypeptides and proteins encoded by the sequences of the nineteen chromosomal regions identified in this disclosure. Where variations in a subject's chromosomal DNA do not arise in the sequences encoding gene products, the variations may still alter the level of expression of the polypeptide or protein encoded by the gene.
In an embodiment, the variant polypeptides or proteins are selected from the proteins CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, ENPP6, KBTBD9, MSRB3, MYO5B, and TSC2. In another embodiment, the variant polypeptides or proteins are selected from CSMD1, MYO5B, and DNAH3. In another embodiment, the variant polypeptides or proteins are selected from CLEC4A, EBF2, ELMO1, and TSC2.
Alterations in polypeptides or proteins (including CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, ENPP6, KBTBD9, MSRB3, MYO5B, and TSC2) may be identified by any means known in the art, including but not limited to: antibodies specific to changes in the amino acid sequence caused by a variation, the size of the polypeptides/proteins observed (e.g., where insertions, deletions, non-sense or read through mutations have occurred), and mass spectroscopy of the polypeptides/proteins or fragments thereof (e.g., tryptic digests). In addition to the foregoing, where variations in nucleotide sequences alter a biochemical activity (e.g., enzymatic activity or binding to ligand), assays of the activity may be used to assess the presence of variations in the nucleotide sequence of a chromosomal region.
Where the level of polypeptide/protein expression is altered in a subject, changes in the level of expression may be identified in any suitable assay including, but not limited to immunoassays or biochemical assays such as enzymatic assays. In an embodiment, activity assays of ENPP6 or MSRB3 are used to identify variations in the nucleotide sequence encoding those proteins.
It is possible to provide an estimate of a subject's predisposition to, diagnosis of, or prognosis (e.g., expected severity) of pulmonary disease (e.g., COPD) by identifying variations in the nucleotide sequence of one or more of the nineteen chromosomal regions identified herein. As described herein, variations in those chromosomal regions, including specific SNPs described in any of Tables 5a, 5b, 7 and/or 8, can be associated with an increased risk of having or developing pulmonary disease and related pathologies. Thus, where certain sequence variations (e.g., SNPs) can be identified in a subject's chromosomal DNA, they may be employed to determine whether an individual possesses an increased risk of developing pulmonary disease such as COPD or a related disorder (i.e., they have a predisposition to pulmonary disease). The presence of those sequence variations can also be used in the diagnosis of lung disease, such as COPD, or to provide a prognosis for the COPD.
In one embodiment, a method of detecting/determining a predisposition to, a diagnosis of, a prognosis of, the severity of, or the response to treatment for a pulmonary disease (e.g., COPD) in a subject comprises identifying variations in the nucleotide sequence of one or more chromosomal regions selected from regions 1-19 of said subject, where the presence of one or more variations in said chromosomal regions is indicative of a predisposition to, or the presence of, COPD in the subject.
Variations in chromosomal regions may be the variations identified in Tables 5a, 5b, 7, 8 and/or in
In one embodiment, chromosomal variations that are associated with pulmonary diseases at a statistically significant level include those variations found within any of regions 1-19 and those within 2,500 base pairs of any SNP within those regions identified as having a statistically significant association with a pulmonary disease described herein. In another embodiment, chromosomal variations that are associated with pulmonary diseases at a statistically significant level include those variations found within any of regions 1-19, and those statistically significant variations within a distance that is equal to 10% of the length (as measured in base pairs) of the individual chromosomal regions. In either case, statistically significant associations may be shown where the variations have a q-value of less than 0.5 or a p-value of 0.05, 0.02, 0.01, 0.005 or less (depending on the stringency desired) for their association with lung function or its decline (e.g., % predicted FEV1, % predicted FVC, or the ratio of FEV1/FVC).
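By way of illustration only, the two distance criteria described above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes 1-based inclusive genomic coordinates on a single chromosome, and it assumes that the 10% distance is applied as a flank on each side of the region; the disclosure does not specify either convention.

```python
def within_snp_window(position: int, snp_position: int, window_bp: int = 2500) -> bool:
    """Return True if `position` lies within `window_bp` base pairs of the SNP."""
    return abs(position - snp_position) <= window_bp


def extended_region(start: int, end: int, percent: int = 10):
    """Extend the region [start, end] on each side by `percent` of its length."""
    length = end - start + 1
    flank = length * percent // 100  # integer base pairs, rounded down
    return max(1, start - flank), end + flank
```

Under these assumptions, a region spanning positions 10,001 to 20,000 (10,000 base pairs) would be extended by 1,000 base pairs on each side.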
Unless stated otherwise, the terms “diagnose”, “diagnosing”, “diagnosis”, and “diagnostics” used herein include, but are not limited to, any of the following: detection of pulmonary disease and/or a related pathology that a subject may presently have; determining a particular type or subclass of pulmonary disease in a subject known to have pulmonary disease; confirming or reinforcing a previously made diagnosis of pulmonary disease; pharmacogenomic evaluation of a subject to determine which therapeutic strategy the subject is most likely to positively respond to or to predict whether a patient is likely to respond to a particular treatment; predicting whether a patient is likely to experience negative effects from a particular treatment or therapeutic compound; and evaluating the future prognosis of an individual having a pulmonary disease. Such diagnostic uses can be based on the SNPs individually or a unique combination of SNPs. In addition to use as diagnostics the SNPs, individually or as a combination of SNPs, may also be used to stratify enrollment in clinical research trials of therapeutics or prophylaxis/treatment modalities to enrich for a response with a smaller sample size (i.e., smaller number of subjects).
In one embodiment, an individual or a population of individuals may be considered as not having pulmonary disease (lung disease) or impaired lung function when they do not exhibit clinically relevant signs, symptoms, and/or measures of lung disease. Thus, in various aspects, an individual or a population of individuals may be considered as not having pulmonary disease (e.g., chronic obstructive pulmonary disease, chronic systemic inflammation, atherosclerosis, emphysema, asthma, pulmonary fibrosis, cystic fibrosis, lupus, obstructive lung disease, pulmonary inflammatory disorder, lung cancer or other diseases having pulmonary manifestations) when they do not manifest clinically relevant signs, symptoms and/or measures of those disorders. In another embodiment, an individual or a population of individuals may be considered as not having lung disease or impaired lung function, such as COPD, when they have a FEV1/FVC ratio (also known as the FEV/FVC ratio) greater than or equal to about 0.70 or 0.72 or 0.75. In another embodiment, an individual or population of individuals that may be considered as not having lung disease or impaired lung function are sex- and age-matched with test subjects (e.g., age matched to 5 or 10 year bands) that are current or former cigarette smokers or never-smokers without apparent lung disease who have an FEV1/FVC≧0.70 or ≧0.75. Individuals or populations of individuals without lung disease or impaired lung function may be employed to establish the normal range of sequence variations (e.g., allele patterns and allele frequencies in “control subjects”), proteins, peptides or gene expression. Individuals or populations of individuals without lung disease or impaired lung function may also provide samples against which to compare one or more samples taken from a subject (e.g., samples taken at one or more different first and second times) whose lung disease or lung function status may be unknown.
In other embodiments, an individual or a population of individuals may be considered as having lung disease or impaired lung function when they do not meet the criteria of one or more of the above mentioned embodiments.
In one embodiment, control subjects, as that term is used herein are sex- and age-matched current or former cigarette smokers or never-smokers, without apparent lung disease who have FEV1/FVC≧0.70. Age matching may be conducted in bands of several years, including 5, 10 or 15 year bands. Control subjects are preferably recruited from the same clinical settings. A control group is more than one, and preferably a statistically significant number of control subjects. In one embodiment, control subjects are sex- and age-matched (in 10 year bands) current or former cigarette smokers, without apparent lung disease who had FEV1/FVC≧0.70.
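By way of illustration only, the spirometric threshold and the sex and age-band matching described above can be sketched in Python as follows. The function names and the dictionary keys "sex" and "age" are hypothetical. The sketch assumes fixed age bands counted from zero (e.g., 50-59 for a 10 year band); a sliding band centered on the test subject's age would be an equally valid reading.

```python
def fev1_fvc_ratio(fev1_litres: float, fvc_litres: float) -> float:
    """Return the FEV1/FVC ratio from two spirometric volumes."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres


def meets_control_threshold(fev1_litres: float, fvc_litres: float,
                            threshold: float = 0.70) -> bool:
    """Return True if FEV1/FVC is at or above the control threshold."""
    return fev1_fvc_ratio(fev1_litres, fvc_litres) >= threshold


def age_band(age_years: int, width: int = 10) -> int:
    """Return the lower bound of the fixed age band containing `age_years`."""
    return (age_years // width) * width


def is_matched(subject: dict, control: dict, width: int = 10) -> bool:
    """Return True if subject and control share sex and age band."""
    same_sex = subject["sex"] == control["sex"]
    return same_sex and age_band(subject["age"], width) == age_band(control["age"], width)
```

The threshold and band width are parameters so that the 0.72 or 0.75 ratio and the 5 or 15 year bands recited herein can be substituted.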
In one embodiment, a control sample is a sample from one or more control subjects or which provides a result representative of tests conducted on a control group. In another embodiment, a control sample is a sample from a subject without lung disease (e.g., COPD) or which provides a result representative of tests conducted on subjects without lung disease. In another embodiment a control sample is a sample containing a known amount (e.g., in mass, number of moles, or concentration) of one or more nucleic acids and/or proteins.
In an embodiment the methods of detecting a predisposition to, a diagnosis of, a prognosis of, the response to treatment for a pulmonary disease, or predicting/determining the severity of a pulmonary disease (e.g., COPD) employ at least one, two, three, four, five, six, seven, eight, nine, ten, fifteen, or twenty sequence variations found in the nineteen chromosomal regions. In another embodiment, the methods of detecting a predisposition to, diagnosis of, or prognosis of lung disease, such as COPD, employ at least one, two, three, four, five, ten, fifteen, twenty, twenty five, or thirty of the SNPs in Tables 5a, 5b, 7, 8 and/or in
Assessing a number of different variations present in the nineteen chromosomal regions (e.g., the alleles from a collection of single polymorphisms) allows increased statistical confidence that the variations (e.g., SNPs) observed are indicative of the likelihood that an individual will develop pulmonary disease (e.g., COPD), can be diagnosed with pulmonary disease, or can be provided with a prognosis of the future severity of pulmonary disease. In other words, employing multiple variations in the analysis of a single subject provides increased reliability in the risk profiling of that subject. More broadly, this is analogous to the situation of an individual having only one risk factor predisposing to atherosclerosis (elevated cholesterol) vs. multiple risk factors (elevated cholesterol plus hypertension, obesity, smoking, diabetes, etc.). Risk is increased as the number of risk factors increases. Moreover, where an individual is already experiencing clinical manifestations (symptoms) of pulmonary disease, and particularly COPD, by assaying variations in nucleotide sequences in the nineteen chromosomal regions (e.g., the polymorphisms provided herein) it is possible to provide a prognosis based upon the predicted risk of developing pulmonary disease (e.g., COPD).
By assaying the polymorphisms as provided herein, it is possible to predict the risk of developing pulmonary disease (e.g., COPD) prior to its clinical detection. Such early prediction provides the clinician with opportunities to prevent the manifestation of, slow, or halt the progression of the disease.
The skilled artisan will recognize that, due to the heterogeneous nature of pulmonary diseases such as COPD, not all individuals with pulmonary disease will possess alleles for any or all of the sequence variations described herein, (e.g., SNPs listed in Tables 5a, 5b, 7 and/or 8). In some embodiments of the methods provided herein, the presence of at least three alleles, selected from the SNPs and genes shown in Tables 5a, 5b, 7, 8 and/or in
Where it is desirable, sequence variations within the nineteen chromosomal regions identified, and all other sources of variation in associated regions, may be used to calculate a measure quantifying the risk of developing a disease (COPD), diagnosing it, or predicting its progression or severity. This calculation is conducted by an algorithm where the individual variations identified in a subject are used alone or in combination in the calculation. The result would quantify risk as an Odds Ratio (OR) or a Predictive Probability (PP). Further, the calculation of such a combined outcome could include other non-genetic variables including, but not limited to, demographics, exposure, and biomarkers such as age, ancestry, cumulative exposure to cigarette smoke, spirometric measures of lung function, presence of symptoms such as, but not limited to, dyspnea, measure of exercise capacity, gene expression level, protein abundance, metabolite levels, or methylation status. A combination of multiple variables, including those yet to be identified, will increase the accuracy of the assessment.
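By way of illustration only, and without limiting the algorithm that may be employed, the two risk measures named above can be sketched in Python as follows. The odds ratio is computed from the standard 2×2 table of carriers and non-carriers among cases and controls. The predictive probability is computed with a logistic model, which is an assumption made for the example; the disclosure does not specify a model form. The function names are hypothetical, and any coefficients supplied would have to be estimated from study data.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds ratio for carrying an allele, from a 2x2 case/control table."""
    cells = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def predictive_probability(intercept: float, coefficients, values) -> float:
    """Probability from a logistic model (assumed form) over combined variables.

    `values` may mix genetic terms (e.g., risk-allele counts) with
    non-genetic terms (e.g., age or cumulative smoke exposure).
    """
    if len(coefficients) != len(values):
        raise ValueError("coefficients and values must have the same length")
    linear = intercept + sum(c * v for c, v in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))
```

In such a model each additional variable contributes one term to the linear predictor, which is one way of combining genetic and non-genetic variables in a single calculation.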
The linkage (association) of variations in different portions of the nineteen chromosomal regions (e.g., genes) described herein with the development of pulmonary diseases such as COPD and their progress, indicates that different polymorphisms may play a role in the development of pulmonary diseases in different subjects. As variations at different polymorphic sites will occur in different subjects, the associations between various genetic sites provided herein make possible the identification of subject profiles (e.g., profiling of patients). Such subject profiles make possible individualized treatments, which are desirable as regimes effective to treat a first patient with a first profile may not be as effective in a second patient with a different second profile. Subject specific profiles also allow less effective (or ineffective) treatments, particularly those accompanied by undesirable side effects, to be avoided.
In view of the correlation between the etiology of COPD and genes associated with identified sequence variations (e.g., SNPs) within identified chromosomal regions, the ability to manipulate the expression of those genes represents an efficacious means to treat pulmonary disease such as COPD. Methods to treat a pulmonary disease may include gene therapy to increase or decrease the expression level or activity of one or more of the gene products produced by the genes found in chromosomal regions identified herein. Treatment may also include methods in addition to, or as an alternative to, gene therapy to increase or decrease the expression or activity of one or more products of the genes found in the chromosomal regions identified herein.
The products of genes in the nineteen chromosomal regions identified herein are not limited to nucleic acids. Identification of genes involved in the development of pulmonary diseases such as COPD also makes possible an identification of proteins that may affect the development of a pulmonary disease. Identification of such proteins makes possible the use of methods to affect their expression, processing, abundance, function, biological activity, or to alter their metabolism. Methods to alter the effect of expressed proteins include, but are not limited to, the use of specific antibodies or antibody fragments that bind the identified proteins, specific receptors that bind the identified proteins, or other ligands or small molecules that inhibit the identified proteins from affecting their physiological target and exerting their metabolic and biologic effects. In addition, those proteins that are down-regulated or are affected by mutations reducing their activity may be exogenously supplemented to ameliorate the effects of their decreased activity or synthesis, or increased degradation. The identification of genes involved in the development of pulmonary diseases also makes possible prophylactic methods to affect gene expression or protein function that may be used to treat individuals at risk for the development of a pulmonary disease, or to prevent the clinical manifestation of a pulmonary disease in individuals at risk for its development.
4.1 Methods of Enhancing Gene Expression
Where a subject has decreased activity of one or more gene products relative to the levels found in individuals expressing the wild type gene, it is possible to treat pulmonary diseases such as COPD by enhancing expression of one or more of those genes. Gene transcription may be deliberately modified in a number of ways to enhance the activity of the gene products in a subject. In one embodiment, exogenous copies of a gene are inserted into the genome of cells (e.g., a subject's cells) via homologous recombination in vivo or in vitro. In other embodiments, gene products may be expressed in cells by the introduction of a vector that remains extrachromosomal (e.g., a plasmid or a viral vector such as modified adenovirus), thereby allowing for transcription and expression independent of the genomic allele. Yet another method is transfection with naked DNA. In some embodiments, a promoter specific to the vector, rather than a copy of the wild type promoter, is used to drive expression of the gene product from the vector.
Where the genes are inserted into cells in vitro, the resulting cells can be introduced into a subject. Transient expression from introduced vectors generally yields high expression levels; however, the gene/vector is maintained for only a short period of time, particularly without selection, although use of an episomal vector containing a eukaryotic origin of replication provides for greater persistence of the vector.
4.2 Methods of Inhibiting Gene Expression
Where a subject has increased activity of one or more gene products relative to the levels found in individuals expressing the wild type gene, it is possible to treat pulmonary diseases such as COPD by inhibiting expression of those genes or increasing the degradation of the gene products. Treatments to decrease gene expression, particularly by increasing the degradation of the gene products, include, but are not limited to, the expression of anti-sense mRNA, triplex formation, inhibition by co-expression, and administration or expression of siRNA. Thus, in one embodiment, antisense RNA introduced into a cell binds to complementary mRNA and inhibits the translation of that molecule. In another embodiment, antisense single stranded cDNA introduced into a cell inhibits translation, and possibly speeds degradation of the DNA-RNA duplex. In another embodiment, short interfering RNAs (RNAi or siRNA) specifically inhibit gene expression. See Tuschl et al., Nature 411:494-498 (2001). In another embodiment, stable triple-helical structures can be formed by bonding of oligodeoxyribonucleotides (ODNs) to polypurine tracts of double stranded DNA. See, for example, Rininsland, Proc. Nat'l Acad. Sci. USA 94:5854-5859 (1997). Triplex formation can inhibit DNA replication or the elongation step of transcription, and the resulting triplex is a very stable structure.
4.3 Methods to Enhance the Activity of Specific Proteins
Where it is desirable to enhance the activity of proteins in a subject, the proteins themselves may be administered to the subject. Alternatively, the subject may be treated, as described above, to introduce one or more copies of nucleic acids encoding the protein. Where the protein is an enzyme, it is even possible to supply the product of the transformation catalyzed by the enzyme.
4.4 Methods to Inhibit the Activity of Specific Proteins
In those instances where it is desirable to reduce the level or activity of one or more proteins produced by the genes in the chromosomal regions described herein to treat pulmonary diseases, the proteins can be reduced with an agent having affinity for the protein. Such agents include, but are not limited to, monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies) or a fragment thereof, including but not limited to an scFv, a Fab fragment, a Fab′ fragment, a F(ab′)2, an Fv, and a disulfide linked Fv.
In one embodiment, specific antibodies, or fragments thereof, may be used to bind the protein thereby blocking its activity. Such antibodies may be obtained through the use of conventional techniques, including hybridoma technology, or may be isolated from libraries commercially available (e.g., libraries from Dyax (Cambridge, Mass.), MorphoSys (Martinsried, Germany), Biosite (San Diego, Calif.) and Cambridge Antibody Technology (Cambridge, UK)). In addition, where the protein in question interacts with another protein, such as a cellular receptor, antibodies that antagonize the interaction between the specific protein and the cellular receptor can be used to block interactions that lead to the development of COPD and other pulmonary diseases.
5.1 Nucleic Acids
The present disclosure encompasses nucleic acid analogs that contain modified, synthetic, or non-naturally occurring nucleotides or structural elements or other alternative/modified nucleic acid chemistries known in the art. Such nucleic acid analogs are useful, for example, as detection reagents (e.g., primers/probes) for detecting one or more SNPs identified in Tables 5a, 5b, 7, 8 and/or in
Additional examples of nucleic acid modifications that improve the binding properties and/or stability of a nucleic acid include use of base analogs such as inosine, intercalators (U.S. Pat. No. 4,835,263) and minor groove binders (U.S. Pat. No. 5,801,115). Thus, references herein to nucleic acid molecules, SNP-containing nucleic acid molecules, SNP detection reagents (e.g., probes and primers), and oligonucleotides/polynucleotides include PNA oligomers and other nucleic acid analogs. Other examples of nucleic acid analogs and alternative/modified nucleic acid chemistries known in the art are described in Current Protocols in Nucleic Acid Chemistry, John Wiley & Sons, N.Y. (2002).
The term “target nucleic acid” can include any nucleic acid sequence to be detected in an assay. The “target nucleic acid” may comprise the entire sequence of interest (e.g., one or more of the nineteen chromosomal regions identified herein) or may be a sub-sequence (e.g., a fragment) of the nucleic acid target molecule, such as a nucleotide sequence wherein a variation such as a SNP may be present. In an embodiment, the portion of a target nucleic acid may be in a range selected from: 25 to 50 base pairs, 30 to 60 base pairs, 40 to 80 base pairs, 40 to 100 base pairs, 50 to 200 base pairs, 60 to 300 base pairs, 70 to 500 base pairs, 80 to 800 base pairs, 100 to 1,000 base pairs, 200 to 4,000 base pairs, 500 to 10,000 base pairs, and 1,000 to 20,000 base pairs of chromosomal regions 1-19 (see, e.g.,
5.1 Nucleotide Probes and Primers
The present disclosure includes and provides for nucleic acid molecules that may be used to detect variations in the nucleotide sequences of the nineteen regions identified herein, including both probes and primers.
Nucleic acid probes include any oligomer of RNA, DNA, or PNA, suitable for hybridizing to all or a portion of the target nucleic acid (DNA or RNA) that can be used to initiate the synthesis of a nucleic acid molecule that is complementary to the sequence of that target. Alternatively, nucleic acid probes include any oligomer of RNA, DNA, or PNA that can be used to detect variations in the sequence of the target nucleic acid. In some embodiments, nucleic acid probes can be, for example, a primer suitable for use in methods where a DNA polymerase extends the primer, such as in polymerase chain reaction (PCR) or variants thereof (e.g., hot start PCR). Such primers may be labeled with a detectable moiety or may be unlabeled. Likewise, a primer may be in solution or immobilized to a solid support or solid carrier. In some embodiments, a suitable primer can also be a suitable probe. In some embodiments, a suitable probe can be a suitable primer.
Nucleic acids of the present disclosure include and provide for nucleic acids in the form of a composition, such as a kit, comprising two or more nucleic acid probes for the identification of one or more variations in a nucleotide sequence of one or more chromosomal regions selected independently from regions 1-19. Such kits optionally comprise instructions for the use of the kit to identify one or more of said variations and/or one or more control nucleic acids for said variations in said nucleotide sequence. In one embodiment, the control is a nucleic acid. In another embodiment, the control is selected from the group consisting of homozygous reference genotype, homozygous variant genotype, heterozygous genotype, and combinations thereof for the SNPs identified by the probes. In another embodiment, one or more nucleic acids in a kit or composition bind to a region adjacent to a SNP or variation (e.g., within a distance that the nucleic acid can be used as a nucleic acid primer for detecting or amplifying the SNP or variation, or within 1, 10, 20, 30, 50, 100, 200, 300, 400 or 500 base pairs of the SNP or variation) present in chromosomal regions 1-19. In yet another embodiment of a kit or composition, at least one, two, three, four, five, or six different nucleic acids are suitable for use as primers for the amplification of nucleic acid sequences within one or more of chromosomal regions 1-19 (e.g., the nucleic acids are different PCR or LCR primers). In such an embodiment, the nucleic acids comprise a nucleotide sequence that is complementary to at least one strand of the nucleotide sequence of said chromosomal regions.
The nucleic acid molecules of the kits can include a probe that is capable of detecting all or a portion of a given target nucleic acid sequence, such as a SNP sequence. The nucleic acid molecule can include a nucleic acid sequence that is longer than a given SNP sequence. In some embodiments, the kits include instructions for preparing the samples for analysis using the kit. In some embodiments, the kits include instructions for analyzing and/or interpreting the results obtained using the kit.
Nucleic acid probes may be any suitable nucleic acid (polynucleotide) molecule. Suitable nucleic acid probes include any oligomer comprising two or more nucleobase-containing subunits, such as a polynucleotide (RNA or DNA) or synthetic polynucleotide mimetics such as peptide nucleic acids (PNA). In some embodiments, nucleic acid probes may contain greater than about 10, 12, 14, 15, 16, 17, 18, 20, 22, or 24 nucleobase-containing subunits and less than about 26, 28, 30, 32, 34, 36, 40, 44, 48 or 50 nucleobase-containing subunits. In other embodiments, the probes may contain greater than about 18, 20, 22, 24, 26, or 28 nucleotides and less than about 100, 200, 300, 400, 500, 750 or 1,000 nucleobase-containing subunits. Nucleic acid probes, whether comprising DNA, RNA or synthetic mimetics, can hybridize to all or a portion of the target nucleic acid (DNA or RNA). Probes may be labeled with a detectable moiety (e.g., fluorescent tags or isotope labels) or may be unlabeled. Likewise, a probe may be in solution or immobilized to a solid support or solid carrier. In one embodiment, compositions comprising probes may comprise nucleic acid sequences from two, three, four, five, six, seven, eight or more different chromosomal regions of the nineteen chromosomal regions identified herein (see e.g.,
The present disclosure also provides compositions comprising two or more pairs of nucleic acid molecules that may be, for instance, pairs of primers for amplification of various portions of chromosomal regions 1-19. In such embodiments, the two or more pairs of nucleic acid molecules comprise a first pair of nucleic acid molecules and a second pair of nucleic acid molecules. The first pair of nucleic acid molecules comprises a first nucleic acid molecule comprising a nucleotide sequence complementary to a portion of a chromosomal region selected from chromosomal regions 1-19 and a second nucleic acid molecule comprising a nucleotide sequence complementary to the opposite strand of the chromosomal region to which said first nucleic acid is complementary. The second pair of nucleic acid molecules comprises a third nucleic acid molecule comprising a nucleotide sequence complementary to a portion of a chromosomal region selected from chromosomal regions 1-19 and a fourth nucleic acid molecule comprising a nucleotide sequence complementary to the opposite strand of the chromosomal region to which said third nucleic acid is complementary. Such compositions may contain additional pairs of nucleic acid molecules.
5.2 Pharmaceutical Compositions Comprising Nucleic Acids
The linkage of specific chromosomal regions, including specific genes, to pulmonary diseases provides a basis for new therapeutic compositions. Those compositions may be directed, for example, at the genes or their products, and may be used to inhibit, slow, or prevent lung diseases such as COPD. For instance, the pharmaceutical compositions may comprise one or more of a gene product of CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, ENPP6, KBTBD9, MSRB3, MYO5B, or TSC2. Such compositions may be useful to treat subjects suffering from pulmonary diseases such as COPD and may even be used prophylactically to treat individuals with a predisposition to the development of COPD (e.g., to prevent the development of COPD triggered by exposure to inhalation of noxious substances).
5.3 Antibodies and Compositions Comprising Antibodies
The term antibody includes any naturally occurring (e.g., monospecific polyclonal) or man-made antibodies such as monoclonal antibodies produced by conventional hybridoma technology. The term antibody also includes fragments or portions of antibodies that contain the antigen-binding domain and/or one or more complementarity determining regions of these antibodies, including but not limited to a scFv, a Fab fragment, a Fab′ fragment, a F(ab′)2, an Fv, or a disulfide linked Fv. The term antibody refers to any form of antibody, or fragment thereof, that specifically binds to an antigen such as an antigen of the gene product of any one of KBTBD9, MSRB3, TSC2, CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, MYO5B, and ENPP6, and specifically covers monoclonal antibodies (including full length monoclonal antibodies), polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), Fab(s), Fab′(s), single chain antibodies, diabodies, domain antibodies, miniantibodies, or an antigen binding fragment of any of the foregoing. Any specific antibody or fragment thereof can be used in the methods and compositions provided herein, including but not limited to an scFv, a Fab fragment, a Fab′ fragment, a F(ab′)2, an Fv, a disulfide linked Fv, Fab(s), Fab′(s), single chain antibodies, diabodies, domain antibodies, miniantibodies, or antigen binding fragments of any of the foregoing. Thus, in one embodiment the term “antibody” encompasses a molecule comprising at least one variable region from a light chain immunoglobulin molecule and at least one variable region from a heavy chain molecule that in combination form a specific binding site for the target antigen. In some embodiments, antibodies may also be an IgA, IgD, IgE, IgG or IgM or any combination thereof, including combinations of subtypes of those antibodies. In one embodiment, the antibody is an IgG antibody; for example, the antibody can be an IgG1, IgG2, IgG3, or IgG4 antibody.
The antibodies useful in the present methods and compositions can be generated in cell culture, in phage, or in various animals, including but not limited to cows, rabbits, goats, mice, rats, hamsters, guinea pigs, sheep, dogs, cats, monkeys, chimpanzees, or apes. See generally, Harlow, E. & Lane, D. (1988) Antibodies: A Laboratory Manual (Cold Spring Harbor Press, Cold Spring Harbor, N.Y.). In one embodiment, an antibody is a mammalian antibody. In another embodiment, phage display techniques can be used to screen for and isolate an initial antibody or to generate variants with altered specificity or avidity characteristics. Such techniques are routine and well known in the art. See e.g., U.S. Pat. No. 6,172,197.
In other embodiments, antibodies are produced by recombinant means known in the art. For example, a recombinant antibody can be produced by transfecting a host cell with a vector comprising a DNA sequence encoding the antibody. One or more vectors can be used to transfect the DNA sequence expressing at least one VL and one VH region in the host cell. Exemplary descriptions of recombinant means of antibody generation and production include Delves, Antibody Production: Essential Techniques (Wiley, 1997); Shephard, et al., MONOCLONAL ANTIBODIES (Oxford University Press, 2000); Goding, Monoclonal Antibodies: Principles And Practice (Academic Press, 1993); Current Protocols In Immunology (John Wiley & Sons, most recent edition). A suitable antibody can also be modified by recombinant means to increase the efficacy of the antibody in mediating the desired function. Antibody fragments or portions thereof include at least a portion of the variable region of the immunoglobulin molecule that binds to its target, i.e., the antigen binding region. An antibody can be in the form of an antigen binding antibody fragment including a Fab fragment, F(ab′)2 fragment, a single chain variable region, and the like. Fragments of intact molecules can be generated using methods well known in the art including enzymatic digestion and recombinant means.
The antibodies or antigen binding fragments thereof provided herein may be conjugated to a “bioactive agent.” As used herein, the term “bioactive agent” refers to any synthetic or naturally occurring compound that binds the antigen and/or enhances or mediates a desired biological effect to enhance cell-killing toxins, or can be an agent used to detect the antibody in vitro or in vivo. Bioactive agents include, but are not limited to, enzymes (e.g., ricin or portions and modified forms thereof), radiolabels, and sensitizers such as agents useful for photodynamic therapy such as aminolevulinic acid (ALA), phthalocyanines, (e.g., silicon phthalocyanine Pc 4), and m-tetrahydroxyphenylchlorin.
The compositions, methods, kits and the like, thus generally described, will be further understood by reference to the following examples, which are provided by way of illustration and are not intended to be limiting.
To identify genetic risk factors for COPD, a GWAS was performed in a sample of 192 adult smokers with spirometry-defined COPD and in 197 control subjects (90 smokers and 107 never smokers). Outcomes analyzed were 4 spirometry-based indices that deconvolute the major pathophysiologic factors associated with COPD, including baseline lung function (BL), age-related decline (Age decline), pack-years-related decline (Pack-years decline), and the intensifying effects of smoking, in terms of number of cigarettes per day (CPD), on decline with age (CPD×Age decline). The minimum p-values were 8.5×10⁻⁶ (BL), 2.33×10⁻⁷ (Age decline), 1.90×10⁻⁶ (Pack-years decline), and 1.90×10⁻⁶ (CPD×Age decline). False discovery rate (FDR) analysis showed that Age decline and Pack-years decline were enriched for significant associations. A minimum SNP-specific FDR (q-value) of 0.124 was found within the gene ENPP6 for Age decline. A total of 33 SNPs had q-values less than 0.5, with most being associated with Pack-years decline. As shown in
6.1 Methods
6.1.1 Study Sample
Cases were obtained from a subset of the Lung Health Study (LHS), a prospective, randomized, multicenter, clinical trial in the US and Canada conducted in two phases between 1986 and 2001 (LHS-1 and LHS-3) (Buist et al. 1993, Chest 103 (6):1863-1872; Anthonisen et al. 1994, JAMA 272:1497-1505; Anthonisen et al. 2002, Am. J. Respir. Crit. Care Med. 166:675-679). Participants in LHS-1 were otherwise healthy cigarette smokers, aged 35 to 60 years, with mild or moderate COPD as determined by spirometry (ratio of forced expiratory volume in 1 second (FEV1) to forced vital capacity (FVC)<0.70 and FEV1 55% to 90% of predicted) (National Institutes of Health and National Heart Lung and Blood Institute 2007). At the University of Utah center, 624 participants enrolled in LHS-1, and 503 completed LHS-3. Of these, 192 had genotyping performed in a follow-on, cross-sectional, genetic association study, the Genetics of Addiction Project (GAP), during 2003-2005. GAP also included 197 gender- and age-matched controls (90 smoked cigarettes and 107 never smoked).
6.1.2 Lung Function Decline Outcome Measures
Four quantitative spirometry-based indices of lung function decline in the study sample, best linear unbiased predictors (BLUPs), were derived from longitudinal mixed growth curve modeling as a function of major COPD risk factors, as described herein. (The general statistical approach is described in Robinson 1991; Goldstein H. Multilevel statistical models. New York: Wiley, 1995.) Mixed models, which are specifically designed for the analysis of clustered data and which estimate two types of parameters, fixed effects and random effects, were used (Demidenko 2004, Mixed models: theory and applications. Wiley: Hoboken, N.J.). Fixed effects are analogous to regression coefficients, while random effects describe the degree to which an individual subject's coefficient value deviates from the fixed effect.
6.1.3 Data Analysis and Modeling
Data were modeled for 624 cigarette smokers with COPD and aged 35-60 at baseline, followed up 7 times over approximately 17 years (1986-2004) in the Lung Health Studies (Anthonisen et al., 1994; Connett et al., 1993, Control. Clin. Trials 14:3S-19S) and its follow-on Genetics of Addiction Project (GAP); 204 GAP subjects without COPD were also studied as controls (see Table 1 for descriptive statistics). The optimal model of the data was selected based on likelihood ratio tests, which were used to determine the significance of each fixed and random effect parameter as it was added to the model (Willet et al., 1998). After the optimal model was identified, the outcome variables were calculated as best linear unbiased predictors (BLUPs) of the random effects. Missing data were handled by multiple imputation using chained equations, with 5 datasets imputed and analyzed (Van Buuren et al. 2006, Journal of Statistical Computation and Simulation 2006; 76(12): 1049-1064; Royston 2005, Stata Journal 5(4): 527-536).
In developing the random effect-based outcome measures, linear mixed models predicting forced expiratory volume in 1 second (FEV1) were systematically developed. Linear mixed models are a generalization of linear regression allowing for the inclusion of random deviations (i.e. random effects) other than those associated with the overall residual term. In matrix notation,
y=Xβ+Zu+ε
where y is the n×1 vector of responses, X is an n×p design/covariate matrix for the fixed effects β, and Z is the n×q design/covariate matrix for the random effects u. The n×1 vector of residuals, ε, is assumed to be multivariate normal with mean zero and variance matrix $\sigma_e^2 I_n$.
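The fixed portion, Xβ, is equivalent to the linear predictor of OLS regression. For the random portion, Zu+ε, it is assumed that u has variance-covariance matrix G and that u is orthogonal to ε so that
$$\mathrm{Var}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & \sigma_e^2 I_n \end{bmatrix}$$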
The random effects u are not directly estimated (although, as described below, they may be predicted), but instead are characterized by the elements of G, known as the variance components, that are estimated along with the residual variance $\sigma_e^2$. Considering Zu+ε the combined error, we see that y is multivariate normal with mean Xβ and n×n variance-covariance matrix
$$V = ZGZ' + \sigma_e^2 I_n$$
The model building process is shown in Table 2. The outcome measures used in this analysis were derived from the random effects of the final, best-fitting model:
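$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \beta_4 x_{4ij} + \beta_5 x_{5ij} + \beta_6 x_{6ij} + \beta_7 x_{7ij} + u_{0i} + u_{1i} + u_{2i} + u_{3i} + e_{ij}$$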
where i indexes subjects, j indexes repeated assessments, y is FEV1, β0 is the intercept fixed effect, x1 is age, β1 is the age fixed effect, x2 is pack years, β2 is the pack years fixed effect, x3 is CPD×age, β3 is the CPD×age fixed effect, x4 is height, β4 is the height fixed effect, x5 is gender, β5 is the gender fixed effect, x6 is gender×age, β6 is the gender×age fixed effect, x7 is never-smoked status, β7 is the never-smoked status fixed effect, u0i is the intercept random effect, u1i is the age random effect, u2i is the pack years random effect, u3i is the CPD×age random effect and eij is the within-subject residual. Parameter estimates and p-values for the final model (shown in Table 2 as Model 15) are shown in Table 3.
†Two values are given for the degrees of freedom as the test statistic has an F-distribution.
The covariance structure of the four random effects was modeled as unstructured:
Thus, the random parameters are multivariate normally distributed with means of zero and variance-covariance matrix G. The variances of the parameters are on the diagonal and the covariances in the off-diagonal cells of G. The residual is assumed to be normally distributed with a mean of zero and variance of $\sigma_e^2$.
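For illustration, an unstructured covariance for the four subject-level random effects can be written out as below. The symbols $\sigma_k^2$ and $\sigma_{kl}$ are labels assumed here for the variances and covariances; they are not taken from the original disclosure.

```latex
\begin{pmatrix} u_{0i} \\ u_{1i} \\ u_{2i} \\ u_{3i} \end{pmatrix}
\sim N\left(0,\, G\right), \qquad
G =
\begin{pmatrix}
\sigma_{0}^{2} & \sigma_{01}    & \sigma_{02}    & \sigma_{03} \\
\sigma_{01}    & \sigma_{1}^{2} & \sigma_{12}    & \sigma_{13} \\
\sigma_{02}    & \sigma_{12}    & \sigma_{2}^{2} & \sigma_{23} \\
\sigma_{03}    & \sigma_{13}    & \sigma_{23}    & \sigma_{3}^{2}
\end{pmatrix}
```

Under this reading, "unstructured" means that all four variances and all six covariances are estimated freely, with no constraint imposed beyond symmetry.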
Because random effects are not directly estimated by the mixed model, they must be predicted in an additional post-estimation step. BLUPs of the random effects u were obtained as
$$\tilde{u} = \tilde{G} Z' \tilde{V}^{-1} (y - X\hat{\beta})$$
where $\tilde{G}$ and $\tilde{V}$ are G and V with estimates of the variance components plugged in. The EM algorithm was used for maximum likelihood estimation as described by Pinheiro and Bates (Mixed-Effects Models in S and S-PLUS. Berlin: Springer, 2000).
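The BLUP formula can be illustrated with a minimal numerical sketch. It assumes the variance components are already known (in the study they were estimated by maximum likelihood), uses a generalized least squares estimate for the fixed effects, and runs on simulated toy data with only a random intercept and a random age slope. The function name `blup_random_effects` and all numeric values are hypothetical.

```python
import numpy as np

def blup_random_effects(y, X, Z, G, sigma2_e):
    """Return (beta_hat, u_tilde) for y = X beta + Z u + e, given G and sigma2_e."""
    n = y.shape[0]
    V = Z @ G @ Z.T + sigma2_e * np.eye(n)          # marginal covariance of y
    V_inv_X = np.linalg.solve(V, X)
    V_inv_y = np.linalg.solve(V, y)
    beta_hat = np.linalg.solve(X.T @ V_inv_X, X.T @ V_inv_y)  # GLS fixed effects
    resid = y - X @ beta_hat
    u_tilde = G @ Z.T @ np.linalg.solve(V, resid)   # G Z' V^{-1} (y - X beta_hat)
    return beta_hat, u_tilde

# Toy data (assumed values): 3 subjects, 4 visits each.
rng = np.random.default_rng(0)
n_subj, n_vis = 3, 4
age = np.tile(np.arange(n_vis, dtype=float), n_subj)
X = np.column_stack([np.ones(n_subj * n_vis), age])   # intercept and age
Z = np.zeros((n_subj * n_vis, 2 * n_subj))            # per-subject intercept and slope
for i in range(n_subj):
    rows = slice(i * n_vis, (i + 1) * n_vis)
    Z[rows, 2 * i] = 1.0
    Z[rows, 2 * i + 1] = age[rows]
G_subj = np.array([[0.25, -0.01], [-0.01, 0.01]])     # per-subject covariance
G = np.kron(np.eye(n_subj), G_subj)                   # block diagonal over subjects
sigma2_e = 0.04
u = rng.multivariate_normal(np.zeros(2 * n_subj), G)
y = X @ np.array([3.0, -0.03]) + Z @ u \
    + rng.normal(0.0, np.sqrt(sigma2_e), n_subj * n_vis)

beta_hat, u_tilde = blup_random_effects(y, X, Z, G, sigma2_e)
```

Each pair of entries in `u_tilde` is one subject's predicted deviation from the fixed intercept and age slope, which is the kind of per-subject quantity used as a GWAS outcome in this study.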
The best-fitting model showed significant random effects for baseline lung function, age, pack-years (product of the average number of packs smoked daily and the total years of smoking), and the interaction between age and recent smoking as estimated by the number of cigarettes smoked daily. The effect size for each of these factors varied considerably across subjects. BLUPs for baseline lung function (BL), age-related decline (Age decline), pack-years-related decline (Pack-years decline), and the interaction between age and smoke-related decline (CPD×Age decline) were calculated for these four significant random effects and served as the outcome measures in the GWAS. The mean correlation among the BLUPs was −0.22, suggesting that they reflected independent biological effects. These more homogeneous, independent measures are useful compared to composite measures that can confound distinct mechanisms and can result in a loss of statistical power.
6.1.4 Sample Collection and Preparation and Genotyping
A whole blood sample was collected by venipuncture from each subject in an EDTA vacutainer tube. DNA was extracted from white blood cells, purified (Puregene Kit, Gentra Systems, Inc, Minneapolis, Minn.), and stored at −70° C. Genotyping was performed in accordance with manufacturer-recommended procedures using the Infinium II HumanHap 550 SNP array (Illumina, San Diego, Calif.) on a BeadStation. Robotic liquid handling stations were used for sample handling. The HumanHap 550 array assays 555,352 tagging SNPs selected from Phases I and II of the HapMap Project. Genotypes were called using BeadStudio genotyping module version 3.2.32. The mean call rate of arrays in the analysis was 0.998, and arrays with a call rate below 0.980 were repeated.
6.1.5 Association Analysis
All association analyses were performed in PLINK. The minimum allowable SNP and individual genotyping success rates were 0.95. The minimum allowable observed SNP minor allele frequency (MAF) was 0.025.
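The quality-control thresholds above can be sketched as follows. This is an illustrative stand-in, not PLINK itself: it assumes genotypes coded as 0/1/2 allele counts with `np.nan` for missing calls, and it assumes individuals are filtered before SNPs. The function name `qc_filter` is hypothetical.

```python
import numpy as np

def qc_filter(genotypes, min_call_rate=0.95, min_maf=0.025):
    """genotypes: (individuals x SNPs) array of 0/1/2 allele counts, np.nan = missing.
    Returns boolean masks (keep_individuals, keep_snps)."""
    called = ~np.isnan(genotypes)
    keep_ind = called.mean(axis=1) >= min_call_rate      # per-individual success rate
    g = genotypes[keep_ind]
    called = ~np.isnan(g)
    snp_call_rate = called.mean(axis=0)                  # per-SNP success rate
    n_called = called.sum(axis=0)
    allele_freq = np.nansum(g, axis=0) / (2 * np.maximum(n_called, 1))
    maf = np.minimum(allele_freq, 1.0 - allele_freq)     # minor allele frequency
    keep_snp = (snp_call_rate >= min_call_rate) & (maf >= min_maf)
    return keep_ind, keep_snp

# Toy example: the last individual has a missing call; the last SNP is monomorphic.
toy = np.array([
    [0, 1, 0],
    [1, 1, 0],
    [2, 0, 0],
    [0, np.nan, 0],
], dtype=float)
keep_ind, keep_snp = qc_filter(toy)
```

In the toy data the fourth individual is dropped for a call rate below 0.95, and the third SNP is dropped because its minor allele frequency is zero.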
To control the risk of false discovery, for each significant BLUP-based SNP association a q-value was calculated. A q-value is an estimate of the proportion of false discoveries, or FDR, among all significant markers when the corresponding p-value is used as the threshold for declaring significance (Storey 2003, Ann. Stat. (31):2013-2035; Storey and Tibshirani 2003, Proc. Natl. Acad. Sci. U.S.A. 100 (16):9440-9445). This FDR-based approach (1) provides a good balance between the competing goals of true positive findings versus false discoveries, (2) allows the use of more similar standards in terms of the proportion of false discoveries produced across studies because it is much less dependent on the arbitrary number, or sets, of statistical tests that are performed, (3) is relatively robust against the effects of correlated tests, and (4) provides a more subtle picture about the possible relevance of the tested markers rather than an all-or-nothing conclusion about whether a study produces significant results (Benjamini and Hochberg 1995, Journal of the Royal Statistical Society B 57:289-300; Brown and Russell 1997, Statistics in Med. 16 (22):2511-2528; Storey 2003, Ann. Stat. (31):2013-2035; Sabatti, Service, and Freimer 2003, Genetics 164 (2):829-833; Tsai, Hsueh, and Chen 2003, Biometrics. 59 (4):1071-1081; van den Oord and Sullivan 2003, Human Heredity 56 (4):188-189; Fernando et al. 2004, Genetics 166 (1):611-619; Korn et al. 2004, Journal of Statistical Planning and Inference 124 (2):379-398; van den Oord 2005, Mol. Psychiatry. 10 (3):230-231). The q-values were calculated conservatively assuming p0=1. For each BLUP-based association an estimate of the proportion of null effects (p0) was calculated using two estimators known to perform best in GWAS studies (Meinshausen and Rice 2006, The Annals of Statistics 34 (1):373-393; Kuo et al. 2007, BMC Proceedings, 1: S143).
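A q-value calculation of this kind can be sketched in a few lines. The version below follows the usual step-up definition (for each p-value, the smallest estimated FDR over all thresholds at least as large), and with p0=1 it reduces to Benjamini-Hochberg adjusted p-values. The function name `q_values` and the example p-values are hypothetical; this is not the software used in the study.

```python
import numpy as np

def q_values(p_values, p0=1.0):
    """Estimate q-values from p-values; p0 is the assumed proportion of null tests."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Estimated FDR when each sorted p-value is used as the significance threshold.
    fdr_at_threshold = p0 * m * p[order] / np.arange(1, m + 1)
    # Enforce monotonicity: q at rank i is the minimum FDR over ranks >= i.
    q_sorted = np.minimum.accumulate(fdr_at_threshold[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

example_q = q_values([0.01, 0.04, 0.03, 0.5])
```

Setting p0=1 treats every test as potentially null, so the resulting q-values can only overstate the FDR relative to a smaller p0 estimate, which is why the approach is described as conservative.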
For comparison with the BLUP-based association results, a secondary analysis was performed using as outcomes the statistically less powerful traditional case-control categories and the FEV1/FVC ratio by which COPD is operationally defined.
6.1.6 Stratification
All subjects were Caucasian, but there could be genetic subgroups in the sample. Population substructure could result in false positive findings if the subgroups differed in allele frequencies, prevalence of COPD, or quantitative measures of lung function decline. A variety of methods is available to detect population substructure and correct for its potential confounding effects. Sullivan et al. (Sullivan et al. 2008, Mol. Psychiatry. 13 (6):570-584) performed an extensive evaluation of multiple statistical methods to avoid false positive findings in GWAS due to such genetic subgroups. They concluded that the principal components and multi-dimensional scaling (MDS) approaches were very similar and superior to other approaches. MDS was used for practical reasons as it can be implemented in PLINK (Purcell et al. 2007, Am. J. Hum. Genet. 81 (3):559-575).
Input data for the MDS approach were the genome-wide average proportion of alleles shared identically by state (IBS) between any two individuals. Somewhat analogous to principal component analysis, the first MDS dimension of a (genetic) similarity matrix captures the maximal variance in the genetic similarity, the second dimension must be orthogonal to the first and captures the maximum amount of residual genetic similarity, and so on. A one-dimension solution was the best-fitting model to account for the genetic similarity among subjects in this sample.
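The MDS step can be illustrated with a minimal sketch of classical (metric) MDS applied to an IBS matrix. It assumes complete genotypes coded as 0/1/2 allele counts and uses 1 − IBS as the distance; it is not a reproduction of PLINK's implementation, and the function names `ibs_similarity` and `classical_mds` are hypothetical.

```python
import numpy as np

def ibs_similarity(genotypes):
    """Average proportion of alleles shared IBS between every pair of individuals."""
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])   # 0, 1 or 2 alleles differ per SNP
    return 1.0 - diff.mean(axis=2) / 2.0

def classical_mds(similarity, n_dims=2):
    """Classical MDS on distances 1 - similarity; returns (coordinates, eigenvalues)."""
    d2 = (1.0 - similarity) ** 2
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ d2 @ J                          # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_dims]       # largest eigenvalues first
    lam = np.clip(eigvals[idx], 0.0, None)
    return eigvecs[:, idx] * np.sqrt(lam), eigvals[idx]

# Toy genotypes: individuals 0-1 and 2-3 form two genetically distinct pairs.
toy = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 1],
])
coords, eigvals = classical_mds(ibs_similarity(toy), n_dims=2)
```

In this toy example the first dimension separates the two pairs of individuals. In an association analysis, such coordinates would typically be included as covariates to adjust for population substructure.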
6.2.1 GWAS Results
A total of 391 assays, each with 561,466 SNPs, was performed and passed quality control. After filtering by fail rate and minimum minor allele frequency, 518,714 SNPs were analyzed for association with the four lung function decline BLUPs. FDR analysis performed on tests of Hardy-Weinberg equilibrium using the entire sample showed an FDR of 10%, corresponding to a p-value <0.0001. An additional 3,823 SNPs had deviations from Hardy-Weinberg equilibrium below an FDR of 10%.
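A Hardy-Weinberg test on genotype counts can be sketched as below. This uses the simple one-degree-of-freedom chi-square goodness-of-fit test as a stand-in; the source does not state which Hardy-Weinberg test was used, and the function name `hwe_chi_square` and the example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)        # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2_eq, p_eq = hwe_chi_square(25, 50, 25)     # exactly at equilibrium
chi2_dev, p_dev = hwe_chi_square(40, 20, 40)   # strong heterozygote deficit
```

The second set of counts has far fewer heterozygotes than expected and would fall below the p-value threshold of 0.0001 quoted above.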
The minimum p-values for the BLUP-based SNP associations were 8.5×10⁻⁶ (BL), 2.33×10⁻⁷ (Age decline), 1.90×10⁻⁶ (Pack-years decline), and 1.90×10⁻⁶ (CPD×Age decline). After FDR analysis, Pack-years decline and Age decline showed evidence of true effects with a minimum p0 estimate of 0.9999877. As the product of (1−p0) and the number of markers estimates the number of effects, this suggested 0 to 8 SNPs with real effects (Table 4). In contrast, the BL and CPD×Age decline SNP associations had p0 estimates of 1 or greater, suggesting moderate inflation of false discoveries since completely null data would show a p0 equal to 1.
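As a worked illustration of this estimate, the calculation below takes the number of markers, m, to be the 518,714 SNPs analyzed. That choice is an assumption made here; the source does not state which count was used.

```latex
(1 - p_0)\, m = (1 - 0.9999877) \times 518{,}714
              = 1.23 \times 10^{-5} \times 518{,}714
              \approx 6.4
```

A value of roughly six SNPs with real effects falls within the reported range of 0 to 8.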
After the FDR analysis, 33 SNPs had q-values less than 0.5 (see, e.g., Tables 5a and 5b and
6.2.2 Genes within the Chromosomal Regions
Linkage disequilibrium refers to the co-inheritance of alleles (e.g. alternative nucleotides) at two or more different SNPs at frequencies greater than would be expected from the separate frequencies of occurrence of each allele in a given population. The expected frequency of co-occurrence of two alleles that are inherited independently is the frequency of the first allele multiplied by the frequency of the second allele. Alleles that co-occur at expected frequencies are referred to as being in “linkage equilibrium”. In contrast, LD refers to any non-random genetic association between allele(s) at two or more different SNP sites. Thus, if a particular SNP site is useful for diagnosing pulmonary disease (e.g. has a significant statistical association with the condition and/or is recognized as a causative polymorphism for the condition), then a skilled artisan will recognize that other SNP sites, which are in LD with this SNP site, would also be useful for diagnosing the condition. For example, SNPs that are not causative polymorphisms, but are in LD with one or more causative SNPs are also useful for diagnosing the pulmonary disease. Thus, SNPs that are in LD with causative polymorphisms are also useful as diagnostic markers of pulmonary diseases. Useful LD SNPs can be selected from among the SNPs disclosed in Tables 5a, 5b, 7, 8, and
Table 5a shows the top SNPs for GWAS with q-values <0.5, and Table 5b shows the assignment of those SNPs to 19 different chromosomal regions defined by LD where r²>0.2 between the SNPs in Table 5a and flanking SNPs. For the purpose of this disclosure, “Smoke Exposure” is also called “CPD×Age.”
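The LD quantities referred to in this section (the deviation from the product of allele frequencies, D′, and r²) can be computed from haplotype and allele frequencies as in the sketch below. The function name `ld_measures` and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b                   # zero under linkage equilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

equilibrium = ld_measures(0.25, 0.5, 0.5)   # haplotype frequency equals the product
complete = ld_measures(0.4, 0.4, 0.4)       # alleles always co-occur
partial = ld_measures(0.25, 0.3, 0.5)       # moderate D', lower r^2
```

The third example shows why D′ and r² are reported separately: a pair of SNPs can have a D′ of about 0.67 while r² stays just below the 0.2 threshold used to define the chromosomal regions.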
The LD patterns in the regions for selected SNPs that clustered in genes were examined. For CSMD1 (CUB and Sushi multiple domains 1) on chromosome 8p, three SNPs in a 7.4 kilobase (kb) region had p-values less than 1.9×10⁻⁵ and individual q-values between 0.32 and 0.38. Further examination of the association identified three additional associated markers in a 103 kb region that had a minimum q-value of 0.75 within 50 kb of the core and contained 80 markers in all. A total of 9, 22, and 29 significant SNPs were found in this region (at p-value thresholds of 0.0001, 0.001, and 0.01, respectively). Linkage disequilibrium and association results for a portion of the region are shown in
Recently CSMD1 has been shown to inactivate the classic complement pathway (Kraus et al. 2006, J. Immunol. 176 (7):4419-4430). Recently, COPD has been shown to be in part an autoimmune disease with anti-elastin autoantibodies being detected in COPD patients (Lee et al. 2007, Nat. Med. 13 (5):567-569). Smoking-induced recurrent infections or autoimmunity may lead to a persistent activation of the complement system. Genetic variability in the regulation of the complement system as suggested by the association with CSMD1 provided herein could explain in part the different risk of COPD development or progression given a certain exposure level.
Four SNPs in MYO5B had p-values less than 7.58×10⁻⁶. MYO5B, which encodes the myosin VB protein, is a large gene extending over 372 kb, with a total of 123 SNPs tested. A large section (~210 kb) of the gene did not show any significantly associated markers. Three additional associated markers were found in a 164 kb region that had a minimum q-value of 0.75 and was within 50 kb of the core. A total of 6, 9, and 19 of the 55 SNPs in this region were significant (p-values less than 0.0001, 0.001, and 0.01, respectively). Three SNPs in MYO5B were also significantly associated with COPD using the less powerful case-control categories (p-values <1×10⁻⁴). When the core of the MYO5B association was restricted to a 7.4 kb region, the four most significantly associated SNPs in MYO5B covered 57.4 kb. The extended 164 kb region was primarily within the MYO5B gene but extends into the gene ACAA2. Examination of LD across the 164 kb region revealed at least two distinct signals not in high LD (D′~0.42) with each other.
DNAH3 is a large gene extending over 226 kb. A total of 33 SNPs were tested in DNAH3, and two SNPs had p-values ≤1.7×10⁻⁵. One additional SNP, rs2301620, had a q-value less than 0.75 (p-value 8.96×10⁻⁵). These three SNPs covered 15.2 kb, and examination of LD showed they were in high LD with marker-to-marker D′ greater than 0.99 and minimum D′ of 0.82.
DNAH3 encodes the dynein axonemal heavy chain 3, which is used in the assembly of cilia. Axonemal dyneins are microtubule-associated motor protein complexes necessary for cilia and flagella function. Cilia are critically important in the clearance of material including mucus and particulate matter from the lung. DNAH3 is also known as DLP3, DNAHC3B, Hsadhc3, FLJ31947, FLJ43919, FLJ43964, and DKFZp434N074.
The most significant GWAS association was with rs7689305 in the gene ENPP6 for the Age decline BLUP (p-value=2.33×10⁻⁷, q-value=0.12). An additional three SNPs in ENPP6 had p-values less than 0.000005 (q-value ~0.53). The four associated SNPs were in a single 30 kb region of high LD (minimum D′=0.94, r=0.32) Fig. These SNPs also showed association with the FEV1/FVC ratio (p-value 0.000076, q-value 0.95) but not case-control status.
ENPP6 encodes an ectonucleotide pyrophosphatase/phosphodiesterase and is in the ether lipid pathway. The enzyme has phospholipase C (PLC) activity and can act on lysoplasmalogen and platelet activating factor (PAF) (Sakagami et al. 2005, J. Biol. Chem. 280 (24):23084-23093). PAF is a powerful mediator of hypersensitivity and inflammation and a direct activator of neutrophils, which are thought to be important in COPD. While not wishing to be bound by theory, if genetic variation led to an increased or decreased abundance or activity of ENPP6, the amount or duration of PAF would be altered, thereby potentially influencing neutrophil behavior and activity. A related gene, ENPP2, has shown evidence for involvement in mouse lung function (Ganguly et al. 2007, Physiol Genomics. 31 (3):410-421), and its expression levels are predictive of lung cancer survival (Lu et al. 2006, PLoS. Med. 3 (12):e467). ENPP6 is also known as NPP6 and MGC33971.
A cluster of significant SNPs near MSRB3, which encodes methionine sulfoxide reductase B3, was observed. Evidence for association with MSRA (p-value 0.0000069, q-value of 0.61) was also observed. Methionine sulfoxide reductase is an enzyme that reverses oxidative protein damage by reducing methionine sulfoxide back to methionine. It may play an important role in protection from oxidative stress.
6.2.3 Other Genes
Associations at an FDR of 0.5 for a single SNP were observed in genes CLEC4A, EBF2, and ELMO1 for the Pack-years decline BLUP, in KBTBD9 for case versus control status, and in TSC2 for the ratio FEV1/FVC.
CLEC4A encodes a member of the C-type lectin/C-type lectin-like domain (CTL/CTLD) superfamily. Members of this family share a common protein fold and have diverse functions, such as cell adhesion, cell-cell signaling, glycoprotein turnover, and roles in inflammation and immune response. The encoded type 2 transmembrane protein may play a role in inflammatory and immune response. Multiple transcript variants encoding distinct isoforms have been identified for this gene. This gene is closely linked to other CTL/CTLD superfamily members on chromosome 12p13 in the natural killer gene complex region. CLEC4A is also known as DCIR, LLIR, DDB27, CLECSF6, and HDCGC13P.
EBF2 belongs to the conserved Olf/EBF family (see MIM 164343) of helix-loop-helix transcription factors. EBF2 is also known as COE2, OE-3, EBF-2, O/E-3, and FLJ11500.
ELMO1 encodes a protein that interacts with the dedicator of cytokinesis 1 protein to promote phagocytosis and effect cell shape changes. Similarity to a C. elegans protein suggests that this protein may function in apoptosis and in cell migration. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. ELMO1 is also known as CED12, CED-12, ELMO-1, KIAA0281, and MGC126406.
More than half of the significant SNPs were found in intergenic regions, often in clusters. Two clusters were observed on chromosome 9, including three SNPs covering 15.6 kb at megabase 27.6 and two SNPs covering 1.6 kb at megabase 77.5. Another group of four associated SNPs covering 48 kb was found on chromosome 12 around 64.2 Mb. This cluster was 103 kb from the gene MSRB3, which encodes methionine sulfoxide reductase B3. Three SNPs within 10 kb were observed near 102.4 Mb on chromosome 13. However, these SNPs are in perfect LD and may not be a cluster, as their allele frequencies and p-values were identical. Additional significant singleton SNPs are listed in
Unless otherwise indicated, the nucleic acids listed or set forth in Table 6 by NCBI accession or GI number include: nucleic acids having the sequences recited under the Accession and/or GI number; the complement of those sequences; and either or both strands (if double stranded). Where the identifiers recite a genomic sequence, the mRNA (or cDNAs thereof) are also available in the databases of the NCBI and are considered part of this disclosure.
6.3 Summary
In summary, four different BLUPs measuring individual differences in processes involved in COPD were analyzed and SNPs having an association with four lung function decline BLUPs are provided herein. Thirty-three SNPs significant at an FDR of less than 50% are provided herein. The minimum q-value of 0.12 was found in ENPP6. Clusters of SNPs meeting the FDR cutoff were found in genes CSMD1, MYO5B, and DNAH3. Additionally, SNPs below the critical FDR were found in the genes CLEC4A, EBF2, ELMO1, and TSC2.
Multiple SNPs in MYO5B were associated with the Pack-years decline BLUP and, importantly, with the categorical analysis based on case-control status. This allows other groups that have samples but no longitudinal data sets, and are therefore unable to generate comparable BLUPs, to directly replicate the findings in this study. Two distinct signals were also discovered in MYO5B that were only in modest LD with each other and therefore represent separate results. That multiple SNPs show association indicates that the results are not technical errors. The combination of multiple independent association signals and association with case-control status makes MYO5B a useful marker for the methods and kits provided herein.
The sample size for the investigation described herein was modest for a GWAS of a complex trait. However, the investigation described herein has the advantage of having long-term repeated measures. These measures enabled the modeling of decline in lung function and the separation of the effects of age, baseline lung function, and cigarette smoking. The resulting phenotypic analyses produced more homogeneous quantitative outcomes. Quantitative measures are inherently more powerful, and decreasing heterogeneity further increases power. One approach is to analyze cigarette smoking-related BLUP-based SNPs for associations contingent on or as an interaction with a measure of smoking such as pack-years.
7.1 Materials and Methods
7.1.1 Study Design and Subjects
The COPD Biomarker Discovery Study (CBD) was a cross-sectional study at the University of Utah to identify novel diagnostic, prognostic or therapeutic biomarkers of COPD in adult current or former cigarette smokers. Male and female self-reported cigarette smokers, aged 45 years or older, with at least 10 pack-years smoking history were recruited from the University Health Sciences Network of local clinics and hospitals and from community physician offices. COPD was diagnosed in 300 subjects according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) spirometric guidelines as having a ratio of forced expiratory volume in 1 second (s) (FEV1) to forced vital capacity (FVC)<0.70 (Rabe et al. 2007). The control group included 425 sex- and age-matched (using 10-year bands), current or former cigarette smokers, without apparent lung disease who had FEV1/FVC≧0.70, and were recruited from the same clinical settings. Individuals who had recent exacerbation of COPD, uncontrolled angina, hypertension, or allergy to albuterol, and females who were pregnant or lactating were excluded. Demographic variables, respiratory symptoms and medical history, tobacco use history, and concomitant medications were assessed. Pack-years were calculated as (maximum average number of cigarettes smoked daily over total smoking history/20)×(total years smoking). Body weight and height were measured. Spirometry was performed with a rolling seal spirometer by certified pulmonary function technicians according to Amer. Thoracic Society guidelines (Miller et al. 2005, Euro. Resp. J. 26:319-338). Measurements of FEV1 and FVC were made before and at least 20 min after inhaled bronchodilator administration (albuterol 180 μg). The FEV1/FVC ratio was calculated for each subject from the highest post-bronchodilator values of FEV1 and FVC. A blood sample was collected for assessment of carboxyhemoglobin (COHb) and complete blood cell counts.
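The pack-year formula and the spirometric case definition described above can be expressed as a short sketch. The function names and the example values are hypothetical; only the formula, the 10 pack-year entry criterion, and the 0.70 threshold come from the text.

```python
def pack_years(max_avg_cigarettes_per_day, years_smoked):
    """Pack-years = (cigarettes smoked daily / 20) x total years smoking."""
    return (max_avg_cigarettes_per_day / 20.0) * years_smoked


def fev1_fvc_ratio(post_bd_fev1_liters, post_bd_fvc_liters):
    """Ratio of the highest post-bronchodilator FEV1 to the highest post-bronchodilator FVC."""
    return max(post_bd_fev1_liters) / max(post_bd_fvc_liters)


def is_copd_case(ratio, threshold=0.70):
    """Case definition used in the text: FEV1/FVC < 0.70; controls have FEV1/FVC >= 0.70."""
    return ratio < threshold


def meets_smoking_criterion(subject_pack_years, minimum=10.0):
    """Entry criterion from the text: at least 10 pack-years of smoking history."""
    return subject_pack_years >= minimum


# Hypothetical subject: 30 cigarettes per day for 20 years, three spirometry efforts.
subject_pack_years = pack_years(30, 20)
ratio = fev1_fvc_ratio([2.1, 2.3, 2.2], [3.9, 4.0, 3.8])
case = is_copd_case(ratio)
```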
7.1.2 Blood Sample Collection and Processing
Whole blood samples were obtained from each subject by venipuncture using 10 mL EDTA Vacutainer® tubes (BD, Franklin Lakes, N.J., USA). White blood cells were separated from the whole blood samples and used as a source of DNA.
DNA was extracted from white blood cells, purified (Puregene Kit, Gentra Systems, Inc, Minneapolis, Minn.), and stored at −70° C. Genotyping was performed on 601 case and control samples in accordance with manufacturer-recommended procedures using the Infinium II HumanHap 1M SNP array (Illumina, San Diego, Calif.) on a BeadStation. Robotic liquid handling stations were used for sample handling. The HumanHap 1M array assays N tagging SNPs selected from Phases I and II of the HapMap Project. Genotypes were called using BeadStudio genotyping module version 3.2.32. The mean call rate of arrays in the analysis was 0.998, and arrays with a call rate below 0.980 were repeated.
7.2 Association Analysis
All replication association analyses were performed in PLINK. The minimum allowable genotyping success rate was 0.9 for both SNPs and individuals. The minimum allowable observed SNP minor allele frequency (MAF) was 0.05. Additional quality control steps included screening out SNPs with a Hardy-Weinberg Equilibrium test p-value <1×10−6.
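The per-SNP quality control thresholds above can be sketched as follows. This is an illustration only, not the PLINK implementation: genotypes are assumed to be coded as minor-allele counts (0, 1, 2, or None for a failed call), and a one-degree-of-freedom chi-square test is used as a stand-in for the Hardy-Weinberg test, which PLINK may compute differently. All function names are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a failed call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_chi2_pvalue(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions; an approximation."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(genotypes, min_call_rate=0.9, min_maf=0.05, min_hwe_p=1e-6):
    """Apply the three thresholds stated in the text, in order."""
    if call_rate(genotypes) < min_call_rate:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_chi2_pvalue(genotypes) >= min_hwe_p
```

The checks are ordered so that a SNP with too few calls or with no variation is rejected before the Hardy-Weinberg statistic is computed.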
7.2.1 Stratification
Subjects were predominantly Caucasian, but there were a small number of subjects from other ethnic groups. Population substructure could result in false positive findings if the subgroups differed in allele frequencies, prevalence of COPD, or quantitative measures of lung function decline. A variety of methods is available to detect population substructure and correct for its potential confounding effects. Sullivan et al. (Sullivan et al. 2008, Mol. Psychiatry. 13 (6):570-584) performed an extensive evaluation of multiple statistical methods to avoid false positive findings in GWAS due to such genetic subgroups. They concluded that the principal components and multi-dimensional scaling (MDS) approaches were very similar and superior to other approaches. MDS was used for practical reasons as it can be implemented in PLINK (Purcell et al. 2007).
Input data for the MDS approach were the genome-wide average proportion of alleles shared identically by state (IBS) between any two individuals. Somewhat analogous to principal component analysis, the first MDS dimension of a (genetic) similarity matrix captures the maximal variance in the genetic similarity, the second dimension must be orthogonal to the first and captures the maximum amount of residual genetic similarity, and so on. A one-dimension solution was the best-fitting model to account for the genetic similarity among subjects in this sample.
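The input to the MDS step, the average proportion of alleles shared identically by state between two individuals, can be sketched as below. The genotype coding (0, 1, or 2 copies of one allele, None for a missing call) and the function names are assumptions made for illustration; the MDS decomposition itself, which the text performs in PLINK, is not reproduced here.

```python
def ibs_proportion(person_a, person_b):
    """Average proportion of alleles shared identically by state across SNPs.

    At each SNP two individuals share 2, 1, or 0 alleles IBS, which equals
    2 - |a - b| when genotypes are coded as allele counts.
    """
    shared = 0
    compared = 0
    for a, b in zip(person_a, person_b):
        if a is None or b is None:
            continue  # skip SNPs not called in both individuals
        shared += 2 - abs(a - b)
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs called in both individuals")
    return shared / (2 * compared)


def ibs_matrix(samples):
    """Symmetric matrix of pairwise IBS proportions for a list of genotype vectors."""
    n = len(samples)
    return [[ibs_proportion(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical genotypes for three individuals at four SNPs.
samples = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, None]]
similarity = ibs_matrix(samples)
```

A matrix such as `similarity` (or one minus it, as a distance) is the kind of input from which MDS dimensions are derived.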
7.3 Results
7.3.1 GWAS Replication
A total of 601 assays (225 Cases, 367 Controls, 9 missing) from the PLINK output, each with 1,072,821 SNPs, was performed and passed quality control. A total of 6 subjects were eliminated as ancestry outliers. After filtering by fail rate, minimum minor allele frequency and HWE, 751,305 SNPs were analyzed for association with four phenotypes (COPD, Percent Predicted FVC, Percent Predicted FEV1, and the ratio FEV1/FVC). In each analysis, smoking (pack years) and the first and second MDS ancestry dimensions were treated as covariates in a linear model for the quantitative traits and in a logistic model for the qualitative disease status (COPD). In addition, age and sex were included as covariates in the logistic model. The analysis focused on the results within the 19 associated regions previously described that contain genes that have already been identified in Example 1, including CLEC4A, CSMD1, DNAH3, EBF2, ELMO1, ENPP6, KBTBD9, MSRB3, MYO5B, and TSC2. See, e.g., Tables 5b and 6 and in
Analysis of the data in this example confirms the association of a number of genomic regions with pulmonary diseases such as COPD. This analysis, however, which employed a population that was on average older, had poorer lung function, was thinner, and smoked more, indicated that the more common alleles found in the SNPs identified in region 19 correlate with case rather than control status, which is the opposite of the finding in Example 1. That alleles associated with the same disease/phenotype may appear to flip without changes in the linkage disequilibrium has been described in the art. See, e.g., Clarke et al., Genetic Epidemiology 34:266-274 (2010); Lin et al., The Amer. J. of Human Genetics 80: 531-538 (2007); and Zaykin et al. The Amer. J. of Human Genetics 82: 794-800 (2008). Multiple regression analysis employing analysis data and covariates from both Examples 1 and 2 is consistent with the finding that region 19 contains genetic variations that are significantly associated with a predisposition for COPD and with risk factors and spirometric indicators for developing COPD (e.g., pack years, FEV1/FVC). Hence, individuals with genetic variations in that region may benefit from monitoring, prophylactic treatment and/or treatment. Analysis of genetic variations in region 19, particularly in conjunction with other genetic variations described herein, also leads to an ability to diagnose a pulmonary disease, to predict the development of a pulmonary disease, to determine the probability of its development, and/or to predict its ultimate severity.
A total of 799 SNPs across the 19 genomic regions were tested for the 4 phenotypes (3196 tests in total). Among those tests, 301 yielded FDR values <0.5. In Table 7, below, the top 20 results across phenotypes are presented. In the text below, the number of SNPs in each region yielding uncorrected p-values <0.05 is presented.
COPD is defined as FEV1/FVC less than 0.70
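To illustrate how a set of nominal p-values can be converted into FDR values and compared against a 0.5 threshold, the sketch below uses the Benjamini-Hochberg adjustment. This is a stand-in chosen for illustration; the procedure actually used to produce the FDR values reported here is not restated in this section and may differ. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values for four tests.
nominal = [0.01, 0.04, 0.03, 0.5]
fdr_values = benjamini_hochberg(nominal)
n_below_threshold = sum(value < 0.5 for value in fdr_values)
```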
Region 1—Chromosome 1: 64994430 Base Pairs (bp)-65287192 Base Pairs (bp)
Region 1 (see e.g., NCBI Contig Accession Numbers: NW_001838579.2/GI:157811766; NW_921351.1/GI:88950243 and NT_032977.9) contains 74 SNPs in Phase1B. Of those, 14 were significant (nominal p-values <0.05) for association with FVC, 12 were significant (nominal p-values <0.05) for association with FEV1 and 1 for FEV1/FVC ratio.
Region 2—Chromosome 2: 23623939 bp-23696195 bp
Region 2 (see e.g., NCBI Contig Accession Numbers: NT_022184.15/GI:224515010 and NW_001838768.1) contains 26 SNPs in Phase 1B. One SNP was significant (nominal p-value <0.05) for an association with FVC and one SNP was significant at a nominal p-value of 0.05 for FEV1/FVC ratio.
Region 3—Chromosome 2: 168223608 bp-168271898 bp
Region 3 (see e.g., NCBI Contig Accession Numbers: NW_001838860.1/GI:157696421, NT_005403.17 and NW_921585.1) yielded no significant results in 20 Phase1B SNPs at a p-value of 0.05 across phenotypes.
Region 4—Chromosome 4: 185253393 bp-185315070 bp
Region 4 (see e.g., NCBI Contig Accession Numbers: NT_016354.19/GI:224514665, NW_001838921.1/GI:157696482 and NW_922217.1/GI:88981534) yielded 1 significant result (nominal p-value <0.05) for FEV1 among 25 Phase1B SNPs.
Region 5—Chromosome 6: 158785645 bp-158895704 bp
Region 5 (see e.g., NCBI Contig Accession Numbers: NT_025741.15/GI:224514841, NW_001838991.2 and NW_923184.1) contains 41 SNPs, of which 13 were significant (nominal p-values <0.05) for COPD, 9 for FVC, 11 for FEV1, and 2 for FEV1/FVC ratio.
Region 6—Chromosome 7: 37326813 bp-37329120 bp
Region 6 (see e.g., NCBI Contig Accession Numbers: NT_007819.17/GI:224514859, NW_001839003.1/GI:157696564, NW_923240.1/GI:89025910 and NT_079592.2/GI:89026958) contains 4 SNPs none of which were significant at p<0.05.
Region 7—Chromosome 8: 3937389 bp-4048612 bp
Region 7 (see e.g., NCBI Contig Accession Numbers: NW_001839109.2/GI:157812071 and NW_923840.1/GI:89028496) contains 109 SNPs, 7 of which were significant (nominal p-values <0.05) for COPD, 12 of which were significant (nominal p-values <0.05) for FVC and 1 of which was significant for FEV1 (nominal p-values <0.05).
Region 8—Chromosome 8: 25960681 bp-25976212 bp
Region 8 (see e.g., NCBI Contig Accession Number: NT_167187.1/GI:224514765) comprises 7 SNPs none of which were significant across the association tests.
Region 9—Chromosome 9: 13606003 bp-13726965 bp
Region 9 (see e.g., NCBI Contig Accession Numbers: NW_001839149.2/GI:157812089, NT_008413.18/GI:224514694 and NW_924062.1/GI:89030318) comprises 39 SNPs, 1 of which was significant (nominal p-values <0.05) for COPD and 1 of which was significant (nominal p-values <0.05) for FEV1/FVC ratio.
Region 10—Chromosome 9: 27600116 bp-27621390 bp
Region 10 (see e.g., NCBI Contig Accession Numbers: NT_008413.18/GI:224514694, NW_001839149.2/GI:157812089 and NW_924062.1/GI:89030318) contains 17 SNPs none of which were significant at a nominal p-value of 0.05.
Region 11—Chromosome 9: 77492323 bp-77640744 bp
Region 11 (see e.g., NCBI Contig Accession Numbers: NT_008470.19/GI:224514751, NW_001839221.1/GI:157696782 and NW_924484.1/GI:89030471) contains 61 Phase1B SNPs, 3 of which were significant (nominal p-values <0.05) for COPD, 1 for FVC, and 1 was significant (nominal p-values <0.05) for FEV1/FVC ratio.
Region 12—Chromosome 12: 8166003 bp-8182389 bp
Region 12 (see e.g., NCBI Contig Accession Numbers: NW_001838051.1/GI:157696928, NT_009714.17/GI:224514867 and NW_925295.1/GI:89035948) contains 14 SNPs, 3 of which were significant (nominal p-values <0.05) for FVC.
Region 13—Chromosome 12: 64216921 bp-64339959 bp
Region 13 (see e.g., NCBI Contig Accession Numbers: NW_001838060.2/GI:157812191, NW_925395.1/GI:89036563 and NT_029419.12/GI:224514900) contains 29 SNPs, 1 of which was significant (nominal p-values <0.05) for FEV1.
Region 14—Chromosome 13: 72000549 bp-72000549 bp
Region 14 (see e.g., NCBI Contig Accession Numbers: NT_024524.14/GI:224514830, NW_001838081.1/GI:157696958 and NW_925506.1/GI:89037138) contains 1 SNP, which was not significant at a p-value<0.05.
Region 15—Chromosome 13: 85625744 bp-85747575 bp
Region 15 (see e.g., NCBI Contig Accession Numbers: NT_024524.14/GI:224514830, NW_001838083.1/GI:157696960, NW_001838084.2/GI:157812203, NW_925506.1/GI:89037138, and NW_925517.1/GI:89037217) contains 26 SNPs, 2 of which were significant (nominal p-values <0.05) for COPD, 11 of which were significant (nominal p-values <0.05) for FVC, 7 of which were significant (nominal p-values <0.05) for FEV1 and 4 for FEV1/FVC ratio.
Region 16—Chromosome 13: 102378362 bp-102465179 bp
Region 16 (see e.g., NCBI Contig Accession Numbers: NT_009952.14/GI:37544901, NW_001838084.2/GI:157812203 and NW_925517.1/GI:89037217) contains 41 SNPs, 12 of which were significant (nominal p-values <0.05) for association with FVC and 10 of which were significant (nominal p-values <0.05) for FEV1.
Region 17—Chromosome 16: 2038579 bp-2076625 bp
Region 17 (see e.g., NCBI Contig Accession Numbers: NT_010393.16/GI:224514941, NW_001838339.2/GI:157812280 and NW_926018.1/GI:89040669) contains 13 SNPs, 1 of which was significant (nominal p-values <0.05) for COPD, FVC and FEV1/FVC ratio.
Region 18—Chromosome 16: 20569262 bp-21002350 bp
Region 18 (see e.g., NCBI Contig Accession Numbers: NT_010393.16/GI:224514941, NW_001838381.1/GI:157697600 and NW_926184.1/GI:89040724) contains 112 SNPs, 1 of which was significant (nominal p-values <0.05) for COPD, 18 for FEV1 and 16 (nominal p-values <0.05) for FEV1/FVC ratio.
Region 19—Chromosome 18: 45472119 bp-45787095 bp
Region 19 (see e.g., NCBI Contig Accession Numbers: NW_001838468.1/GI:157697806, NT_010966.14/GI:224514957 and NW_927106.1/GI:89047489) contains 140 SNPs, 35 of which were significant (nominal p-values <0.05) for COPD, 15 of which were significant for FVC, 39 of which were significant (nominal p-values <0.05) for FEV1, and 45 of which were significant (nominal p-values <0.05) for FEV1/FVC ratio.
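The per-region SNP counts listed above can be tallied to check them against the totals stated for this example (799 SNPs and 3196 tests). The sketch below uses only the counts given in the region descriptions; the variable and function names are hypothetical.

```python
# Number of SNPs reported for each of the 19 regions, as listed above.
REGION_SNP_COUNTS = {
    1: 74, 2: 26, 3: 20, 4: 25, 5: 41, 6: 4, 7: 109, 8: 7, 9: 39, 10: 17,
    11: 61, 12: 14, 13: 29, 14: 1, 15: 26, 16: 41, 17: 13, 18: 112, 19: 140,
}

# The four phenotypes tested for each SNP.
PHENOTYPES = ("COPD", "FVC", "FEV1", "FEV1/FVC")

total_snps = sum(REGION_SNP_COUNTS.values())
total_tests = total_snps * len(PHENOTYPES)


def proportion_significant(n_significant, region):
    """Share of a region's SNPs with nominal p-value < 0.05 for one phenotype."""
    return n_significant / REGION_SNP_COUNTS[region]


# Region 19: 35 of 140 SNPs were nominally significant for COPD.
region19_copd = proportion_significant(35, 19)
```

The tally agrees with the stated totals, and the Region 19 proportion for COPD works out to one quarter of the SNPs in that region.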
Table 8 provides a consolidated listing of SNPs by the region in which they are found along with the sequences of those SNPs and the polymorphism shown.
While the technology has been particularly shown and described with reference to specific illustrative embodiments, it should be understood that various changes in form and detail may be made without departing from the spirit and scope of the technology.
Unless otherwise indicated, the nucleic acids listed or set forth in Table 8 include: nucleic acids having the sequences recited in the table and/or their complement and/or both strands (e.g., as a double stranded sequence).
This application is a continuation of U.S. patent application Ser. No. 13/541,479, filed Jul. 3, 2012, which is a continuation of International Application No. PCT/US2011/021593, filed Jan. 18, 2011, which claims the benefit of U.S. Provisional Application No. 61/295,555 filed Jan. 15, 2010, the entirety of each of which applications is incorporated by reference herein.
Number | Date | Country |
---|---|---|
61295555 | Jan 2010 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 13541479 | Jul 2012 | US |
Child | 15713462 | | US |
Parent | PCT/US11/21593 | Jan 2011 | US |
Child | 13541479 | | US |