A sequence listing titled SNP.ST25.txt created on Mar. 13, 2009 having 792 Bytes and is ASCII compliant is filed herewith to satisfy 37 CFR 1.821(c). The information recorded in the electronic form is identical to the sequence listing in the application.
Gene promoter hypermethylation in sputum is a biomarker for predicting lung cancer. Identifying factors that predispose smokers to methylation of multiple gene promoters in the lung could impact strategies for early detection and chemoprevention.
Lung cancer, the leading cause of cancer mortality in both men and women in the United States, now accounts for approximately 30% of all deaths from cancer. The 5-year survival rate of lung cancer patients is about 14%. The discovery of field cancerization in the respiratory tract of smokers prompted studies showing that inactivation of genes such as p16 by promoter hypermethylation occurs in precursor lesions of non-small cell lung cancer. This finding suggested that methylation, when detected in exfoliated cells within sputum, could serve as a biomarker for the early stages of lung carcinogenesis.
The precise mechanisms by which carcinogens disrupt the cell's capacity to maintain the normal epigenetic code during DNA replication and repair are largely unknown. Smoking accounts for >90% of lung cancer. Carcinogens within tobacco induce single- and double-strand breaks (DSBs) in DNA. Reduced capacity for repair of DNA damage has been associated with lung cancer. DNA damage, manifested through DSBs, could in part be responsible for the acquisition of aberrant gene promoter methylation during lung carcinogenesis. For example, the prevalence of promoter methylation of the p16 gene is significantly greater in adenocarcinomas from workers occupationally exposed to plutonium, an exposure that predominantly produces DSBs, than in cancers from unexposed smokers, and the prevalence of p16 methylation increased with increasing plutonium exposure. In a second study, the prevalence of methylation of the estrogen receptor-α gene promoter was greater in plutonium-induced rodent lung adenocarcinomas than in tumors induced by NNK [4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone], diesel exhaust, or carbon black, exposures that mainly induce single-strand breaks in DNA (Carcinogenesis 2005; 26:1481-7).
According to one embodiment of the present invention, the health of a subject is predicted by a method comprising obtaining nucleic acid sequence data about the subject. At least one polymorphic risk marker associated with a change in promoter methylation of a gene associated with lung cancer is identified. For example, the subject is a human. The health of the subject is predicted from the presence of the at least one polymorphic risk marker identified. In a preferred embodiment, nucleic acid sequence data are obtained for one or more of the following genes: XRCC3, DNA-PKc, NBN, LIG4, XRCC2, CHEK1, MRE11A, CHEK2, RAD50, and KU80. In a more preferred embodiment, the at least one polymorphic risk marker is selected from the group consisting of: an allele A in marker rs537046 of gene CHEK1; an allele C in marker rs5762763 of gene CHEK2; an allele C in marker rs1151402 of gene LIG4; an allele T in marker rs7117042 of gene MRE11A; an allele T in marker rs6998169 of gene NBN; an allele A in marker rs7830743 of gene DNA-PKc; an allele G in marker rs2244012 of gene RAD50; an allele C in marker rs3218400 of gene XRCC2; an allele C in marker rs2295146 of gene XRCC3; and an allele A in marker rs828911 of gene KU80. In another preferred embodiment, determining a risk includes identifying the presence of five polymorphic risk markers selected from the group consisting of: an allele C in marker rs5762763 of gene CHEK2; an allele T in marker rs7117042 of gene MRE11A; an allele T in marker rs6998169 of gene NBN; an allele A in marker rs7830743 of gene DNA-PKc; and an allele C in marker rs2295146 of gene XRCC3. In another preferred embodiment, the gene associated with cancer is selected from the group consisting of p16, MGMT, DAPK, RASSF1A, PAX5-α, PAX5-β, GATA4, and GATA5.
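The ten risk alleles enumerated above can be held in a small lookup structure. The following Python sketch is an illustration only: it stores the listed marker/allele pairs and reports which markers carry at least one copy of the risk allele in a subject's genotype data. The dictionary layout, the function name `present_risk_markers`, and the assumption that genotypes arrive as allele pairs called on the same strand as the listed risk alleles are all hypothetical.

```python
from typing import Dict, List, Tuple

# Risk alleles as listed in this embodiment: rs ID -> (gene, risk allele).
RISK_ALLELES: Dict[str, Tuple[str, str]] = {
    "rs537046": ("CHEK1", "A"),
    "rs5762763": ("CHEK2", "C"),
    "rs1151402": ("LIG4", "C"),
    "rs7117042": ("MRE11A", "T"),
    "rs6998169": ("NBN", "T"),
    "rs7830743": ("DNA-PKc", "A"),
    "rs2244012": ("RAD50", "G"),
    "rs3218400": ("XRCC2", "C"),
    "rs2295146": ("XRCC3", "C"),
    "rs828911": ("KU80", "A"),
}


def present_risk_markers(genotypes: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return rs IDs for which at least one copy of the risk allele is observed.

    `genotypes` maps rs ID -> the two observed alleles (hypothetical format).
    Markers that were not genotyped are simply skipped.
    """
    found: List[str] = []
    for rs_id, (_gene, risk) in RISK_ALLELES.items():
        alleles = genotypes.get(rs_id)
        if alleles is not None and risk in alleles:
            found.append(rs_id)
    return found
```

Results are returned in the order the markers are listed above, which keeps reports reproducible across subjects.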
In another preferred embodiment, determining the health of a subject comprises comparing the obtained nucleic acid sequence data to a database containing correlation data between polymorphic risk markers and risk factors to provide a score relating to the health of the subject. For example, the presence of risk alleles of the five polymorphic risk markers at 7 or more of the 10 possible alleles predicts the health of the subject. In a more preferred embodiment, the method further comprises detecting a polymorphic risk marker that is in linkage disequilibrium with one or more of the at least one polymorphic risk markers identified in claim 4. For example, the polymorphic risk markers in linkage disequilibrium with a polymorphic risk marker are selected from Table 7. For example, linkage disequilibrium is defined by values of r² of at least 0.8.
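To make the 7-of-10 criterion concrete: each of the five markers is diploid, so a subject carries between 0 and 10 copies of the listed risk alleles. The sketch below counts those copies and applies the threshold. It assumes genotypes are given as allele pairs on the same strand as the listed risk alleles, and the names `FIVE_MARKER_PANEL`, `count_risk_alleles`, and `meets_threshold` are hypothetical.

```python
from typing import Dict, Tuple

# The five markers of this embodiment and their risk alleles.
FIVE_MARKER_PANEL: Dict[str, str] = {
    "rs5762763": "C",  # CHEK2
    "rs7117042": "T",  # MRE11A
    "rs6998169": "T",  # NBN
    "rs7830743": "A",  # DNA-PKc
    "rs2295146": "C",  # XRCC3
}


def count_risk_alleles(genotypes: Dict[str, Tuple[str, str]]) -> int:
    """Count risk-allele copies across the five diploid markers (0 to 10)."""
    total = 0
    for rs_id, risk in FIVE_MARKER_PANEL.items():
        a1, a2 = genotypes[rs_id]  # raises KeyError if a marker is missing
        total += int(a1 == risk) + int(a2 == risk)
    return total


def meets_threshold(genotypes: Dict[str, Tuple[str, str]], threshold: int = 7) -> bool:
    """True when risk alleles occupy `threshold` or more of the 10 allele positions."""
    return count_risk_alleles(genotypes) >= threshold
```

For example, genotypes CC, TT, TG, AG, and CT at the five markers (in the order listed) contribute 2 + 2 + 1 + 1 + 1 = 7 risk alleles and therefore meet the threshold. Missing genotypes raise an error rather than being silently counted as non-risk.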
In another embodiment, a polymorphic risk marker is detected in a nucleic acid sample of the subject for one or more of the genes selected from the group consisting of XRCC3, DNA-PKc, NBN, LIG4, XRCC2, CHEK1, MRE11A, CHEK2, RAD50, and KU80. For example, the nucleic acid sample comprises DNA, RNA, or both. The nucleic acid sample may be amplified, for example, by a polymerase chain reaction. The polymorphic risk marker may be detected, for example, by amplification such as a polymerase chain reaction, or by sequencing.
In another embodiment, a kit for detecting a polymorphic risk marker associated with a change in promoter methylation of a gene comprises reagents for selectively detecting at least one allele of at least one polymorphic risk marker from XRCC3, DNA-PKc, NBN, LIG4, XRCC2, CHEK1, MRE11A, CHEK2, RAD50, and KU80 in the genome of an individual, wherein the polymorphic risk marker is selected from the group consisting of the polymorphic risk markers listed in Table 7, and markers in linkage disequilibrium therewith.
In yet another embodiment, a computer-readable medium has computer-executable instructions for predicting the health of a subject at risk for developing lung cancer. The computer-readable medium comprises data indicative of at least one polymorphic risk marker from each gene selected from the group consisting of XRCC3, DNA-PKc, NBN, LIG4, XRCC2, CHEK1, MRE11A, CHEK2, RAD50, and KU80, and a routine stored on the computer-readable medium and adapted to be executed by a processor to predict the health of the subject when one or more of the at least one polymorphic risk marker from each of these genes is present in nucleic acid sequence data obtained from the subject.
In yet another embodiment, a method of aiding in the diagnosis of a subject suspected of having lung cancer comprises obtaining nucleic acid sequence data about the subject. The presence of one or more polymorphic risk markers in the nucleic acid sequence data is identified. The number of polymorphic risk markers present is compared to a lookup table, and a score is assigned based upon that number. Based on one or more data points, such as the score, subject health information, and/or predisposition, it is determined whether the subject has a risk of lung cancer, and the health of the subject is determined. In a preferred embodiment, the one or more polymorphic risk markers are selected from the group consisting of an allele A in marker rs537046 of gene CHEK1; an allele C in marker rs5762763 of gene CHEK2; an allele C in marker rs1151402 of gene LIG4; an allele T in marker rs7117042 of gene MRE11A; an allele T in marker rs6998169 of gene NBN; an allele A in marker rs7830743 of gene DNA-PKc; an allele G in marker rs2244012 of gene RAD50; an allele C in marker rs3218400 of gene XRCC2; an allele C in marker rs2295146 of gene XRCC3; and an allele A in marker rs828911 of gene KU80. The method may further include obtaining at least one biometric parameter from the subject. Such information may be obtained, for example, from a health survey conducted by a health care provider.
In a preferred embodiment, the at least one biometric parameter includes the smoking history of the subject. In a preferred embodiment, one or more of the methods disclosed herein is a computer-implemented method, for example, a computer-implemented method for aiding in the diagnosis of a subject suspected of having lung cancer.
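The lookup-table step of the computer-implemented method can be sketched as a binned mapping from the number of risk markers present to a score. This is a minimal sketch under stated assumptions: the document does not give the table's contents, so the cut points and score labels in `LOOKUP_TABLE` are hypothetical placeholders, as is the function name `assign_score`.

```python
import bisect
from typing import List, Tuple

# HYPOTHETICAL lookup table: (minimum number of risk markers present, score).
# The cut points and labels are placeholders, not values from this document.
LOOKUP_TABLE: List[Tuple[int, str]] = [(0, "low"), (4, "intermediate"), (7, "high")]
MAX_MARKERS = 10  # size of the ten-marker panel listed above


def assign_score(n_markers: int) -> str:
    """Return the score whose bin contains `n_markers`."""
    if not 0 <= n_markers <= MAX_MARKERS:
        raise ValueError(f"count must be between 0 and {MAX_MARKERS}")
    cut_points = [cut for cut, _ in LOOKUP_TABLE]
    # bisect_right finds the last cut point <= n_markers.
    index = bisect.bisect_right(cut_points, n_markers) - 1
    return LOOKUP_TABLE[index][1]
```

How the resulting score is combined with smoking history or other health information is not specified in the document, so that step is deliberately left out of the sketch.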
Another aspect of the present invention provides a method of predicting the likelihood that a subject will develop lung cancer.
Yet another aspect of the present invention provides a method for identifying a subject at risk for developing lung cancer. Yet another aspect of the present invention includes diagnosing, prognosing, or monitoring a subject with the system and method disclosed herein.
Another aspect of the present invention provides a method for evaluating whether a subject who has a predisposition for developing lung cancer should receive further testing.
Another aspect of the present invention is a method of determining a subject's likelihood of longevity.
One aspect of the present invention provides an in vivo association between DNA repair capacity (DRC) and gene promoter methylation, established both through a functional assay and through genetic variants in genes within the double-strand break repair pathway.
Another aspect of the present invention is the identification of an activity deficit of the MRE11A gene that plays a critical role in recognition of double-strand break DNA damage and activation of the ATM gene. The mechanism underlying this association could in part be mediated by the genes that are recruited to sites of DSBs, and the resultant modification of chromatin to facilitate repair.
One aspect of the present invention provides for identification of double-strand break repair capacity (DSBRC) and specific genes within this pathway as a critical determinant for gene promoter hypermethylation.
One aspect of the present invention provides for validation of the polymorphisms as an indicator of the health of the subject and/or methylation index. Genetic variants associated with promoter hypermethylation could be used to identify young smokers who would be most susceptible to induction of preneoplasia, and thus, should receive chemoprevention. In addition, the integration of these genetic variants with detection of gene promoter hypermethylation in sputum in long-term heavy smokers will provide a diagnostic test for incident lung cancer and impact long-term survival from this fatal disease.
As used herein “a” means one or more unless otherwise defined.
As used herein, an “allele” refers to the nucleotide sequence of a given locus (position) on a chromosome. Genomic DNA from an individual contains two alleles for any given polymorphic marker, representative of each copy of the marker on each chromosome.
A “haplotype,” as described herein, refers to a segment of genomic DNA within one strand of DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus.
The nucleotide sequence of a gene, as used herein, encompasses coding regions, referred to as exons, intervening, non-coding regions, referred to as introns, and upstream or downstream regions. Upstream or downstream regions can include regions of the gene that are transcribed but not part of an intron or exon, or regions of the gene that comprise, for example, binding sites for factors that modulate gene transcription.
The genomic sequence for the CHEK1 gene is included in GenBank accession number NM001274.
The genomic sequence for the CHEK2 gene is included in GenBank accession numbers NM00100573, NM007194, and NM145862.
The genomic sequence for the LIG4 gene is included in GenBank accession numbers NM002312 and NM206937.
The genomic sequence for the MRE11 gene is included in GenBank accession numbers NM005590 and NM005591.
The genomic sequence for the NBN gene is included in GenBank accession numbers NM001024688 and NM002485.
The genomic sequence for the DNA-PKC gene is included in GenBank accession number NM006904.
The genomic sequence for the RAD50 gene is included in GenBank accession numbers NM005732 and NM133482.
The genomic sequence for the XRCC2 gene is included in GenBank accession number NM005431.
The genomic sequence for the XRCC3 gene is included in GenBank accession number NM005432.
The genomic sequence for the KU80 gene is included in GenBank accession number NM021141.
As used herein “Linkage Disequilibrium” (“LD”) refers to alleles at different loci that are not associated at random. If the alleles are in positive linkage disequilibrium, then the alleles occur together more often than expected assuming statistical independence. Conversely, if the alleles are in negative linkage disequilibrium, then the alleles occur together less often than expected assuming statistical independence.
As used herein “Odds Ratio” (“OR”) refers to the ratio of the odds of the disease for individuals with the marker (allele or polymorphism) relative to the odds of the disease in individuals without the marker (allele or polymorphism).
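As a worked illustration of this definition, the sketch below computes an odds ratio from a 2×2 case-control table, together with an approximate 95% confidence interval by Woolf's log method. The counts in the usage note are made up, and the function names are hypothetical. Note that the odds ratios reported in Table 3 came from covariate-adjusted logistic regression, so this crude estimator illustrates the definition only.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 case-control table.

    a: cases carrying the marker     b: controls carrying the marker
    c: cases without the marker      d: controls without the marker
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this simple estimator")
    # Odds of disease with the marker (a/b) relative to odds without it (c/d).
    return (a / b) / (c / d)


def woolf_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for the odds ratio (Woolf log method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With hypothetical counts a=20, b=10, c=30, d=40, the odds ratio is (20/10)/(30/40) = 8/3 ≈ 2.67.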
As used herein, “Single Nucleotide Polymorphism (SNP)” means a DNA sequence variation occurring when a single nucleotide (adenine = A, thymine = T, cytosine = C, or guanine = G) at a specific location in the genome differs between members of a species or between paired chromosomes in an individual. Most SNP polymorphisms have two alleles. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e., both chromosomal copies of the individual have the same nucleotide at the SNP location) or heterozygous (i.e., the two homologous chromosomes of the individual contain different nucleotides). The SNP nomenclature as reported herein refers to the official Reference SNP (rs) ID identification tag as assigned to each unique SNP by the dbSNP database at the National Center for Biotechnology Information (NCBI) as of Mar. 6, 2009.
Alleles for SNP markers as referred to herein refer to the bases A, C, G, or T as they occur at the polymorphic site in the SNP assay employed. The person skilled in the art will, however, realize that by assaying or reading the opposite DNA strand, the complementary allele can in each case be measured. Thus, for a polymorphic site (polymorphic marker) characterized by an A/G polymorphism, the assay employed may be designed to specifically detect the presence of one or both of the two possible bases, i.e., A and G. Alternatively, by designing an assay to detect the opposite strand of the DNA template, the presence of the complementary bases T and C can be measured. Quantitatively (for example, in terms of relative risk), identical results would be obtained from measurement of either DNA strand (+ strand or − strand).
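The strand relationship described above reduces to Watson-Crick pairing. The sketch below translates an allele call to the opposite strand and checks whether two calls made on opposite strands denote the same allele; the function names are hypothetical. It also flags A/T and C/G SNPs, whose two alleles are each other's complements, so the strand cannot be inferred from the allele letters alone.

```python
# Watson-Crick pairing used to translate an allele call between DNA strands.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand(allele: str) -> str:
    """Return the base read at the same site on the opposite strand."""
    return COMPLEMENT[allele.upper()]


def same_variant(plus_allele: str, minus_allele: str) -> bool:
    """True if a + strand call and a - strand call denote the same allele."""
    return opposite_strand(plus_allele) == minus_allele.upper()


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """True for A/T and C/G SNPs, which look identical on both strands."""
    return opposite_strand(allele1) == allele2.upper()
```

For the A/G example in the text, `[opposite_strand(b) for b in "AG"]` gives `["T", "C"]`, the complementary bases measured on the opposite strand.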
A “polymorphic risk marker”, sometimes referred to as a “marker”, as described herein, refers to a genomic polymorphic site identified by rs number. Each polymorphic risk marker has at least two sequence variations characteristic of particular alleles at the polymorphic site (major allele and minor allele). Thus, genetic association to a polymorphic risk marker implies that there is association to at least one specific allele of that particular polymorphic risk marker. The marker can comprise any allele of any variant type found in the genome, including single nucleotide polymorphisms (SNPs). Polymorphic risk markers can be of any measurable frequency in the population. The major or the minor allele can be the polymorphic risk marker.
A “nucleic acid sample” is a sample obtained from an individual that contains nucleic acid (DNA or RNA). In certain embodiments, e.g., the detection of specific polymorphic risk markers and/or haplotypes, the nucleic acid sample comprises genomic DNA. Such a nucleic acid sample can be obtained from any source that contains genomic DNA, such as a blood sample, sample of amniotic fluid, sample of cerebrospinal fluid, or tissue sample from skin, muscle, buccal or conjunctival mucosa (buccal swab), placenta, gastrointestinal tract, or other organs.
A “variant”, as described herein, refers to a segment of DNA that differs from the reference DNA. A “marker” or a “polymorphic risk marker”, as defined herein, is a variant. Alleles that differ from the reference are referred to as “variant” alleles.
A “computer-readable medium” is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface. Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media. Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information. Such transmission can be electrical or by other available methods, such as IR links, wireless connections, etc.
For two SNPs, D′=1 means there is no evidence of recombination between them. The r² measure further takes into account the different allele frequencies of the two SNPs: r²=1 means perfect linkage disequilibrium (no recombination, and the allele frequencies of the two SNPs are the same). Consequently, r² can be less than one for SNP pairs whose D′ is equal to one.
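The relationship between D′ and r² can be checked numerically from the four haplotype frequencies of two biallelic SNPs, using the standard definitions D = p_AB − p_A·p_B, D′ = D/D_max, and r² = D²/(p_A(1−p_A)p_B(1−p_B)). The sketch below is illustrative; the function name `ld_stats` is hypothetical, and it assumes haplotype frequencies are already known (in practice they are estimated from genotype data, which this sketch does not do).

```python
from typing import Tuple


def ld_stats(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs from the four haplotype frequencies."""
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at SNP 1
    p_B = f_AB + f_aB  # frequency of allele B at SNP 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = f_AB - p_A * p_B
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2
```

With haplotype frequencies AB=0.2, Ab=0, aB=0.3, ab=0.5, only three haplotypes occur, so D′=1, yet the allele frequencies differ (0.2 vs 0.5) and r² is only 0.25. With AB=0.3 and ab=0.7 alone, both D′ and r² equal 1.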
A large panel of genes was examined for the ability of their methylation to predict lung cancer in a nested case-control study. According to one embodiment of the present invention, a combination of six genes was identified whose methylation in sputum predicted lung cancer prior to clinical diagnosis with both a sensitivity and a specificity of 65%. According to another embodiment of the present invention, one or more of the six genes were identified whose methylation in sputum predicted lung cancer prior to clinical diagnosis with significant sensitivity and specificity.
One embodiment of the present invention provides a system or method that identifies a high methylation index and correlates this index with a reduced capacity to repair DSBs in a human subject. This information is useful to predict the health of the subject. In addition, sequence variants in genes of the DSB repair pathway that predict a high methylation index are identified.
One aspect of the present invention provides that double-strand break repair capacity and sequence variation in genes in this pathway are associated with a high methylation index in a cohort of current and former cancer-free smokers.
Referring now to
A 50% reduction in the mean level of double-strand break repair capacity was seen in lymphocytes from smokers with a high methylation index, defined as ≧3 of 8 genes (p16, MGMT, PAX5-α, PAX5-β, GATA4, GATA5, DAPK, RASSF1A) methylated in sputum, compared with smokers with no genes methylated. The classification accuracy for predicting risk for methylation was 88%. SNPs within the MRE11A, CHEK2, XRCC3, DNA-PKc, and NBN DNA repair genes were highly associated with the methylation index and the health of a subject. A 14.5-fold increase in the odds of high methylation was seen for persons with ≧7 risk alleles out of a possible 10 alleles of these genes. Promoter activity of the MRE11A gene, which plays a critical role in recognition of DNA damage and activation of Ataxia Telangiectasia Mutated (ATM), was reduced in persons with the risk allele. This is the first population-based study to identify double-strand break DNA repair capacity and specific genes within this pathway as critical determinants for gene methylation in sputum, which is, in turn, associated with elevated risk for lung cancer and/or the health of a subject.
High Methylation Index as used herein is defined as the methylation of three or more gene-specific promoters selected from p16, MGMT, PAX5-α, PAX5-β, GATA4, GATA5, DAPK, RASSF1A detected in sputum.
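The methylation-index definition, together with the stricter case-selection rule described later for the study population (GATA4 not counted toward the three genes; controls with 0 of 8 genes methylated), can be sketched as a simple set count. The function names and the input format (a set of gene names scored as methylated by MSP) are hypothetical.

```python
from typing import Iterable, Optional

# Eight-gene panel evaluated in sputum.
PANEL = frozenset({"p16", "MGMT", "PAX5-α", "PAX5-β", "GATA4", "GATA5", "DAPK", "RASSF1A"})


def methylation_index(methylated: Iterable[str]) -> int:
    """Number of panel genes whose promoters were scored as methylated."""
    genes = set(methylated)
    unknown = genes - PANEL
    if unknown:
        raise ValueError(f"genes not in panel: {sorted(unknown)}")
    return len(genes)


def is_high_index(methylated: Iterable[str], exclude: Iterable[str] = ()) -> bool:
    """High methylation index: >= 3 methylated panel genes, optionally ignoring some genes."""
    genes = set(methylated)
    methylation_index(genes)  # validate names against the full panel
    return len(genes - set(exclude)) >= 3


def study_group(methylated: Iterable[str]) -> Optional[str]:
    """Case/control assignment under the stricter rule (GATA4 not counted for cases)."""
    genes = set(methylated)
    if is_high_index(genes, exclude={"GATA4"}):
        return "case"
    if methylation_index(genes) == 0:
        return "control"
    return None  # intermediate index: not selected for the comparison
```

Under this rule, a subject methylated for p16, MGMT, and GATA4 has a high index by the general definition but is not selected as a case.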
Gene Methylation in Sputum. Gene promoter methylation was assessed in sputum from 824 members of a cohort of current and former cancer-free smokers (Table 1). Methylation of an eight-gene panel that included p16, O6-methylguanine-DNA methyltransferase (MGMT), death associated protein kinase (DAPK), ras effector homolog 1 (RASSF1A), GATA4, GATA5, PAX5-α, and PAX5-β was evaluated. Methylation of these genes has been associated with increased risk for lung cancer (Cancer Res. 2006; 66: 3338-44). The prevalence of methylation ranged from 1.2% for RASSF1A to 31% for GATA4 and was not associated with family history of lung cancer (Table 5). Nineteen percent of cohort members were methylated for three or more genes (Table 5). Our previous nested case-control study within the cohort revealed that methylation of ≧3 genes from a 6-gene panel (excluding GATA4 and PAX5-α) was associated with a 6.5-fold increased risk for lung cancer.
Repair capacity associates with methylation index. The mutagen sensitivity assay was used to assess double-strand break repair capacity (DSBRC) (Int. J. Cancer 1989; 43: 403-9). The mutagen sensitivity assay as used herein is a quantitative measurement of breaks within the 46 chromosomes of a cell following exposure to bleomycin, a radiomimetic agent that induces double-strand breaks in DNA. The greater the number of breaks, the worse the DNA repair capacity. Thus, the number of chromatid breaks induced in lymphocytes following exposure to bleomycin was used to measure DSBRC. We selected persons from our cohort who exhibited a high (cases [≧3 methylated genes]) or low (controls [zero of eight genes methylated]) methylation index because of the increased risk for lung cancer seen in the nested case-control study when 3 or more genes were methylated in sputum. Cryopreserved lymphocytes were available for assessment of DSBRC for 77 cases and 78 controls. Demographics and smoking history for cases and controls are detailed in Table 1. A highly statistically significant difference in DSBRC (p<0.001) was seen between cases and controls, with a mean number of chromatid breaks per cell of 0.47±0.11 and 0.32±0.10, respectively.
We further classified the cases into three groups based on the number of methylated genes (3, 4, and ≧5 methylated genes) and found that the number of chromatid breaks per cell induced by bleomycin increased with the number of methylated genes in sputum (p<0.001).
A receiver operating characteristic (ROC) curve was generated to determine how well DSBRC distinguished cases from controls. The ROC curve demonstrates that DSBRC significantly (p<0.0001) increased the classification accuracy from 66% to 88% for predicting risk for promoter methylation.
SNPs within DNA repair genes and risk for methylation. DNA repair capacity strongly predicts a high methylation index and has high heritability. It was unexpectedly observed that variants in genes involved in repair were predictive of a change in promoter methylation for a panel of genes identified as cancer genes. Sixteen (16) candidate genes from the DSBR and cell cycle control pathways were selected for tag SNP-based genotyping (Table 2). A total of 294 SNPs were evaluated for 131 cases and 130 controls, including the subset evaluated in the mutagen sensitivity assay. Forty-four (44) SNPs from the 16 candidate genes (Table 8) were found to be associated with risk for promoter methylation (p<0.15) with adjustment for covariates. Because of the relatively high correlation between SNPs in these genes, we tested which SNP, or set of SNPs, was most significantly associated with risk for promoter methylation by using a step-wise logistic regression model. The underlined SNPs with p<0.15 from each gene (Table 8) were selected to represent the allelic status for those genes. These 16 SNPs were then included with the covariates in one model, and step-wise selection was used to identify the SNPs with the lowest p-values. Ten SNPs from different genes were identified: the minor alleles of 4 SNPs were associated with increased risk for promoter methylation (ORs, 1.6-4.0), and those of 6 SNPs with reduced risk (ORs, 0.4-0.7) (Table 3). Monte Carlo estimates of exact p-values were calculated by permuting the case-control status for all subjects 10,000 times. The exact p-value was <0.05 (Table 3) for five SNPs: an allele C in marker rs5762763 of gene CHEK2; an allele T in marker rs7117042 of gene MRE11A; an allele T in marker rs6998169 of gene NBN; an allele A in marker rs7830743 of gene DNA-PKc; and an allele C in marker rs2295146 of gene XRCC3.
This result indicates that if a similar study were repeated under a null distribution (i.e., with no SNPs associated with risk for a change in promoter methylation), an association as strong as that observed for any of these five SNPs would occur by chance <5% of the time. Of the 294 SNPs, 42 were excluded from data analysis because they were nonpolymorphic, had a minor allele frequency of <0.05, had a low genotyping yield (<80%), or showed a highly significant deviation from Hardy-Weinberg equilibrium (P<0.001).
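The Monte Carlo exact p-values described above come from permuting case-control status. The sketch below shows the idea for a single SNP, but it swaps in a simpler, unadjusted test statistic (the absolute difference in mean risk-allele count between cases and controls) in place of the covariate-adjusted logistic regression used in the study; the function names are hypothetical. Shuffling the labels breaks any genotype-status link while keeping the numbers of cases and controls fixed.

```python
import random
from typing import Sequence


def mean_dosage_difference(dosages: Sequence[int], labels: Sequence[int]) -> float:
    """Absolute difference in mean risk-allele count (0, 1, 2) between cases (1) and controls (0)."""
    cases = [d for d, y in zip(dosages, labels) if y == 1]
    controls = [d for d, y in zip(dosages, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(dosages: Sequence[int], labels: Sequence[int],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo p-value obtained by permuting case-control status."""
    rng = random.Random(seed)
    observed = mean_dosage_difference(dosages, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabelling with group sizes preserved
        if mean_dosage_difference(dosages, shuffled) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

A p-value below 0.05 from this procedure has the interpretation given in the text: under the null of no association, a statistic at least as extreme would arise by chance less than 5% of the time.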
Referring now to
ROC curves were generated to evaluate the classification accuracy of this panel of SNPs in distinguishing cases from controls. The area under the curve increased from 57% (covariates only) to 72% (covariates with the 5 most significant SNPs) and to 75% (covariates with all 10 SNPs).
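The area under an ROC curve equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes this empirical AUC directly from two lists of scores. It is an illustration only: the scores are assumed to be, for example, predicted probabilities from a fitted model, and the function name `auc` is hypothetical. The pairwise loop is O(n·m), which is adequate for study sizes of a few hundred subjects.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Empirical ROC AUC: P(case score > control score), ties counted as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is how the increase from 57% to 75% reported above can be read.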
Reduced activity of the MRE11A promoter. The genes included in the prediction model have biological plausibility, i.e., they have shown prior association with cancer with respect to sequence variation or activity level. Of the five genes whose sequence variation is associated with methylation (MRE11A, NBN, XRCC3, CHEK2, DNA-PKc), two, NBN and XRCC3, have shown association with lung cancer (Lung Cancer 2005; 49:317-23 and Carcinogenesis 2006; 27: 997-1007). SNPs within the DNA-PKc and CHEK2 genes have been associated with breast and other cancers, while no such studies have been conducted for MRE11A (Cancer Res. 2004; 64: 5560-3 and Hum. Mol. Genet. 2007; 16: 1051-7).
Assessment of the functional potential of the SNPs identified from our study for these genes revealed that the minor allele at rs7830743 of DNA-PKc is the polymorphic risk marker and is a nonsynonymous SNP (a SNP that changes an amino acid in the encoded protein), changing amino acid residue 3434 from Ile to Thr in exon 73. This amino acid substitution is predicted to change the secondary structure and may influence the serine/threonine protein kinase activity of this protein (Structure 2005; 13: 243-55). We have shown that reduced DNA-PKc activity is associated with risk for lung cancer and sensitivity to cell killing by bleomycin (Carcinogenesis 2001; 22: 723-7), thus supporting an important role for this gene in lung cancer and aberrant gene promoter methylation. The SNPs from the other four genes (CHEK2, XRCC3, MRE11A, and NBN) are neither nonsynonymous nor in high linkage disequilibrium with any nonsynonymous SNP of known function. However, MRE11A/rs7117042 and NBN/rs6998169 are predicted to be located in the middle of sequences forming DNA triplexes that could inhibit DNA transcription (Nucleic Acids Res. 2006; 34: W621-5).
To begin addressing the function of these SNPs, we tested whether MRE11A/rs7117042 is associated with a reduction in promoter activity. Referring now to
MRE11A has a role in recognition of double-strand break damage. It complexes with Rad50 and Nbs1 to directly sense the double-strand breaks, binds to the DNA, modifies the ends via 3′ to 5′ exonuclease activity, recruits ATM to the damaged DNA template, and dissociates the ATM dimer (Science 2005; 308: 551-4). Therefore, a reduction in level of the MRE11A protein could have a major impact on DSBRC.
These results indicate a strong association between reduced DSBRC and both risk for methylation in sputum and the overall health of a subject.
Another aspect of the present invention provides a method for determining DNA damage, which has long been recognized as an initiating event for mutagenesis, as an initiator of aberrant promoter hypermethylation and/or as a determinant of the health of a subject.
Study population and sample collection. A Smokers Cohort (n=1860) was established in 2001 to conduct longitudinal studies on molecular markers of respiratory carcinogenesis in biological fluids, such as sputum, from people at risk for lung cancer. At enrollment, information about each subject's medical, family, smoking, and exposure history and quality of life was collected through a computer-based system. Induced sputum and blood were collected, and pulmonary function testing was performed. Blood was processed within 2 h after blood draw to isolate lymphocytes and plasma. Cryopreservation of lymphocytes began in 2005.
Cytologically adequate sputum samples from 824 cohort subjects were evaluated for gene promoter methylation for a panel of genes comprising p16, MGMT, DAPK, RASSF1A, PAX5-α, PAX5-β, GATA4, and GATA5. A high methylation index was defined as the methylation of three or more gene-specific promoters in sputum. We selected persons from our cohort who exhibited a high (cases) or low (controls [0 of 8 genes]) methylation index. To increase the stringency of case selection, GATA4, the gene most commonly methylated in sputum, was excluded as one of the three methylated genes needed for case classification; 131 of the 824 cohort subjects met this criterion. Cases were frequency matched to controls by gender. Cases (n=131) and controls (n=130) were selected for the genetic association study. Among the 131 cases, 77 had an adequate number of cryopreserved lymphocytes for the mutagen sensitivity assay. Seventy-eight controls were selected from the 130 controls, with frequency matching by gender maintained, for the mutagen sensitivity assay.
Sputum cytology and nested methylation-specific PCR. Sputum samples were stored in Saccomanno's fixative. Three slides were made for each sputum sample to check for adequacy, defined as the presence of deep lung macrophages or Curschmann's spirals (Diagnostic Pulmonary Cytology, 2nd ed. Chicago: Amer. Society of Clinical Pathologists; 1986). The methylation-specific PCR (MSP) assay was only performed on cytologically adequate sputum samples. Eight genes (p16, MGMT, DAPK, RASSF1A, PAX5-α, PAX5-β, GATA4, and GATA5) were selected for analysis of methylation in sputum based on our previous studies establishing their association with risk for lung cancer. Nested MSP was used to detect methylated alleles in DNA recovered from the sputum samples as described (Cancer Res. 2006; 66: 3338-44).
Evaluation of double-strand break repair capacity (DSBRC) in peripheral lymphocytes. PHA-stimulated lymphocytes were treated with bleomycin to evaluate the generation of chromosome aberrations as an index of DSBRC (Int. J. Cancer 1989; 43:403-9). Briefly, cryopreserved lymphocytes were thawed and cultured in RPMI1640 medium supplemented with FBS (20%) and PHA (1.5%) at a cell density of <0.5×10⁶/ml. Sixty-seven hours after PHA stimulation, the culture was split into two T25 flasks and treated with bleomycin or vehicle for 5 h. The final concentration of bleomycin in culture medium was 3 U/l, a concentration defined through dose-response studies using isolated lymphocytes from cohort subjects and two lymphoblastoid cell lines: GM02782 (mutant ATM) and GM00131 (wild-type ATM) (data not shown). The dose selected was within the linear dose-response range and caused obvious genotoxicity but minimal cytotoxicity. One hour before harvest, colcemid was added to the cultures at a final concentration of 0.06 mg/ml. Slides were prepared according to conventional procedures, and 100 well-spread metaphases were examined for chromatid breaks. Samples were assayed as a batch, and slides were scored by a person blinded to case-control status. The criteria of Hsu et al. were used to record the aberrations: a chromatid break was scored as one break, and each isochromatid break set was scored as two breaks. Chromosome/chromatid gaps, chromosome-type aberrations (dicentrics, rings, and acentric fragments), and chromatid exchanges were recorded but not added to the frequencies of chromatid breaks. On rare occasions, a metaphase with >12 breaks was observed on a slide with bleomycin treatment; when this occurred, the number of breaks was recorded as 12. The DSBRC was expressed as the mean number of chromatid breaks per cell.
The mean numbers of spontaneous chromatid breaks per cell derived from 100 metaphases of untreated cells were 0.013 in cases and 0.021 in controls. These values were similar to the spontaneous frequency reported in the literature and were < 1/15 of the mean number of breaks seen in bleomycin-treated cells (0.32). Therefore, for statistical comparisons, the spontaneous breaks were not subtracted from the breaks observed following treatment with bleomycin.
SNP selection and genotyping by the Illumina platform for 16 genes in the double-strand break repair pathway and related cell cycle control genes. A total of 294 SNPs were selected for 16 candidate genes from the DSBR and cell cycle control pathways (Table 9; Am. J. Hum. Genet. 2004; 74: 106-20 and Bioinformatics 2005; 21: 263-5). For 15 of the genes, tag SNPs (n=245) were derived from Latino and White data from the University of Southern California plus phase 1 HapMap data for whites. Tag SNPs were selected using r² ≥ 0.8, with nonsynonymous SNPs retained as tag SNPs (Am. J. Hum. Genet. 2004; 74: 106-20). For each bin containing six or more SNPs, one additional SNP was selected as a redundant SNP in case of genotyping failure. For the remaining gene, NBN, 49 SNPs were selected from dbSNP at a density of 1-3 SNPs/kb, depending on the haplotype block structure, validation status, Illumina design score, and functional potential of the SNPs. The number of SNPs selected for each of these 16 genes is shown in Table 2 and Table 9. These SNPs were genotyped with the Illumina GoldenGate Assay in 261 DNA samples isolated from lymphocytes of cases and controls.
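The tagging rule in the preceding paragraph (r² ≥ 0.8, nonsynonymous SNPs favored as tags, one redundant SNP for bins of six or more) can be illustrated with a simplified greedy binning sketch in the spirit of the cited Am. J. Hum. Genet. method. It assumes pairwise r² values are already available as a dictionary keyed by SNP pair; the function names, tie-breaking by SNP name, and the way the redundant SNP is picked are illustrative assumptions rather than the exact software used.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

PairR2 = Dict[FrozenSet[str], float]  # r^2 for each unordered SNP pair


def r2(pairs: PairR2, a: str, b: str) -> float:
    """Pairwise r^2; a SNP is in perfect LD with itself, unlisted pairs count as 0."""
    return 1.0 if a == b else pairs.get(frozenset((a, b)), 0.0)


def ld_partners(snp: str, pool: Set[str], pairs: PairR2, threshold: float) -> Set[str]:
    """SNPs in `pool` (including `snp` itself) with r^2 >= threshold to `snp`."""
    return {t for t in pool if r2(pairs, snp, t) >= threshold}


def greedy_bins(snps: Iterable[str], pairs: PairR2, threshold: float = 0.8) -> List[Set[str]]:
    """Repeatedly bin the SNP that is in LD with the most still-unbinned SNPs."""
    remaining = set(snps)
    bins: List[Set[str]] = []
    while remaining:
        best = max(sorted(remaining),
                   key=lambda s: len(ld_partners(s, remaining, pairs, threshold)))
        members = ld_partners(best, remaining, pairs, threshold)
        bins.append(members)
        remaining -= members
    return bins


def select_tags(snps: Iterable[str], pairs: PairR2, nonsynonymous: Set[str],
                threshold: float = 0.8, redundant_min_bin: int = 6) -> List[str]:
    """Pick one tag per bin (nonsynonymous preferred) plus a backup for large bins."""
    tags: List[str] = []
    for members in greedy_bins(snps, pairs, threshold):
        # Candidate tags are bin members in LD with every other member of the bin.
        candidates = sorted(s for s in members
                            if all(r2(pairs, s, t) >= threshold for t in members))
        preferred = [s for s in candidates if s in nonsynonymous]
        primary = (preferred or candidates)[0]
        tags.append(primary)
        # Large bins get one extra tag in case the primary fails genotyping.
        backups = [s for s in candidates if s != primary]
        if len(members) >= redundant_min_bin and backups:
            tags.append(backups[0])
    return sorted(tags)


pairs = {frozenset("AB"): 0.9, frozenset("AC"): 0.85, frozenset("BC"): 0.95,
         frozenset("DE"): 0.9}
print(select_tags("ABCDEFG", pairs, nonsynonymous={"C"}))  # ['C', 'D', 'F', 'G']
```

In the usage example, SNPs A, B, and C form one bin and the nonsynonymous SNP C is chosen to tag it, while the singleton SNPs F and G each tag only themselves.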
Selection of subjects and construction of MRE11A promoter constructs. Five common haplotypes (frequencies 6-34%) were constructed based on the 14 tag SNPs assayed for MRE11A in the population (Bioinformatics 2005; 21: 263-5). A Bayesian statistical method implemented in the program PHASE (version 2.1) was used to reconstruct haplotypes from the MRE11A SNPs for the 261 subjects. Two subjects homozygous for the haplotype containing rs7117042, the SNP associated with a high methylation index, were selected. Four other subjects were selected, each homozygous for one of the other four haplotypes. The MRE11A promoter fragment (−2541 to −5, with +1 being the translational start site) was amplified from lymphocyte DNA from these six subjects and directionally subcloned into the pGL2-basic Luciferase Reporter Vector (Promega, Madison, Wis.) upstream of the luciferase coding sequence. Five clones from each subject were commercially sequenced (Sequetech, Mountain View, Calif.) to identify variants within the promoter region.
Transient transfection and reporter gene assays. The Calu-6 lung tumor-derived cell line was used for transient transfections. Cells (1.5×10⁵) were plated into 6-well dishes and transfected the following day. Plasmid DNA (1 μg) and the pSV-β-Galactosidase control vector (0.5 μg, Promega) were co-transfected into cells with FuGENE 6 transfection reagent (Roche Diagnostics, Indianapolis, Ind.) at a FuGENE:DNA ratio of 3:1. A promoter-less pGL2-basic vector and the pGL2-control vector, which contains the SV40 promoter, were used as negative and positive controls, respectively. Forty-eight hours after transfection, cells were harvested and lysed. Immediately after lysis, luciferase activity in cell extracts was measured with the Luciferase Assay System (Promega) on a Luminoskan Ascent luminometer (Thermo Electron, Milford, Mass.). β-galactosidase activity in cell lysates was measured using the Galacto-Star Reporter Gene Assay System (Tropix, Bedford, Mass.). Promoter activity was calculated as the ratio of luciferase to β-galactosidase activity. Transfections were done in duplicate in four independent experiments.
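The normalization described above (luciferase divided by β-galactosidase, duplicate wells, four independent experiments) can be sketched as below. This is a hedged illustration: averaging duplicates within each experiment before summarizing across experiments is an assumption, and `fold_over_basic`, which expresses activity relative to the promoter-less pGL2-basic vector, is a hypothetical extra step not stated in the text.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Well = Tuple[float, float]  # (luciferase reading, beta-galactosidase reading)


def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Promoter activity of one well: luciferase signal divided by the beta-gal signal."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase signal must be positive")
    return luciferase / beta_gal


def summarize_construct(experiments: Sequence[Sequence[Well]]) -> Tuple[float, float]:
    """Average duplicate wells within each experiment, then mean and SD across experiments."""
    per_experiment: List[float] = [
        mean(normalized_activity(luc, gal) for luc, gal in wells) for wells in experiments
    ]
    return mean(per_experiment), stdev(per_experiment)


def fold_over_basic(construct_mean: float, basic_mean: float) -> float:
    """Hypothetical step: activity relative to the promoter-less pGL2-basic vector."""
    return construct_mean / basic_mean


# four experiments, each with duplicate wells (made-up readings for illustration)
exps = [[(100, 10), (120, 10)], [(90, 10), (110, 10)],
        [(100, 10), (100, 10)], [(105, 10), (95, 10)]]
print(summarize_construct(exps))
```

Normalizing to the co-transfected β-galactosidase vector corrects each well for differences in transfection efficiency before any comparison between haplotype constructs.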
Statistical analysis. The two-sample t-test, Wilcoxon rank sum test, and χ² test were used, as appropriate, to compare the means or distributions of several demographic variables and DSBRC results between cases and controls. Because the DSBRC data and the number of spontaneous breaks were not normally distributed, analysis was also performed on log-transformed data. The results based on log-transformed data were similar to those based on untransformed data, so only results based on untransformed data are shown. Analysis of covariance and logistic regression were used to assess the association between selected variables, such as SNPs and case-control status, and the outcome variable, DSBRC, with adjustment for covariates selected a priori (age at sputum collection, sex, race, current smoking status, and pack-years). For logistic regression models, DSBRC was dichotomized at the upper quartile of DSBRC in control participants; this cut-off was chosen based on the distribution of DSBRC in cases and controls. Analysis of covariance and logistic regression models stratified by case-control status were also used to examine whether associations between SNPs and DSBRC differed by case-control status. A receiver operating characteristic (ROC) curve was also drawn to evaluate the sensitivity and specificity of bleomycin-induced DSBRC for classifying cases (Radiology 1982; 143: 29-36). Multivariate unconditional logistic regression assessed the association between SNPs and the outcome of case-control status, with the same covariates outlined above. Model results are presented as ORs with 95% CIs for having ≥3 methylated genes. Logistic regression modeling was extended to generalized logit models to examine the high methylation index more precisely. ORs and 95% CIs for the risk of having 3, 4, or ≥5 methylated genes, with 0 methylated genes as the reference group, were obtained with adjustment for the same covariates.
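Two of the derived variables above, the dichotomized DSBRC (above the upper quartile of controls) and the ≥3-methylated-genes outcome, can be sketched as follows. This is a hedged illustration: `percentile_def5` implements one common empirical-distribution percentile definition, and statistical packages differ in their defaults, so a cut-off computed elsewhere may shift slightly. Whether a value exactly at the cut-off counts as "high" is not stated in the text; strictly greater than is assumed here. All function names are hypothetical.

```python
import math
from typing import List, Mapping, Sequence


def percentile_def5(values: Sequence[float], p: float) -> float:
    """Empirical-distribution percentile, averaging two order statistics when n*p is whole."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no values")
    j = math.floor(n * p)
    g = n * p - j
    if g == 0:
        if j == 0:
            return xs[0]
        if j >= n:
            return xs[-1]
        return (xs[j - 1] + xs[j]) / 2  # mean of the j-th and (j+1)-th order statistics
    return xs[j]  # the (j+1)-th order statistic


def high_dsbrc_flags(dsbrc: Sequence[float], control_dsbrc: Sequence[float]) -> List[int]:
    """1 if a subject's DSBRC exceeds the upper quartile of the controls, else 0."""
    cutoff = percentile_def5(control_dsbrc, 0.75)
    return [1 if value > cutoff else 0 for value in dsbrc]


def high_methylation(methylated: Mapping[str, bool], threshold: int = 3) -> int:
    """1 if at least `threshold` of the assayed promoters are methylated (the >=3 outcome)."""
    return int(sum(bool(v) for v in methylated.values()) >= threshold)


controls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # made-up control DSBRC values
print(high_dsbrc_flags([0.64, 0.66, 0.9], controls))  # [0, 1, 1]
```

Deriving the cut-off from controls only keeps the threshold independent of the case distribution, which is why the flags can then be used as a covariate or outcome in the logistic models.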
The call rate for each SNP was assessed prior to data analysis. Of the 294 SNPs assayed, 42 were deemed unsuitable because they were monomorphic, had a minor allele frequency (MAF) <0.05, had low yield (<80%), or showed a highly significant deviation from Hardy-Weinberg equilibrium (p<0.0001). These SNPs were removed, and the remaining 252 were analyzed (Table 8). Four genetic models were tested: co-dominant, dominant, additive, and recessive. Because of power limitations, only results for the additive model are presented for each SNP, with the common homozygote, heterozygote, and rare homozygote coded as 0, 1, and 2, respectively. A logistic regression model was used to calculate the ORs and 95% CIs for each individual SNP with adjustment for age, sex, ethnicity, and smoking, selected a priori. A ROC curve was drawn to evaluate the classification accuracy of this panel of variables for promoter methylation. An analysis excluding the 23% of study subjects who were not non-Hispanic white had no effect on the identified associations. Therefore, all 261 subjects were included in the data analysis.
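The SNP filters and the additive coding above can be sketched as a single quality-control function, followed by the usual conversion of a logistic-regression coefficient into an OR with a Wald 95% CI. This is a minimal sketch under stated assumptions: genotypes are given as copies (0, 1, 2) of an arbitrary allele with `None` for a failed call, and Hardy-Weinberg equilibrium is checked with a 1-df chi-square test, whereas the text does not say which HWE test the SAS/GENETICS analysis used. Function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Call = Optional[int]  # copies (0, 1, 2) of an arbitrary allele "B"; None = failed call


def hwe_p_value(typed: Sequence[int]) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium (exact tests are an alternative)."""
    n = len(typed)
    q = sum(typed) / (2 * n)  # frequency of allele B
    p = 1 - q
    observed = [typed.count(0), typed.count(1), typed.count(2)]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def qc_and_code(calls: Sequence[Call], min_call_rate: float = 0.80,
                min_maf: float = 0.05, hwe_alpha: float = 1e-4) -> Optional[List[Call]]:
    """Return additive codes (copies of the minor allele), or None if the SNP fails QC."""
    typed = [c for c in calls if c is not None]
    if not typed or len(typed) / len(calls) < min_call_rate:
        return None  # low yield
    freq_b = sum(typed) / (2 * len(typed))
    if min(freq_b, 1 - freq_b) < min_maf:
        return None  # monomorphic or rare
    if hwe_p_value(typed) < hwe_alpha:
        return None  # strong departure from HWE
    flip = freq_b > 0.5  # recode so 2 always means two copies of the minor allele
    return [None if c is None else (2 - c if flip else c) for c in calls]


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """OR and Wald 95% CI from a logistic-regression coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


balanced = [0] * 25 + [1] * 50 + [2] * 25  # in HWE, MAF 0.5, full call rate
print(qc_and_code(balanced) == balanced)     # True
```

Under the additive coding, the OR returned for a SNP is the change in odds per additional copy of the rare allele.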
Monte Carlo estimates of exact p-values were calculated by permuting the case-control status of all subjects 10,000 times to adjust for multiple comparisons. The false positive report probability (FPRP) was also calculated to assess the robustness of our findings for individual SNPs (J. Natl. Cancer Inst. 2004; 96: 434-42). In assigning a prior probability for these genes, we considered the strong association between DSBRC and risk for promoter methylation and the stringent r² value (0.8) used for selecting tag SNPs. On the basis of the evidence for associations between SNPs in CHEK2, XRCC3, DNA-PKc, NBN, LIG4, and XRCC2 and several cancers, we assigned a relatively high prior probability range (0.1-0.25) to SNPs in these six genes. In contrast, a relatively low prior probability range (0.01-0.1) was assigned to MRE11A, Ku80, RAD50, and CHEK1 because no studies have addressed the association of variants within these genes with cancer. All data analyses were performed with SAS/STAT and SAS/GENETICS 9.1.3.
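The two robustness checks above can be sketched as follows. The text does not say how the permutation results were combined across SNPs, so this sketch assumes one common approach, the single-step max-statistic (maxT) adjustment, with a simple hypothetical test statistic (absolute difference in mean additive genotype between status groups). The FPRP function follows the formula of the cited J. Natl. Cancer Inst. paper, FPRP = α(1−π) / [α(1−π) + (1−β)π], where α is the observed p-value, π the prior probability, and 1−β the power; the alternative OR used to compute power is not stated in the text and is a parameter here.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, List, Sequence

Statistic = Callable[[Sequence[int], Sequence[int]], float]


def mean_difference(genotypes: Sequence[int], status: Sequence[int]) -> float:
    """Absolute difference in mean additive genotype between status groups (1 vs 0)."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_t_permutation_p(snps: Sequence[Sequence[int]], status: Sequence[int],
                        statistic: Statistic = mean_difference,
                        n_perm: int = 10_000, seed: int = 1) -> List[float]:
    """Monte Carlo p-values adjusted for multiple SNPs via the permutation max statistic."""
    observed = [statistic(g, status) for g in snps]
    exceed = [0] * len(snps)
    rng = random.Random(seed)
    shuffled = list(status)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-status link
        max_stat = max(statistic(g, shuffled) for g in snps)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


def power_to_detect(or_alt: float, se: float, alpha: float) -> float:
    """Approximate power of a two-sided Wald test at level alpha to detect OR = or_alt."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(math.log(or_alt)) / se - z)


def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability for a given p-value, power, and prior."""
    false_pos = alpha * (1 - prior)
    return false_pos / (false_pos + power * prior)


print(fprp(0.05, 0.8, 0.1))  # about 0.36
```

With the priors assigned above, the same p-value gives a noticeably higher FPRP for the low-prior genes (π = 0.01-0.1) than for the high-prior genes (π = 0.1-0.25).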
The expanded regions (the coding region plus 100 kb upstream and downstream) of the top 10 genes were downloaded from the HapMap project and checked for SNPs in high LD (r̂²≥0.8) with the risk/protective SNPs reported in Cancer Research 2006; 66: 3338-44. For XRCC2, rs3218400 was not genotyped in HapMap; however, it is in perfect LD (r̂²=1) with rs3218438 in the NIEHS EGP project, and rs3218438 was genotyped in HapMap. Therefore, both the NIEHS and HapMap databases were checked for SNPs in high LD with rs3218400. For the remaining genes, only the HapMap database was searched, either because NIEHS resequenced only small regions of some of these genes or because the ethnicity of the population studied was unknown. r̂² reflects the minimal percent agreement between SNPs for linkage disequilibrium.
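The LD screen in the preceding paragraph relies on pairwise r² between SNPs. As a hedged illustration, assuming phased haplotypes with alleles coded 1/0 at each locus are already available (HapMap reports LD values directly, so this is not how they were obtained here), r² can be computed as D²/[p_A(1−p_A)p_B(1−p_B)], where D = p_AB − p_A·p_B, and candidate SNPs can then be filtered at r² ≥ 0.8. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(locus1: Sequence[int], locus2: Sequence[int]) -> float:
    """Pairwise LD r^2 from phased haplotypes; alleles coded 1/0 at each locus."""
    n = len(locus1)
    if n == 0 or n != len(locus2):
        raise ValueError("need two equal-length, non-empty haplotype lists")
    p_a = sum(locus1) / n
    p_b = sum(locus2) / n
    p_ab = sum(1 for a, b in zip(locus1, locus2) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


def high_ld_proxies(target: Sequence[int], candidates: Dict[str, Sequence[int]],
                    threshold: float = 0.8) -> List[str]:
    """Names of candidate SNPs whose r^2 with the target SNP is at least `threshold`."""
    return sorted(name for name, alleles in candidates.items()
                  if r_squared(target, alleles) >= threshold)


target = [1, 1, 0, 0]  # one SNP's alleles on four phased haplotypes
candidates = {"same": [1, 1, 0, 0], "flip": [0, 0, 1, 1], "indep": [1, 0, 1, 0]}
print(high_ld_proxies(target, candidates))  # ['flip', 'same']
```

Because r² is squared, a SNP whose alleles are perfectly reversed relative to the target ("flip") is as good a proxy as one that matches it, which is why allele coding does not matter for this screen.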
The present invention has been described in terms of preferred embodiments; however, it will be appreciated that various modifications and improvements may be made to the described embodiments without departing from the scope of the invention. The entire disclosures of all references, applications, patents, and publications cited above and/or in the attachments, and of the corresponding application(s), are hereby incorporated by reference.
rs7913426
rs7117042
rs1801516
rs537046
rs1151402
rs2295146
rs828911
rs5762763
rs132793
rs10804682
rs2244012
rs3218400
rs7830743
rs10091017
rs14448
rs6998169
This application claims priority to and the benefit of the filing of U.S. Provisional Patent Application Ser. No. 61/037,052, entitled SYSTEM AND METHOD FOR DETERMINING THE HEALTH OF A SUBJECT USING DNA DOUBLE STAND BREAK REPAIR AND GENE METHYLATION POLYMORPHIC RISK MARKERS, filed on Mar. 17, 2008 and the specification and claims thereof are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US09/37248 | 3/16/2009 | WO | 00 | 9/14/2010 |
Number | Date | Country | |
---|---|---|---|
61037052 | Mar 2008 | US |