The present invention relates in general to the field of cancer detection, and more particularly, to methods for detecting a predisposition to cancer as a result of microsatellite instability at the estrogen receptor-related gamma gene (ESRRG).
Without limiting the scope of the invention, its background is described in connection with cancer detection.
Excluding skin cancers, about 1.5 million new cancer cases and approximately 560,000 cancer-related deaths occur each year in the United States1. Two major findings have changed the paradigm of cancer research and emphasized the need for molecular profiling of cancer: the discovery of predictive protein markers and genomic alterations in primary cancers2-4 and the development of targeted drugs, such as trastuzumab5,6 and the oral tyrosine kinase inhibitor lapatinib, that can induce remissions in HER-2 positive breast cancer patients with recurrent cancer7,8 and also decrease recurrences when used as an adjuvant therapy9.
While the complete etiology of epithelial-derived cancers is not yet known, several correlative genetic and environmental factors have been identified. One specific class of genetic events receiving increasing attention as both a marker and contributing factor of oncogenesis is microsatellite length mutations10,11. Microsatellite repeats are ubiquitous and frequently polymorphic at rates that far exceed typical single-nucleotide mutation rates12 in mammalian genomes, and their polymorphism can generate significant phenotype variation13-15. Somatic microsatellite length mutations are commonly observed in colorectal, endometrial, breast, and gastric carcinomas, and are a common feature of some lung cancers10,16,17. Microsatellite instability (MSI), defined as extreme hypervariability of microsatellites throughout the genome, has been shown to be a manifestation of defects in DNA mismatch repair genes18. We hypothesize that both somatic and germ line microsatellite mutations may play an important etiological role in the development and progression of some cancers. It is critical to have knowledge of their mutational frequency, complexity, and diversity among different types of epithelial-derived cancers, as well as an understanding of how they vary in different normal genetic backgrounds.
The present invention includes methods and kits for the detection of cancer. The invention can use a custom oligonucleotide array to measure global microsatellite content (hybridization intensities representing the summation of all individual simple repeat-containing loci) among individual genomic DNA samples. Using this novel array, a unique and reproducible pattern of 26 differential microsatellites that specifically characterized breast cancer, colon cancer, and childhood hepatoblastoma patient germ lines was found. This same microsatellite hybridization intensity pattern was also detected in the tumor DNA of these same cancer patients, but not in DNA samples from healthy volunteers. These results indicate that some cancer patients might possess variable microsatellites that are predictive of future cancer development. Based on subsequent evaluation of individual loci containing array-identified differential motifs, we sequenced the 5′ UTR of the estrogen-related receptor gamma gene in ˜450 patient and volunteer samples and identified 5 to 21 copies of the (AAAG)n repeat that was statistically significant for differentiating the germ lines of breast cancer patients from those of healthy volunteers. Our results indicate that microsatellite instability is complex, pervasive, and an antecedent to oncogenesis.
In one embodiment, the present invention includes a method of identifying an increase in microsatellite DNA from a genomic nucleic acid sample comprising: obtaining a microsatellite profile from a sample suspected of comprising cancer cells; comparing the microsatellite profile to a reference microsatellite profile from a reference genome; and determining an increase in the number of microsatellite DNAs from the sample as compared to the reference genome, wherein an increase in microsatellite DNA indicates a pre-disposition to cancer and the microsatellites are upstream from the estrogen receptor-related gamma gene (ESRRG). In one aspect, the microsatellite is TTTC and its copy number is elevated in the sample. In another aspect, the sample is from a patient suspected of having a pre-disposition to breast, colon or lung cancer.
In another embodiment, the present invention is a method of detecting exposure of cells to carcinogens or mutagens comprising: obtaining a microsatellite profile from a genomic nucleic acid from a cell sample suspected of exposure to the carcinogen or mutagen; comparing the microsatellite profile of the cell sample to a reference cellular microsatellite profile from a normal cell sample; and determining a change in the number of microsatellite DNAs from the cell sample as compared to the normal cell sample, wherein a change in microsatellite DNA indicates exposure to the carcinogen or mutagen. In another aspect, the cell sample is a clinical sample. In another aspect, the microsatellite profile is obtained using a microarray that comprises at least 3, 5, 7, 10, 12, 15, 18, 20, 22 or 25 spots selected from TTTC; ACCTGA; AAAGAC; AATTT; AATT; AATTAG; ATAATT; AAATTT; AAATTG; AAAATT; ACATTT; AAAACG; AAAACT; ACTTAC; AAAAAT; AAAAGT; AAT; AAAGTT; ATATA; AAATAT; AAAGAT; AATAAG; AATAGG; AAATAG; AAAATG; AACCTT; AATATT; AAAGGT; and AAAG. In another aspect, the method further comprises the step of knocking-down or knocking-out one or more genes in the cell sample and determining the change in microsatellite profile to identify one or more microsatellite sequences and the one or more genes that are adjacent to the change in microsatellite copy number to identify a suspected link between the microsatellite copy number and the one or more genes. In another aspect, a change in the copy number of the ACCTGA microsatellite is indicative of exposure to a carcinogen or mutagen.
Yet another aspect of the present invention includes a method of identifying a microsatellite associated with a disease condition from a sample comprising: determining whether one or more microsatellite sequences from the sample has increased upstream from the ESRRG as compared to the reference genome that comprise a change in the copy number of the microsatellite sequence. In another aspect, the method further comprises the step of knocking-down or knocking-out one or more genes in the cell sample and determining the change in microsatellite profile to identify one or more microsatellite sequences and the one or more genes that are adjacent to the change in microsatellite copy number to identify a suspected link between the microsatellite copy number and the one or more genes.
In yet another embodiment, the invention includes a method of identifying a patient with a predisposition to cancer comprising: determining if there is an increase or decrease in microsatellite copy number upstream of the AAAG tandem repeat locus located in the 5′ UTR of the estrogen-related receptor gamma gene (ESRRG) in a patient sample, the patient having the disease condition, wherein a change in microsatellite copy-number indicates a pre-disposition to cancer.
In yet another embodiment, the invention includes a method of identifying the phylogeny of a sample comprising: obtaining a microsatellite profile for the sample using a microarray that comprises 1-mers to 6-mers of: perfect repeats, single mismatches, double mismatches and single nucleotide deletions; comparing the microsatellite profile to a microsatellite profile from a reference genome; and determining the phylogeny of the sample based on a comparison of the microsatellite profile of the sample to the reference genome. In one aspect, the sample is an unknown animal sample. In another aspect, the sample is a forensic sample.
Yet another embodiment of the invention is a nucleic acid microarray for the detection of microsatellites in a genome comprising: a substrate; and a plurality of groups of sample spots arranged in a two-dimensional array, wherein the plurality of sample spots are formed in a predetermined positional relationship with each other, wherein the sample spots comprise 1-mers to 6-mers of: perfect repeats, single mismatches, double mismatches and single nucleotide deletion spots. In one aspect, the microarray comprises at least two 3- to 6-mers selected from AAAGAC; AATTT; AATT; AATTAG; ATAATT; AAATTT; AAATTG; AAAATT; ACATTT; AAAACG; AAAACT; ACTTAC; AAAAAT; AAAAGT; AAT; AAAGTT; ATATA; AAATAT; AAAGAT; AATAAG; AATAGG; AAATAG; AAAATG; AACCTT; AATATT; AAAGGT; and AAAG. In another aspect, the microarray comprises 53,735 unique probes. In another aspect, each of the probes is replicated three to seven times. In another aspect, the microarray further comprises all known transcription factor binding sites, ultra-conserved sequences, and positive and negative controls. In another aspect, the array comprises at least 1,000 different oligonucleotides attached to the first surface of the substrate. In another aspect, the array comprises at least 10,000 different oligonucleotides attached to the first surface of the substrate. In another aspect, the microarray comprises at least 3, 5, 7, 10, 12, 15, 18, 20, 22 or 25 spots selected from AAAGAC; AATTT; AATT; AATTAG; ATAATT; AAATTT; AAATTG; AAAATT; ACATTT; AAAACG; AAAACT; ACTTAC; AAAAAT; AAAAGT; AAT; AAAGTT; ATATA; AAATAT; AAAGAT; AATAAG; AATAGG; AAATAG; AAAATG; AACCTT; AATATT; AAAGGT; and AAAG. In another aspect, the solid phase support is made of material selected from the group consisting of glass, plastics, synthetic polymers, ceramic and nylon.
The present invention also includes an array for identifying an increase in microsatellites in a polynucleotide sample from a patient suspected of having cancer, the array comprising: a substrate; and a plurality of groups of sample spots arranged in a two-dimensional array, wherein the plurality of sample spots are formed in a predetermined positional relationship with each other, wherein the sample spots comprise 1-mers to 6-mers of: perfect repeats, single mismatches, double mismatches and single nucleotide deletion spots, the array comprising two or more microsatellite spots comprising AAAGAC; AATTT; AATT; AATTAG; ATAATT; AAATTT; AAATTG; AAAATT; ACATTT; AAAACG; AAAACT; ACTTAC; AAAAAT; AAAAGT; AAT; AAAGTT; ATATA; AAATAT; AAAGAT; AATAAG; AATAGG; AAATAG; AAAATG; AACCTT; AATATT; AAAGGT; and AAAG.
Another embodiment is a kit for identifying microsatellite variations in a polynucleotide sample as compared to at least one reference sample, comprising: a substrate; and a plurality of groups of sample spots arranged in a two-dimensional array, wherein the plurality of sample spots are formed in a predetermined positional relationship with each other, wherein the sample spots comprise 1-mers to 6-mers of: perfect repeats, single mismatches, double mismatches and single nucleotide deletion spots; reagents suitable for labeling of the polynucleotide sample; and reagents for binding the labeled sample to the array.
Another embodiment is a method of identifying a microsatellite DNA that correlates with a disease condition comprising: obtaining a microsatellite profile from a genomic nucleic acid from a patient sample, the patient having the disease condition; comparing the microsatellite profile of the patient to a reference microsatellite profile that is obtained from a normal sample from a person that does not have the disease condition; and determining a change in the number of microsatellite DNAs from the patient sample as compared to the normal sample, wherein a change in microsatellite DNA indicates a pre-disposition to the disease.
For a more complete understanding of the features and advantages of the present invention, reference is now made to the detailed description of the invention along with the accompanying figures and in which:
While the making and using of various embodiments of the present invention are discussed in detail below, it should be appreciated that the present invention provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed herein are merely illustrative of specific ways to make and use the invention and do not delimit the scope of the invention.
To facilitate the understanding of this invention, a number of terms are defined below. Terms defined herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a”, “an” and “the” are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.
Microsatellites are typically defined as tandemly repeated sequences (motifs) of one to six nucleotides that are very widely distributed throughout the genome and are frequently variable in the number of times the motif is repeated. Microsatellite alterations occur in most tumors, but their frequency and spectra are variable, with certain types of tumors (e.g., hereditary non-polyposis colorectal cancers) harboring significantly elevated rates of mutation at these loci19. The recurrence of microsatellite mutations in several loci in multiple different cancers, including known tumor suppressor genes (e.g. PTEN), is strong evidence that these microsatellite mutations are indeed important events in the progression of these cancers. Even stronger evidence lies in the observation that there is likely some selection for these specific mutations, because microsatellite mutations in other loci with similar repeat sequences are not observed in these tumors20. Alterations in repeat unit number in and around coding sequences can have important quantitative and qualitative effects on gene expression21-24 and thus could potentially contribute directly to cancer progression. Elucidation of the nature and cause of microsatellite mutations in cancer and how they are distinct from those operating in the germ line can provide critical insights into the molecular underpinnings of the oncogenetic process. Furthermore, an investigation of global microsatellite differences in various cancers might provide cancer-specific signatures, as well as help identify individual cancer biomarkers.
To investigate microsatellites on a global scale, our laboratory designed a custom array that measures genomic microsatellite content, similar to a comparative genomic hybridization array (aCGH). The array probe design was based on computationally-derived simple repeat DNA sequences (i.e. all possible 1- to 6-mer microsatellite motif combinations, including every cyclic permutation and corresponding complement sequence), not on unique sequences derived from any specific genome. Unlike aCGH, in which the recorded hybridization intensities are used to estimate copy number variations at specific positions within the genome, the global microsatellite array is used to directly compare intensity values that represent the summation across all individual microsatellite motif-containing loci. For example, the intensity recorded on the probe for the AATT motif (and probes for its cyclic permutations, ATTA, TTAA, and TAAT) measures the contributions from the 886 AATT motif specific microsatellite loci spread throughout the reference human genome. The global microsatellite array can therefore be used to specifically and accurately measure significant motif-specific variations (polymorphisms), whether they are in the germ line or arise as somatic mutations, in any DNA sample. This allowed us to perform, for the first time, a thorough and unbiased analysis of cancer genome microsatellites, which led to the discovery that germ line microsatellite variability might represent a cancer predisposition biomarker.
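The motif space described above can be sketched computationally. The following Python fragment is only an illustration and is not the probe design software used for the array: it groups 1- to 6-mer motifs into families made up of every cyclic permutation of a motif and of its reverse complement. Treating the "corresponding complement sequence" as the reverse complement, and skipping motifs that are themselves repeats of a shorter unit (for example, ATAT as two copies of AT), are assumptions made here. All function names are hypothetical.

```python
from itertools import product

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cyclic_permutations(motif):
    """Return the set of all rotations of a motif (AATT -> ATTA, TTAA, TAAT, AATT)."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def reverse_complement(motif):
    """Return the reverse complement of a motif (assumed reading of 'complement')."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_family(motif):
    """All rotations of a motif together with all rotations of its reverse complement."""
    return cyclic_permutations(motif) | cyclic_permutations(reverse_complement(motif))


def is_primitive(motif):
    """True if the motif is not a tandem repeat of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def enumerate_families(max_len=6):
    """Map one representative (alphabetically first member) to each motif family."""
    families = {}
    for n in range(1, max_len + 1):
        for letters in product("ACGT", repeat=n):
            motif = "".join(letters)
            if is_primitive(motif):
                family = motif_family(motif)
                families.setdefault(min(family), family)
    return families
```

Under these assumptions, `cyclic_permutations("AATT")` returns the four rotations AATT, ATTA, TTAA and TAAT, whose probes are expected to report the same set of genomic loci.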
Global microsatellite content distinguishes three different cancer types. Genomic DNA samples were acquired from 6 cancer-free volunteers (blood), 5 patients with expression microarray-confirmed25 basal-type breast cancer (breast tissue and blood), 5 patients with luminal-type breast cancer (breast tissue and blood), 3 colon cancer patients (colon tissue and blood or unaffected tissue), 3 children with hepatoblastoma tumors (liver tissue and blood), 3 pairs of breast cancer and matching blood cell lines, 3 pairs of lung cancer and matching blood cell lines, and 3 colon cancer cell lines (Table 2). Each of these 53 genomic DNA samples was subsequently co-hybridized with the same human DNA standard (derived from a mixed population of male and female donors) to a custom oligonucleotide array that measures summated global microsatellite content. After verification of data quality, statistical analyses were performed, and only those motifs with signals that were reproducible for replicate sequences and also biological replicates were considered in further analyses. Statistical significance (one-way ANOVA, with Benjamini & Hochberg corrected p value <0.05) was required for each differential motif, and consistency for cyclic permutations was additionally required in order to consider each differential motif as robust.
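The significance filter described above can be illustrated with a short Python sketch. It is not the GeneSpring procedure itself: it applies the standard Benjamini & Hochberg step-up adjustment to a set of raw p values and then keeps only those motif families in which every cyclic-permutation probe passes the threshold. The data structures (`p_by_probe`, `families`) and function names are hypothetical, and the ANOVA that would produce the raw p values is assumed to have been run elsewhere.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini & Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def robust_motifs(p_by_probe, families, alpha=0.05):
    """Keep families whose cyclic-permutation probes all have adjusted p < alpha.

    p_by_probe: dict mapping probe motif -> raw p value (hypothetical input).
    families:   dict mapping family name -> iterable of probe motifs.
    """
    probes = list(p_by_probe)
    adjusted = dict(zip(probes, benjamini_hochberg([p_by_probe[p] for p in probes])))
    return [
        name
        for name, members in families.items()
        if all(adjusted[member] < alpha for member in members)
    ]
```

Requiring agreement across all cyclic permutations acts as an internal replicate check, because probes for rotations of the same motif hybridize to the same repeat tracts.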
Sample acquisition and preparation: Genomic DNA was extracted from blood samples collected from volunteers (Tables 2 and 7) by the McDermott Center for Human Growth and Development Genetics Clinical Laboratory in accordance with Institutional Review Board (UTSW IRB#1287-355). Most cell lines were provided by Drs. Girard, Minna, and Boothman. Patient samples were provided by Drs. Perou, Tomlinson, Lewis, and the UTSW Tissue Repository, with each institution's review board approval. All other genomic DNA was purchased from Coriell Cell Repositories (Camden, N.J.) or American Type Culture Collection (Manassas, Va.).
To measure array specificity, a custom 70-mer oligonucleotide (SEQ ID NO.: 1) (5′-GCAAAGGGACCCACGGTGGAACAGGAGCAGGAGCAGGAGCGGGAGGGGCAGGAGCAGGAG-3′) and its complement were designed based on the GAGCAG repeat-containing EBV sequence. The custom 70-mers were de-salted, annealed, and PAGE-purified by the manufacturer (Integrated DNA Technologies, Coralville, Iowa), and 500 pmoles were spiked into a cancer-free volunteer DNA sample (N4, Table 2).
Array design, manufacture, and processing: Each array consisted of 53,735 unique probes, each replicated 7 times (for a total of 376,145 probes/features) at different positions across the array, including 14,634 probes to measure repetitive DNA sequences for all possible 1-mers to 6-mers (5,356 perfect repeats (WT), single (SM) and double (DM) mismatches and single nucleotide deletion (DEL) probes). Also included on the array were all known transcription factor binding sites (2005 Transfac database), ultra-conserved sequences45, RepBase sequences (Genetic Information Research Institute, 2005, www.girinst.org) and a series of controls. A database containing all raw array data from these experiments and a text file of the corresponding probe identifiers and sequences are available for download at http://discovery.swmed.edu/gmc.
All arrays were manufactured by Roche NimbleGen (Madison, Wis.) following their standard production methods for maskless photolithography, including additional internal controls. DNA (˜1 μg, 250 ng/μl) labeling, hybridization, and scanning were performed following their aCGH standard protocol. All test samples (labeled with Cy3) were co-hybridized with Cy-5-labeled Promega (Madison, Wis.) human reference DNA, and raw intensity values were provided via CD.
Array data processing and statistical analysis: Background subtraction and quantile normalization were performed across all arrays using NimbleScan software (Roche NimbleGen), followed by regression analysis to compare all reference sample signal intensity values (R2=0.93±0.06). To reduce the potential effect of outliers, only the median 5 probe values were considered for further analysis (i.e., maximum and minimum values were discarded for each set of replicate probes on each array). GeneSpring was used to perform additional normalization (percentile shift and baseline transformation), pairwise comparisons and one-way ANOVA with Benjamini & Hochberg (B-H) correction. For microsatellite motifs, any observed difference (≧2-fold, B-H p value ≦0.05) was also expected to occur consistently across all possible cyclic permutations. Control probes were used to gauge background levels, reproducibility of reference samples, and final statistical output. As expected, the intensity values decreased predictably between microsatellite-specific control (WT, SM, DM, and DEL) probes (
Computation of probe occurrences in genomes: Each of the 5,356 microsatellite probes on the array was also computationally aligned to the published human reference genome (NCBI Build Number 36, Version 3, Human Genome Sequencing Consortium release 4, Mar. 24, 2008). A Perl script was written to search for all 1-mer through 6-mer microsatellite motifs (minimum length of 18 bp). These microsatellites were loaded into a MySQL database and subsequently aligned to all exons, introns, and promoter regions (defined here as 1 kb 5′ of the start site) of the human genome to determine the number of occurrences in each of these regions of importance. The genetic regions were constructed by downloading the human Gene and Gene Prediction Tracks RefSeq table, March 2006 assembly, from the UCSC Genome Table Browser (genome.ucsc.edu).
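The motif search described above was implemented as a Perl script that is not reproduced here. As a rough illustration only, the following Python sketch locates perfect tandem repeats of 1- to 6-nucleotide units spanning at least 18 bp in a DNA string. Two choices are assumptions made for this sketch: only whole copies of the unit are counted toward the minimum length, and units that are themselves repeats of a shorter unit are skipped so that each tract is reported once. The function name and return format are hypothetical.

```python
import re


def find_microsatellites(sequence, max_unit=6, min_length=18):
    """Return (start, end, unit, copies) for perfect tandem repeats in a sequence.

    Coordinates are 0-based, end-exclusive. Only whole copies of the unit are
    counted, so a 4-mer needs 5 copies (20 bp) to satisfy an 18 bp minimum.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        min_copies = -(-min_length // unit_len)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            # Skip units such as ATAT that are repeats of a shorter unit (AT).
            if any(
                unit_len % k == 0 and unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, copies))
    return hits
```

For example, a made-up sequence consisting of GGCT, ten copies of AAAG, and CCTA yields a single hit for the AAAG unit with 10 copies, whereas four copies (16 bp) fall below the minimum length and are not reported.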
All microsatellite occurrences were also aligned to the nearest SNP-associated comparative genomic hybridization value, as obtained from Illumina 109K SNP array (Illumina Inc., San Diego, Calif.) data for 10 breast cancer patients (Table 2) to determine the contribution of copy number variations to global microsatellite content. Global gain/loss in copy number, estimated as the average signal amplification ratio (tumor vs normal, diploid DNA) for all SNPs associated with each individual microsatellite locus compared to the number present in the reference genome, was negligible (˜2.6% variation on average) for microsatellite motifs determined to be differential using the custom microsatellite array.
Genotyping: Forward (SEQ ID NO.: 2) (5′ ACCTAGGAGATAGAGGTTGC 3′) and reverse (SEQ ID NO.: 3) (5′ CTTCTTCTGCACTATCAGGG 3′) primers were designed to amplify a 369 bp fragment of the ERR-γ gene including the 5′ UTR AAAG repetitive sequence. PCR was performed using Promega 2×PCR Master Mix (Promega) per manufacturer instructions. Products were gel-purified using a Qiagen gel extraction kit (Qiagen, Valencia, Calif.) and sequenced by the McDermott Center Sequencing Core Facility. Hardy-Weinberg equilibrium was tested using a χ2 test of goodness of fit, with 1 degree of freedom, checking for long and short allele distribution (where “long” is defined as 13+ copies of the AAAG motif, and “short” is defined as fewer than 13 copies). Microsatellite instability (MSI) status was determined by the McDermott Sequencing Core using the Promega MSI Analysis System, Version 1.2 (Table 3). MSI status was assigned according to the Bethesda Guidelines46,47. To identify putative transcription factors, the AAAG-containing region of ERR-γ, including 100 bp flanking sequences, was searched against the Transfac database using BLAST, MATCH, and TFSEARCH tools48.
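The Hardy-Weinberg check described above can be sketched as follows. This is a minimal illustration assuming a two-allele model (long versus short) and genotype counts supplied by the user; the counts used in any call are hypothetical and are not taken from the study. The p value uses the closed form of the χ2 survival function for 1 degree of freedom, erfc(sqrt(χ2/2)).

```python
import math


def hardy_weinberg_chi2(n_long_long, n_long_short, n_short_short):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Arguments are observed genotype counts for a two-allele (long/short) locus.
    Returns (chi2, p_value).
    """
    n = n_long_long + n_long_short + n_short_short
    p = (2 * n_long_long + n_long_short) / (2 * n)  # long allele frequency
    q = 1.0 - p                                     # short allele frequency
    observed = (n_long_long, n_long_short, n_short_short)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A large p value is consistent with equilibrium; a small one suggests that genotype frequencies deviate from those expected from the allele frequencies, for example because of genotyping error or population structure.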
One motif, a GAGCAG repeat, was reproducibly observed as differential between cancer cell lines, which were spontaneously immortalized, and the matching B lymphocyte lines established through Epstein-Barr virus (EBV) transformation. EBV contains a copy of this repeat, and to confirm that the array was specifically detecting the contaminating EBV epigenome, we compared DNA extracted directly from B lymphocytes and from a matching EBV-transformed cell line we established for two ‘normal’ samples. As shown in
We next analyzed the various cancer patient and cancer-free volunteer samples, individually and in groups for statistical purposes. Based on analysis of the germ lines of 6 cancer-free volunteers (3 men and 3 women) versus 10 breast cancer patients (all women), there were 26 statistically significant microsatellite motifs (including cyclic permutations) that consistently differed between each cancer-free volunteer and all ten patient samples (
Notably, very little difference was detected between the tumor DNA and matching germ lines of these same breast cancer patients when directly compared (
Examination of 3 colon cancer patients yielded similar results to what was observed for breast cancer patients, with a distinctive global microsatellite signature apparent between cancer patients and cancer-free volunteers. Specifically, all 26 motifs identified in breast cancer patients were also statistically significant (B-H p value ≦0.05, fold-change ≧2) and reproducible among colon cancer patient germ lines when compared to cancer-free volunteers (
We next evaluated hepatoblastoma tumors from children, which should have a dominant genetic component given their early development, and found a global microsatellite pattern identical to what was observed in breast cancer patients (
One-way ANOVA of all samples followed by hierarchical clustering confirmed that a global microsatellite signature accurately separated all primary tumors from healthy volunteer samples (
To determine if the increased incidence of microsatellites in cancer samples relative to cancer-free volunteers was a function of copy number changes in the genomic content, we analyzed whole genomic SNP array data on the twenty breast cancer patients for differences in regions containing microsatellites. The gains and losses for each microsatellite at each locus were calculated for each sample and subsequently compared. Based on this analysis, the variations in global microsatellite content ascertained by the custom microsatellite array were not due to large gains or losses of chromosomal content. The contribution of segmental chromosomal duplications to the global microsatellite signature detected in breast cancer samples (compared to normal reference DNA) was negligible (less than 3% for all differential microsatellite motifs).
Identification of a putative predisposition biomarker for breast cancer and colorectal neoplasia: Based on the published human reference genomic sequence, the 26 cancer signature motifs are associated with a total of 42,702 loci, 27,578 of which are in close proximity (i.e., within 1,000 bp) to gene coding regions (Table 4). Although not included in the canonical set of 26 cancer-specific microsatellites, we chose the statistically significant but moderately differential AAAG motif to further investigate, due to smaller repeat unit size, which is an indication of a higher likelihood for polymorphism, its prevalence in the genome, and the number of genes that harbor the AAAG motif that are also implicated in cancer. For this motif, we found 14,311 copies in the entire genome, 4,127 of which are located within genes (exons, introns, UTRs, upstream and downstream areas). When limited to the 7,183 “cancer” genes (defined as those genes found in NCBI's EntrezGene using the search terms “cancer” and “tumor”), we found 128 in the 5′ UTR and 27 in the promoter region, which we defined as 1 kb upstream of those genes.
We prioritized each AAAG locus by copy number, which is positively correlated with the likelihood of being polymorphic29, and subsequently designed and tested 28 PCR primer sets against a panel of 42 samples that included 12 cancer-free volunteers, 6 human diversity samples, 17 cancer cell lines, and a variety of controls. We found 11 of these loci to be polymorphic (i.e., 10 that exhibit different sizes and one that is frequently deleted) in the human samples (data not shown). Of the 11 polymorphic markers, two were of particular interest. One of the two markers containing an AAAG repeat, found in the TBL1Y gene located on the Y chromosome, was absent in all female samples (data not shown). However, this microsatellite was also absent in some lung tumors but not in their matched B lymphocyte-derived cell lines, consistent with frequent deletion of the entire Y chromosome in some non-small cell carcinomas30. The second interesting AAAG tandem repeat locus is located in the 5′ UTR of ERR-γ (estrogen-related receptor gamma, ESRRG, located on chromosome 1q41), which has 10 copies of the 4-mer (AAAG) motif, as found in the reference human genome sequence in the UCSC genome browser. ERR-γ is an orphan nuclear receptor and operates independently of estrogen; however, ERR-γ does bind to certain estrogen response elements to activate transcription31. Also, ERR-γ and its known co-activators have been linked to breast, ovarian and colon cancer32 and more recently to tamoxifen resistance in invasive lobular carcinoma of the breast33.
ERR-γ has 2 known isoforms, one with an alternative first exon and one with an alternative 5′ UTR. It is possible that the differential AAAG microsatellite confers alternate regulation of ERR-γ, as is thought to be the case for the gene encoding the parathyroid hormone receptor, which also harbors a polymorphic (AAAG)n repeat sequence in its promoter region that co-varies with adult height34. There are 22 candidate transcription factors (
As shown in
Based on genotyping results, the size of the AAAG motif ranged between 5 and 21 copies. We chose 13 motif copies as the cut-off length for classification as “long”, as this number was the rarest among samples (only one patient with an allele of this length), and 12 copies was relatively common and equally observed (4-6 incidences) for each class of sample (e.g., cancer and non-cancer). Based on these criteria, carriers and non-carriers of the longer allele for each category of patient are presented in Table 1.
As shown, a statistically significant higher incidence of long allele carriers (p value=0.0134, two-tailed Fisher's exact test) was observed for breast cancer patients (14.3%), compared to healthy volunteers (4.8%), which translates to a relative risk ratio of 2.97 (14.3/4.8). A similar trend was observed when cancer-free volunteers were compared to patients with colon neoplasia (11.8% and 9.4% long allele carriers for persons with colorectal cancer and colon polyps, respectively), although this difference was not statistically significant (p value=0.129, two-tailed Fisher's exact test). However, comparison of cancer-free volunteers with breast and colon cancer patients combined (i.e., both sets of cancer patients considered as one group) did yield statistically significant results (p value=0.0132, two-tailed Fisher's exact test). The percentage of carriers for the 22 lung cancer cell line samples examined (4.5%) was similar to what was observed for cancer-free volunteers. The incidence of carriers in patients without cancer but with a known family history of breast cancer (8.2%), on the other hand, was slightly higher than in cancer-free volunteers but lower than in breast or colon cancer patients. Our results indicate a possible hereditary trend for both breast cancer and colon cancer; however, a much larger population is needed to definitively determine the potential contribution of this locus to risk for hereditary cancers. The incidence of this potential biomarker should also be examined in other potentially heritable cancers, such as ovarian cancer, which is known to be linked to familial (especially BRCA1/2-associated) breast cancer35.
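The carrier comparison above rests on a two-tailed Fisher's exact test and a ratio of carrier frequencies. The following Python sketch shows how both quantities can be computed from a 2x2 table of carrier and non-carrier counts. It is an illustration only: the counts in Table 1 are not reproduced here, any counts passed to the function are the user's own, and the two-tailed rule used (summing the probabilities of all tables no more likely than the observed one) is one common convention that is assumed rather than taken from the text.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows could be, e.g., patients and volunteers; columns carriers and non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    total = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


def relative_risk(carrier_rate_cases, carrier_rate_controls):
    """Ratio of carrier frequencies in cases versus controls."""
    return carrier_rate_cases / carrier_rate_controls
```

Applied to the rounded percentages given above, `relative_risk(14.3, 4.8)` evaluates to roughly 2.98, close to the reported 2.97; the small difference presumably reflects rounding of the percentages.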
The distribution of the allele sizes for the different patient groups is shown in
Colon cells exposed to MNNG (alkylating agent) for 72 hours and specific DNA damage after treatment with alkylating agents over time (
Microsatellites are largely understudied despite their known connection with cancer and other diseases (e.g., neurological developmental defects), because there has never been a method for assaying them en masse until now. In this study, we describe a new method for the detection and comparison of global microsatellite changes, a technique that is both sensitive and specific. There are multiple potential applications for this new array, which can detect a single contaminating microsatellite motif, present at a calculated concentration as low as 2-5 copies per cell36-38, as was demonstrated with EBV-transformed B lymphocyte DNA (
We found a set of commonly destabilized repetitive microsatellite motifs in tumors and germ lines, a pattern that may represent a cancer predisposition biomarker. Notably, whereas the pattern of microsatellite expansion was seen in the germ lines as well as the tumors in breast and colon cancer patients, the pattern was seen only in the tumor line derived from a small cell lung carcinoma patient. It is possible that this difference may be related to the relative importance of environmental factors versus genetic predisposition in the etiology of these different neoplasms. We might expect that lung cancer, because it is usually caused by tobacco exposure, would be less likely to be associated with underlying genetic risk factors.
Most of the microsatellites altered in cancer patients consist of multiples of nucleotides A and T; that is, the differential motif sequence usually takes the form of AnTm. Further research will be needed to ascertain the reason for this pattern, but the fact that particular repeat motifs are mutated more commonly suggests that there is sequence bias in the DNA repair machinery in tumors favoring errors in such motifs. It is also interesting to note that the distribution of microsatellites found to be variable between cancer-free volunteers and cancer patients strongly favors microsatellites that are located outside gene coding regions. Indeed, only one of the 42,702 loci that contain these microsatellites lies within an exon (Table 4), suggesting that there is extreme selection pressure against these particular motifs within coding regions. There are 1,124 1- to 6-mer microsatellites located in exons out of ~507,000 computationally identified in the human reference genome, which equals ~0.2%. So, the expected value in the set of microsatellites identified as differential should be 95, much higher than what was actually observed (i.e., only 1).
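The arithmetic behind the expected value of 95 can be checked with a short sketch. The function names are hypothetical, and the binomial tail probability is an added illustration that assumes each locus falls in an exon independently with the genome-wide probability; that independence assumption is not stated in the text.

```python
def expected_exonic(n_differential_loci, n_exonic, n_total):
    """Expected number of exonic loci if differential loci were placed
    in proportion to the genome-wide exonic fraction."""
    return n_differential_loci * n_exonic / n_total


def binomial_prob_at_most_one(n, p):
    """P(X <= 1) for X ~ Binomial(n, p), assuming independent loci."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


# Figures quoted in the text; 507,000 is itself an approximate count.
N_DIFFERENTIAL = 42_702
N_EXONIC = 1_124
N_TOTAL = 507_000

fraction = N_EXONIC / N_TOTAL
expected = expected_exonic(N_DIFFERENTIAL, N_EXONIC, N_TOTAL)

print(f"exonic fraction: {fraction:.2%}")             # about 0.2%
print(f"expected exonic loci: {round(expected)}")     # 95
print(f"P(at most 1 exonic locus): "
      f"{binomial_prob_at_most_one(N_DIFFERENTIAL, fraction):.1e}")
```

Under this simple model, observing a single exonic locus where roughly 95 would be expected is extremely improbable, which is consistent with the selection-pressure interpretation given above.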
Differential motifs discovered using this array can lead to the discovery of specific disease-associated genetic loci. For example, after measuring the increased hybridization signal reflecting alterations in tandem repeats of the AAAG motif, we were able to consider which of the genes near these microsatellites might be expected to affect cancer behavior and then subject these loci to more detailed analysis. We discovered a variable repetitive motif in the 5′ UTR of ERR-γ that exhibits a significantly higher incidence in patients with breast cancer and possibly colon neoplasia. ERR-γ expression has previously been implicated as a potential prognostic marker in breast cancer33,39. ERR-γ has 2 known isoforms, one with an alternative first exon and one with an alternative 5′ UTR. It is possible that the differential AAAG microsatellite confers alternate regulation of ERR-γ, as is thought to be the case for the gene encoding the parathyroid hormone receptor, which also harbors a polymorphic (AAAG)n repeat sequence in its promoter region that co-varies with adult height34. There are 22 candidate transcription factors (see
Because microsatellites have in many cases been shown to impact expression of adjacent genes14,41, it is interesting to speculate that ERR-γ expression differences related to the different AAAG copy number may impact breast cancer risk. If the frequency of this potentially predictive marker is sustained in a larger population, and the mechanism by which it confers the cancer phenotype can be identified, it may contribute substantially as a biomarker offering surveillance, prophylactic surgery, and chemoprevention options to patients. Based on our assessment, this allele carries a 2.97 relative risk. As a comparison, deleterious germ line mutations of the BRCA1 gene have a 3-7% frequency in breast cancer patients (age <45), which is significantly elevated in those with a family history (up to 33%). Such mutations are associated with a 3-7 times higher risk of breast cancer, compared to non-mutation carriers42,43. The incidence of BRCA1 mutation in the general population is estimated at 0.2 to 0.4%44.
The potential role of microsatellites in a number of different neoplasms as demonstrated in this work is significantly greater than might be predicted given the individual locus discoveries to date. Whereas microsatellite instability has been sporadically demonstrated in a large number of tumors, consistent MSI has been seen most commonly in colorectal carcinoma and endometrial carcinoma. It should be noted that the standard assay for MSI compares microsatellite length for an extremely limited set of loci between tumor DNA and non-tumor DNA from the same patient. Because the microsatellite alterations we have found affect germ line DNA, they would not be detected by the standard MSI assay. Indeed, what we have described (in the case of breast cancer and hepatoblastoma tumors) would not be regarded as MSI, since the microsatellite patterns do not differ in the tumor from the normal tissue. However, we have found that assaying more widely for alterations in microsatellite content reveals abnormalities in other tumor types as well. Based on our results, global microsatellite content may be used to distinguish individuals at higher risk of developing cancer and may be a better gauge of “MSI”.
It is provocative to consider the similarities and differences between the microsatellite patterns observed in DNA derived from tumor tissue when compared to the DNA obtained from normal tissue. Primary breast cancer tumors exhibit significantly increased hybridization of some microsatellite motifs, a pattern also seen in non-tumor DNA from these patients, when compared to the DNA obtained from a set of cancer-free individuals. A similar concurrence of microsatellites is seen in the embryonal tumor hepatoblastoma. That these altered microsatellite patterns are found in DNA from both tumor and germ line DNA suggests that such alterations may predispose to the development of cancer. This pattern contrasts with the pattern seen in lung cancer; whereas the tumor exhibits an altered microsatellite pattern, the germ line is not different from cancer-free subjects. Thus, in lung cancer patients, the carcinogenic insult may induce the development of microsatellite alterations that contribute to neoplastic transformation. These results further suggest that these microsatellite motifs in particular are a clue to the underlying mechanism responsible, which may be a target to intercept the oncogenesis process. Interestingly, we found microsatellite alterations in colon cancer tumors, in which there was variable presence of this genotype in the germ line. Perhaps colon cancer resides in the middle of the scale measuring the relative importance of the underlying genetic milieu versus the importance of environmental factors in the development of malignancy, which is consistent with the highly variable exposure of the colon to different foods.
A larger scale study may be merited to determine if global microsatellite content signatures can also be used as a reliable biomarker for tumor sub-type classification and prediction of prognosis or response to therapy. The abnormal microsatellite signatures potentially implicate thousands of genetic loci. Investigation of a very small subset led to significant findings. This suggests that there may be many more important repeat-containing loci affecting cancer development or progression that are yet to be identified.
Hepatitis C virus: 6 of 12 genomes downloaded contained a 20 bp “T” repeat. Human T-lymphotropic virus: No 18 to 20 bp microsats found. 6 out of 16 genomes downloaded contained a 12 bp CCAGAG microsat. Human herpes virus 8: 2 out of 3 genomes contained a 20 bp “G” repeat. All 3 had a CCTGCT repeat. Lengths were (2) 23 bps and (1) 17 bps.
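A scan of this kind, looking for tandem repeats of 1- to 6-mer motifs in a downloaded sequence, can be sketched as follows. This is a minimal illustration, not the procedure actually used to search the viral genomes: the function names, the 12 bp default threshold, and the toy sequence are all assumptions, and only perfect (uninterrupted) repeats are reported.

```python
import re
from math import ceil


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_microsatellites(seq, min_length=12, max_unit=6):
    """Return sorted (start, motif, length_bp) tuples for perfect tandem
    repeats of 1- to max_unit-mers spanning at least min_length bases."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        copies = max(2, ceil(min_length / k))
        # One captured unit followed by at least (copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "TT" or "TTT" so a T run is reported once, as "T".
            if is_primitive(motif):
                hits.append((m.start(), motif, m.end() - m.start()))
    return sorted(hits)


# Toy sequence (hypothetical): a 20 bp T run and a 12 bp CCAGAG repeat.
toy = "GG" + "T" * 20 + "ACGA" + "CCAGAG" * 2 + "TT"
for start, motif, length in find_microsatellites(toy):
    print(start, motif, length)
```

Raising `min_length` to 18 would restrict the output to repeats in the 18 to 20 bp range mentioned above.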
HCT15 | 113/119 | 127/127 | 146/146
HCT116 | 92/92 | 102/102 | 120/126 | 142/142
RKO | 86/89 | 101/101 | 112/112 | 121/124 | 136/136
F | 42 | Caucasian | No cancer | No | 10 | 17
F | 51 | Caucasian | N/K | No cancer | No | 9 | 17
F | 41 | N/K | N/K | Endometrial cyst | No | 9 | 15
F | 49 | Hispanic | N/K | No cancer | No | 9 | 19
F | 60 | Caucasian | N/K | No cancer | No | 7 | 17
N01-01-021 | No cancer | 10 | 16
F | 37 | African American | Neg | No cancer | Maternal aunt, mother, maternal grandmother, maternal cousin with breast cancer | 11 | 17
F | 36 | Caucasian | BRCA2− | No cancer | Mother and maternal aunt with breast cancer | 9 | 17
F | 70 | Caucasian | Neg | No cancer | Mother and two nieces with breast cancer | 9 | 15
F | 50 | Caucasian | BRCA1− | No cancer | Paternal great aunt with breast cancer | 7 | 18
F | 41 | Caucasian | Neg | Breast Cancer | N/K | 10 | 19
F | 48 | African-American | Neg | Breast Cancer | N/K | 7 | 19
F | 51 | Black | Neg | Breast Cancer | N/K | 10 | 17
F | 47 | Caucasian | BRCA1+ | Breast Cancer | N/K | 7 | 16
F | 34 | Caucasian | BRCA2+ | Breast Cancer | N/K | 7 | 17
F | 41 | Caucasian | Neg | Breast Cancer | Family history of breast cancer | 9 | 19
F | 42 | Caucasian | Neg | Breast Cancer | None | 10 | 17
F | 49 | Caucasian | Neg | Breast Cancer | Maternal aunt and mother with breast cancer | 11 | 18
F | 53 | Caucasian | N/K | Metastatic breast cancer | no family history of cancer | 10 | 18
F | 52 | African-American | N/K | Metastatic breast cancer | mother with throat cancer, aunt with pancreatic cancer, aunt with N/K cancer | 7 | 17
F | 56 | Caucasian | N/K | Metastatic breast cancer | mother with breast cancer | 7 | 17
F | 46 | Caucasian | Neg | Bilateral breast cancer | sister with breast cancer, paternal uncle with mesothelioma, paternal grandfather with lung cancer | 12 | 16
F | 71 | Caucasian | Neg | Breast Cancer | daughter with breast cancer and Paget's, father with colon cancer, paternal uncle with thyroid cancer, paternal cousin with breast cancer; paternal grandmother with leukemia, mother with colon and pancreatic cancer, maternal uncle with melanoma, maternal aunt N/K cancer, maternal aunt with breast cancer, maternal cousin with breast cancer; maternal grandmother with breast cancer, maternal grandfather with N/K cancer | 7 | 16
F | 46 | Caucasian | N/K | Breast Cancer | mother with bone cancer | 9 | 21
F | 83 | Caucasian | N/K | carcinoma | sister and maternal aunt with breast cancer | 9 | 16
F | 32 | Caucasian | BRCA2+ | Breast Cancer | paternal grandmother with lung cancer | 11 | 17
F | 54 | Caucasian | Neg | Breast Cancer | N/K | 10 | 21
F | 43 | Caucasian | Neg | Breast cancer | N/K
Basal Breast Cancer | 9 | 17
Basal Breast Cancer | 10 | 15
F | Lum Breast Cancer | 9 | 16
F | 43 | African-American | N/K | Metastatic colon cancer | Mother with breast and rectal cancer | 11 | 14
F | 78 | Caucasian | N/K | Infiltrating colonic adenocarcinoma | None | 9 | 16
M | 61 | African-American | N/K | Invasive adenocarcinoma | None | 7 | 16
M | 64 | Hispanic | N/K | Invasive colonic adenocarcinoma | None | 7 | 13
M | 58 | African-American | N/K | Invasive colonic adenocarcinoma | None | 7 | 19
M | N/K | N/K | N/K | Colon cancer | N/K | 7 | 19
F | 58 | Caucasian | N/K | Colon polyps | no known family history of cancer | 7 | 15
F | 69 | Caucasian | N/K | Colon polyps | no family history of cancer | 10 | 16
F | 60 | African American | N/K | Colon polyps | no family history of cancer | 7 | 15
F | 61 | African American | N/K | Colon polyps | no known family history of cancer | 7 | 14
F | 58 | Hispanic | N/K | Colon polyps | unspecified relative with colon cancer, unspecified relative with breast cancer | 9 | 14
M | 65 | Caucasian | N/K | Large cell carcinoma | N/K | 7 | 15
Mus musculus
Pan troglodytes
Pan troglodytes
Gorilla gorilla
Gorilla gorilla
Pongo pygmaeus
Pongo pygmaeus
It is contemplated that any embodiment discussed in this specification can be implemented with respect to any method, kit, reagent, or composition of the invention, and vice versa. Furthermore, compositions of the invention can be used to achieve methods of the invention.
It will be understood that particular embodiments described herein are shown by way of illustration and not as limitations of the invention. The principal features of this invention can be employed in various embodiments without departing from the scope of the invention. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific procedures described herein. Such equivalents are considered to be within the scope of this invention and are covered by the claims.
All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, or the variation that exists among the study subjects.
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
The term “or combinations thereof” as used herein refers to all permutations and combinations of the listed items preceding the term. For example, “A, B, C, or combinations thereof” is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, CBA, BCA, ACB, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.
As used herein, words of approximation such as, without limitation, “about”, “substantial” or “substantially” refer to a condition that when so modified is understood to not necessarily be absolute or perfect but would be considered close enough to those of ordinary skill in the art to warrant designating the condition as being present. The extent to which the description may vary will depend on how great a change can be instituted and still have one of ordinary skill in the art recognize the modified feature as still having the required characteristics and capabilities of the unmodified feature. In general, but subject to the preceding discussion, a numerical value herein that is modified by a word of approximation such as “about” may vary from the stated value by at least ±1, 2, 3, 4, 5, 6, 7, 10, 12 or 15%.
All of the compositions and/or methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
This application claims priority to U.S. Provisional Application Ser. No. 61/186,745, filed Jun. 12, 2009, the entire contents of which are incorporated herein by reference.
This invention was made with U.S. Government support under Contract Nos. 5-T32-HL07360-28 and P50CA70907 awarded by the NIH. The government has certain rights in this invention.
Number | Date | Country
---|---|---
61186745 | Jun 2009 | US