The present invention relates to genomic DNA sequences that exhibit altered CpG methylation patterns in disease states relative to normal. Particular embodiments provide methods, nucleic acids, nucleic acid arrays and kits useful for detecting, or for detecting and differentiating between or among breast cell proliferative disorders.
The etiology of pathogenic states is known to involve modified methylation patterns of individual genes or of the genome. 5-methylcytosine, in the context of CpG dinucleotide sequences, is the most frequent covalently modified base in the DNA of eukaryotic cells, and plays a role in the regulation of transcription, genetic imprinting, and tumorigenesis. The identification and quantification of 5-methylcytosine sites in a specific specimen, or between or among a plurality of specimens, is thus of considerable interest, not only in research, but particularly for the molecular diagnoses of various diseases.
Correlation of aberrant DNA methylation with cancer. Aberrant DNA methylation within CpG ‘islands’ is characterized by hyper- or hypomethylation of CpG dinucleotide sequences, leading to abrogation or over-expression of a broad spectrum of genes, and is among the earliest and most common alterations found in, and correlated with, human malignancies. Additionally, abnormal methylation has been shown to occur in CpG-rich regulatory elements in intronic and coding parts of genes for certain tumors. In colon cancer, for example, aberrant DNA methylation constitutes one of the most prominent alterations and inactivates many tumor suppressor genes such as p14ARF, p16INK4a, THBS1, MINT2, and MINT31 and DNA mismatch repair genes such as hMLH1.
In contrast to the specific hypermethylation of tumor suppressor genes, an overall hypomethylation of DNA can be observed in tumor cells. This decrease in global methylation can be detected early, well before frank tumor formation. A correlation between hypomethylation and increased gene expression has been determined for many oncogenes.
Breast cancer. In American women, breast cancer is the most frequently diagnosed cancer. One in eight women will develop breast cancer during her lifetime. Breast cancer is the second leading cause of cancer death, and in women aged 40-55 it is the leading cause of death. For 2004, 215,990 new cases of breast cancer and 40,110 deaths were estimated in the US alone, with comparable numbers in Europe.
The vast majority of breast neoplasms are of epithelial origin, with ductal carcinomas representing 80% of all tumors. In addition, there are several less common breast cancer subtypes. Medullary and lobular carcinomas are each found in about 5% of patients diagnosed with breast cancer. Other less frequent tumor histologies include pure tubular carcinoma, mucinous or colloid carcinoma, papillary carcinoma, and Paget's disease. These cancers have substantially better prognoses, especially when found at a node-negative stage. Carcinosarcomas and adenocystic tumors occur only sporadically.
Ductal carcinoma in-situ (DCIS) is a noninvasive, precancerous condition. DCIS can progress to invasive cancer, but estimates of the likelihood of this vary widely. Some sources include DCIS in breast cancer statistics. The frequency of DCIS diagnosis has increased markedly in the United States since screening mammography came into widespread use. In 1998, DCIS accounted for about 18% of all newly diagnosed invasive and noninvasive breast tumors in the United States. Very few cases of DCIS present as a palpable mass; 80% are diagnosed by mammography alone.
The most common benign breast conditions include fibrocystic breast condition, benign breast tumors, and breast inflammation. Fibrocystic disease is the most frequent benign breast condition, affecting every second woman at least once during her lifetime. Symptoms of fibrocystic breasts include cysts (accumulated pockets of fluid), fibrosis (formation of scar-like connective tissue), lumpiness, areas of thickening, tenderness, or breast pain. Fibrocystic breasts can sometimes make breast cancer more difficult to detect with mammography.
Fibroadenomas are common benign breast tumors, often too small to feel by hand, though occasionally they may grow to be several inches in diameter. Fibroadenomas are made up of both glandular and stromal (connective) breast tissue and usually occur in women between 20 and 30 years of age. According to the American Cancer Society, African-American women are affected by fibroadenomas more often than women of other racial or ethnic groups. Phyllodes tumors are also benign tumors of the glandular and stromal breast tissues but are far less common than fibroadenomas. The difference between phyllodes tumors and fibroadenomas is that phyllodes tumors show an overgrowth of the fibro-connective tissue. Phyllodes tumors are usually benign, but on very rare occasions they may be malignant (cancerous) and could metastasize. Granular cell tumors are usually found in the mouth or skin but may rarely be detected in the breast as well. However, granular cell tumors do not indicate a higher risk of developing breast cancer. Mastitis is an inflammation of breast tissue that commonly occurs during breastfeeding.
The TNM classification was devised by the International Union Against Cancer (UICC) and accepted by the American Joint Committee on Cancer (AJCC). TNM is based on the clinical features of the tumor (T), the regional lymph nodes (N), and the presence or absence of distant metastases (M). The tumor is characterized by its size, so that a T1 is a tumor under 2 cm, a T2 is 2 to 5 cm, and a T3 is over 5 cm. Breast carcinomas in-situ are labeled Tis and subdivided into ductal carcinoma in-situ (DCIS), lobular carcinoma in-situ (LCIS) and Paget's disease. Similarly, N0 represents negative or normal regional lymph nodes and M0 the absence of distant metastases. Involvement of the ipsilateral axillary lymph nodes is the most reliable and reproducible prognostic indicator for primary breast cancer. In general, 50 to 70% of patients with positive lymph nodes have a relapse, whereas only 20 to 35% of patients with all lymph nodes negative for metastatic disease relapse after loco-regional treatment alone.
Tumor grade reflects the differentiation status of the tumor. Breast cancers are usually described as well (grade 1), moderately (grade 2) or poorly (grade 3) differentiated. Poorly differentiated tumors tend to be more aggressive. Eighty-six percent of patients with tumors of good nuclear grade survived for 8 years, as opposed to 64% of those whose nuclear grade was scored as poor.
Hormone receptor levels are important parameters for classifying breast cancer, both with respect to prognosis and for treatment planning. Patients with estrogen receptor (ER)-positive tumors tend to have a more indolent course and to metastasize preferentially to soft tissue and bone; conversely, those with ER-negative tumors relapse earlier, and metastases to liver, lung, and central nervous system are more likely. ER-positive tumors are more often well differentiated and are associated with other favorable prognostic characteristics. Although patients with ER-positive tumors tend to have better short-term disease-free and overall survival rates than patients with ER-negative tumors, the differences between the two groups tend to diminish or even disappear with time. In some studies, the progesterone receptor (PgR) appeared to be a more valuable prognostic indicator than the ER. In addition, high levels of estrogen receptor expression are predictors of a favorable response to endocrine therapy.
A large study with long follow-up indicated that women 45 to 49 years of age had the best prognosis and that the very young (those under age 35 years) or elderly patients had the worst breast cancer survival. However, when other, more important tumor characteristics are considered, age and menopausal status are not important prognostic indicators.
Breast cancer is often diagnosed as a palpable mass, found by the patient during self-examination or by a physician during a clinical breast examination. The increasing use of mammography in breast cancer screening has resulted in many cancers being found at a stage where no palpable mass is detectable. Suspicious masses are usually followed up by further imaging such as ultrasound or MRI, and the diagnosis is confirmed by needle biopsy.
The American Cancer Society currently recommends the following guidelines for the detection of breast cancer in women who are asymptomatic:
In 2001, new analyses were conducted for clinical breast examination (CBE) and breast self-examination (BSE). There is no direct, but some indirect, evidence that CBE decreases breast cancer mortality. CBE has an overall sensitivity of 54%, which varies with the patient's age and the size of the mass, as well as the provider's skill in clinical examination. In the absence of direct evidence, the recommendation for CBE was based on consensus opinion. Regarding BSE, a meta-analysis of 6 large controlled trials found no reduction in the relative risk of mortality in women performing BSE vs. those not performing BSE (0.94, 95% C.I. 0.83-1.06). More recently, it has been questioned whether routine BSE reduces breast cancer mortality at all, and whether it might even cause harm. Despite this evidence, BSE and CBE are both still part of the current guidelines for breast cancer screening.
Mammography is currently the only exam approved by the U.S. Food and Drug Administration (FDA) to screen for breast cancer in asymptomatic women. Randomized controlled trials conducted in the US and several European countries have demonstrated that routine mammography screening can reduce breast cancer mortality by 20-40% if performed annually. On average, the sensitivity of mammography is up to 85%. However, mammography is less sensitive in patients with dense breasts or benign breast conditions such as fibrocystic changes or lumps. Five to 10 percent of screening mammogram results are abnormal and require further testing. False positive rates are higher in younger women because of the higher density of their breasts.
In the US, mammography screening has been established for the general population. Although mammography involves low-dose radiation and discomfort for the person tested, the compliance rate is around 70%. The average charge for screening mammography is $141 ($101 for the technical component and $40 for interpretation). Despite the evident success of mammography, growing difficulty has been reported in recruiting trained staff, both doctors and nurses, for mammography clinics, resulting in an 8% decline in the number of sites over the last 4 years. It has been speculated that this might be due to long hours, low reimbursement, heavy regulation and fear of lawsuits.
MRI is mainly used to follow up positive mammography results. MRI is very sensitive and well suited to analyzing dense breasts and younger women, but it is slower and more expensive, and it makes breast biopsies more difficult to guide. Trials are currently ongoing to assess MRI for screening in high-risk populations such as carriers of breast cancer susceptibility gene mutations.
Owing to current screening programs and the accessibility of self-examination, breast cancer is diagnosed comparatively early: in about 65% of all newly diagnosed cases, the cancer is limited to the breast and has not yet spread to the lymph nodes. Therefore, for most patients diagnosed with organ-confined breast cancer, the suggested primary therapeutic intervention is surgery, often followed by radiation therapy.
In most patients with early-stage breast cancer, the tumor can be completely removed by surgery. However, without further therapy, about one third of these patients will develop metastases during follow-up. This is thought to be due to occult metastases (or micrometastases) already present at the time of surgery. Based on this observation, systemic adjuvant treatment has been introduced also for node-negative breast cancers. Systemic adjuvant therapy is administered after surgical removal of the tumor and has been shown to reduce the risk of recurrence significantly (Early Breast Cancer Trialists' Collaborative Group, 1998). Several types of adjuvant treatment are available: endocrine treatment (for hormone receptor positive tumors), different chemotherapy regimens, and novel agents such as Herceptin. The treatment regimen for the individual patient is chosen according to guidelines such as St. Gallen or NIH, which are based on the pathological classification of the tumor (mainly TNM, grade, ER status).
According to figures from the American Cancer Society, the prognosis of breast cancer is clearly correlated with cancer stage: the earlier a tumor is detected, the better the prognosis for survival. Therefore, breast cancer screening tests that detect cancer at an early stage result in a reduction of breast cancer mortality.
Altered epigenetic regulation, such as aberrant DNA methylation, has been established as a frequent and early event in carcinogenesis. In particular for breast cancer, the contribution of DNA methylation has been well documented. In a recent review, Widschwendter and Jones (Oncogene. 2002 Aug. 12; 21(35):5462-82.) showed that DNA methylation changes have been reported in the context of all relevant steps in breast carcinogenesis, including 1) evasion of apoptosis, 2) insensitivity to antigrowth signals, 3) self-sufficiency in growth signals, 4) limitless replicative potential, 5) tissue invasion and metastasis and 6) sustained angiogenesis. Less well documented is the role of DNA methylation in early precancerous lesions such as DCIS and in less frequent cancerous lesions such as lobular carcinoma. Fackler et al. showed that 95% of DCIS lesions (N=44) exhibited hypermethylation of at least one of 5 genes, namely RASSF1A, HIN-1, RAR-beta, Cyclin D2 and Twist. The same authors compared the frequency of hypermethylation of these genes between ductal and lobular carcinomas and found similar methylation patterns, except for TWIST, which was less frequently hypermethylated in lobular carcinomas. Similarly, Lehmann et al. reported very early changes in the methylation of both RASSF1A and stratifin and found more frequent inactivation of DAPK in lobular carcinomas than in ductal carcinomas. Early methylation changes of RASSF1A and stratifin in DCIS have been confirmed by other groups.
In order to screen an asymptomatic population, candidate molecular markers have to be detectable in remote samples to allow for non-invasive screening assays. It is well established that DNA from breast tumors can be detected as free nucleic acid in serum and plasma. Alternatively, fluids obtained directly from the breast, such as nipple aspirate fluid (NAF) or ductal lavage, have also been suggested as a source for molecular testing. However, since the procedures for obtaining NAF and ductal lavage have been questioned for their reliability and are uncomfortable for the patient, blood-based testing is clearly preferred for screening assays.
Blood-based molecular cancer screening assays face the challenge of detecting minute amounts of tumor DNA in a background of normal DNA from other tissues. Tumor markers based on DNA methylation can be detected using assays that amplify DNA in a methylation-specific way, such as MSP or HeavyMethyl™. These assays allow highly sensitive detection of a few copies of methylated DNA in a background of excess normal DNA.
Multifactorial approach. Cancer diagnostics has traditionally relied upon the detection of single molecular markers (e.g. gene mutations, elevated PSA levels). Unfortunately, cancer is a disease state in which single markers have typically failed to detect or differentiate many forms of the disease. Thus, assays that recognize only a single marker have been shown to be of limited predictive value, as will be discussed briefly herein. A successful approach currently being pursued in methylation-based cancer diagnostics, and in the screening, diagnosis, and therapeutic monitoring of such diseases, is the use of a selection of multiple markers. The multiplexed analytical approach is particularly well suited for cancer diagnostics: since cancer is not a simple disease, this multi-factorial “panel” approach is consistent with the heterogeneous nature of cancer, both cytologically and clinically.
Key to the successful implementation of a panel approach to methylation-based diagnostic tests is the design and development of optimized panels of markers that can characterize and distinguish disease states. This patent application describes an efficient and unique panel of genes, wherein methylation analysis of one member or a combination of members of the panel enables the detection of cell proliferative disorders of the breast with a particularly high sensitivity, specificity and/or predictive value.
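As a minimal sketch of how a multi-marker panel can be turned into a single call, assume each marker yields a methylation level between 0 and 1 and that a sample is called positive if at least one marker reaches its cutoff, the "at least one gene of the panel" rule used in several of the studies discussed in this document. The function name `panel_call`, the levels and the cutoff values are hypothetical illustrations, not the panel or decision rule of this application.

```python
from typing import Mapping


def panel_call(levels: Mapping[str, float], cutoffs: Mapping[str, float]) -> bool:
    """Hypothetical 'any marker positive' rule for a methylation panel.

    levels:  measured methylation level per marker (0.0 to 1.0).
    cutoffs: per-marker threshold; a marker missing from `levels`
             is treated as unmethylated (0.0).
    Returns True if at least one marker meets or exceeds its cutoff.
    """
    return any(levels.get(marker, 0.0) >= cutoff for marker, cutoff in cutoffs.items())


# Marker names are taken from the text; the cutoffs are invented for illustration.
CUTOFFS = {"RASSF1A": 0.10, "TWIST": 0.10, "HIN1": 0.10, "Cyclin D2": 0.10}

print(panel_call({"RASSF1A": 0.02, "TWIST": 0.25}, CUTOFFS))  # True: TWIST reaches its cutoff
print(panel_call({"RASSF1A": 0.02}, CUTOFFS))                 # False: no marker reaches a cutoff
```

With fixed per-marker cutoffs, adding a marker to an "any marker" panel can only add positive calls, so sensitivity cannot decrease but specificity cannot increase. This is one reason panel members need to be chosen carefully, for example by checking that they are not methylated in blood.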
Development of medical tests. Two key evaluative measures of any medical screening or diagnostic test are its sensitivity and specificity, which measure, respectively, how well the test detects all affected individuals without exception and how well it avoids falsely including individuals who do not have the target disease. The predictive value additionally indicates how likely a positive result is to be correct. Historically, many diagnostic tests have been criticized for poor sensitivity and specificity.
A true positive (TP) result is where the test is positive and the condition is present. A false positive (FP) result is where the test is positive but the condition is not present. A true negative (TN) result is where the test is negative and the condition is not present. A false negative (FN) result is where the test is negative but the condition is present.
Sensitivity=TP/(TP+FN)
Specificity=TN/(FP+TN)
Predictive value=TP/(TP+FP)
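The three formulas above can be computed directly from a 2×2 table of test results. This is a minimal sketch; the class name `Confusion` and the counts in the example are hypothetical and do not come from any study cited in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of true/false positives and negatives for one test."""
    tp: int
    fp: int
    tn: int
    fn: int

    def sensitivity(self) -> float:
        # Share of diseased individuals the test detects: TP / (TP + FN).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of disease-free individuals the test clears: TN / (FP + TN).
        return self.tn / (self.fp + self.tn)

    def predictive_value(self) -> float:
        # Share of positive results that are true positives: TP / (TP + FP).
        return self.tp / (self.tp + self.fp)


# Hypothetical study with 50 cases and 100 controls.
result = Confusion(tp=40, fp=5, tn=95, fn=10)
print(result.sensitivity())       # 0.8
print(result.specificity())       # 0.95
print(result.predictive_value())  # 40 / 45, about 0.889
```

Each method raises `ZeroDivisionError` if its denominator is zero, for example when a study contains no controls.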
Sensitivity is a measure of a test's ability to correctly detect the target disease in an individual being tested. A test having poor sensitivity produces a high rate of false negatives, i.e., individuals who have the disease but are falsely identified as being free of that particular disease. The potential danger of a false negative is that the diseased individual will remain undiagnosed and untreated for some period of time, during which the disease may progress to a later stage wherein treatments, if any, may be less effective. An example of a test that has low sensitivity is a protein-based blood test for HIV. This type of test exhibits poor sensitivity because it fails to detect the presence of the virus until the disease is well established and the virus has invaded the bloodstream in substantial numbers. In contrast, an example of a test that has high sensitivity is viral-load detection using the polymerase chain reaction (PCR). High sensitivity is achieved because this type of test can detect very small quantities of the virus. High sensitivity is particularly important when the consequences of missing a diagnosis are high.
Specificity, on the other hand, is a measure of a test's ability to accurately identify patients who are free of the disease state. A test with poor specificity produces a high rate of false positives, i.e., individuals who are falsely identified as having the disease. A drawback of false positives is that they force patients to undergo unnecessary medical procedures and treatments, with their attendant risks and emotional and financial stresses, which could have adverse effects on the patient's health. A feature of diseases that makes it difficult to develop diagnostic tests with high specificity is that disease mechanisms, particularly in cancer, often involve a plurality of genes and proteins. Additionally, certain proteins may be elevated for reasons unrelated to a disease state. An example of a test that has high specificity is a gene-based test that can detect a p53 mutation. Specificity is important when the cost or risk associated with further diagnostic procedures or further medical intervention is very high.
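The weight of specificity in screening can be made concrete with Bayes' rule: the predictive value of a positive result depends not only on sensitivity and specificity but also on how common the disease is in the tested population. The sketch below assumes a hypothetical prevalence of 0.5% and hypothetical test characteristics; these figures are illustrative only and are not taken from this document.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule).

    PPV = sens * p / (sens * p + (1 - spec) * (1 - p)), with p the prevalence.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical sensitivity, two specificities, 0.5% prevalence.
for spec in (0.90, 0.99):
    ppv = positive_predictive_value(sensitivity=0.90, specificity=spec, prevalence=0.005)
    print(f"specificity {spec:.2f}: PPV = {ppv:.3f}")
```

In this example, raising specificity from 90% to 99% raises the share of true positives among positive results from about 4% to about 31%, which illustrates why false positives dominate when an asymptomatic, low-prevalence population is screened.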
Methylation analysis of breast fluids. The detectability of methylation in body fluids of breast cancer patients has been established. Silva et al. (Br J Cancer. 1999 June; 80(8):1262-4.) described the detection of methylated p16INK4a exon 1 in the plasma of five of eight tested breast cancer patients whose tumors showed p16INK4a exon 1 methylation. The sensitivity of the analysis has since been improved by analyzing a panel of genes. Krassenstein et al. (Clin Cancer Res. 2004 Jan. 1; 10(1 Pt 1):28-32.) described the analysis of a panel of six genes (GSTP1, RARB2, p16INK4a, p14ARF, RASSF1A and DAPK) in matched tumor and nipple aspirate fluid (herein also referred to as NAF). Using a small sample set (22 tumors, 5 healthy patients and 5 benign breast patients), it was established that at least one gene of the panel was methylated in the tumors, and that this methylation could be detected in the NAF of 18 of the 22 breast cancer patients.
Although the study by Krassenstein et al. confirms that methylation observed in tumor tissue can be detected in breast-derived fluid, there are several technical problems associated with providing a body fluid based test, some of which were acknowledged but not satisfactorily resolved. Firstly, the markers used by Krassenstein et al. were not specifically methylated in breast cancer, but were general cancer markers methylated in a range of cancers. Therefore, if this panel were used in a clinical setting, there would probably be an increased number of false positives, in which methylated DNA from cancers of other tissues (or free-floating DNA therefrom) would be detected.
Although Krassenstein et al. aimed to provide a breast cancer screening test, patient compliance with a procedure such as NAF collection (or, e.g., ductal lavage) would likely be low, except in the most at-risk populations. The preferred sample type for a screening test with high patient compliance would be blood based. The performance of a panel tested on tissue or NAF is not an indicator of its performance on a blood sample. Therefore, any panel would have to be validated on blood samples to establish that the markers are not also methylated in blood.
Both issues were briefly addressed by Evron et al. (Lancet. 2001 Apr. 28; 357(9265):1335-6.). In this study, a gene panel consisting of Cyclin D2, RARB and Twist was selected based on factors including their methylation status in white blood cells. The methylation status of these genes was then analyzed in matched tissue and ductal lavage fluid. The gene panel detected invasive breast cancer in 48 of 50 samples.
The main aim of a screening test is the detection of low-grade tumors such as DCIS; higher grade tumors are easily detected by, e.g., self-examination and mammography and are harder to treat. This was not enabled by Krassenstein et al.: the sample set used consisted mostly of high-grade (grade 2 and 3) tumors, and there was only one grade 1 tumor. Therefore, it may be concluded that the suitability of the panel for the detection of low-grade tumors has not been fully established, in particular as it is not possible to confirm whether the NAF sample in question tested positive for methylation. Furthermore, it is generally assumed that methylation is a progressive feature of tumorigenesis. One would therefore conclude that the detection of the early stages of tumorigenesis requires a highly specifically selected panel of genetic markers, as opposed to a panel of markers selected for their methylation status in high-grade tumor tissue. The panel investigated by Evron et al. detected only 8 out of 14 cases of DCIS, thus confirming that a panel suited to the detection of breast cancers is not necessarily suitable for the detection of DCIS.
In summary, methylation gene panels for the analysis of breast fluids are known in the state of the art; however, all of these panels have deficiencies. Owing to these technical difficulties, none of these gene panels fulfills the requirements of a breast cancer screening test, namely that the test is suitable for use in body fluids, most preferably blood, and that the test is capable of detecting early-stage cell proliferative disorders such as DCIS.
The use of a panel of methylation markers consisting of RASSF1A, TWIST, Cyclin D2 and HIN1 for the differentiation of invasive breast carcinoma from normal breast tissue was recently described by Fackler et al. Using a highly sensitive quantitative multiplexed methylation-specific PCR assay, the study detected promoter hypermethylation of the gene panel in breast cancer (as opposed to normal breast tissue) with a sensitivity of 84% and a specificity of 94%. The study is therefore significant in establishing the use of methylation markers for the sensitive detection of breast cancer cells in a background of non-cancerous cells. However, the samples used by Fackler et al. were tissue based (including those obtained by means of ductal lavage), so it has not been established whether such a sensitivity and specificity would be obtainable using body fluid based samples. Furthermore, the same panel proved unable to detect ductal carcinoma in-situ. Therefore, the method is unsuitable for use as an early screening test.
The present invention provides novel methods for detecting and/or distinguishing between breast cell proliferative disorders. In preferred embodiments the present invention enables the screening of at-risk populations for the early detection of breast cancers. Further embodiments of the method may also be used as alternatives to cytological screening for the differentiation of breast carcinomas from benign breast cell proliferative disorders.
The invention solves this longstanding need in the art by providing a panel of genes and/or genomic sequences according to Table 3, the expression of which is indicative of the presence or absence of breast cell proliferative disorders or features thereof. Preferred selections and combinations of genes are provided, the analysis of which enables the differentiation and detection of various classes of breast cell proliferative disorders, namely:
It is particularly preferred that the expression status of said genes and/or genomic sequences is determined according to the methylation status of CpG positions thereof.
In order to enable this analysis the invention provides a method for the analysis of biological samples for genomic methylation associated with the development of breast cell proliferative disorders. Said method is characterized in that at least one nucleic acid, or a fragment thereof, from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118 is/are contacted with a reagent or series of reagents capable of distinguishing between methylated and non methylated CpG dinucleotides within the genomic sequence, or sequences of interest.
The present invention provides further methods for ascertaining genetic and/or epigenetic parameters of SEQ ID NO: 1 to SEQ ID NO: 118, including but not limited to mRNA expression analysis, protein expression analysis and methylation analysis. The method has utility for the improved diagnosis, differentiation and treatment of breast cell proliferative disorders, more specifically by enabling the improved detection of and differentiation between subclasses of said disorder. The invention presents several improvements over the state of the art. Although methylation assays for the detection of breast cancer in body fluids are known, there is currently no assay that fulfills the criteria of being validated in blood and capable of detecting DCIS with an accuracy suitable for commercial approval.
The source of the DNA may be any suitable source, such as cell lines, histological slides, biopsies, paraffin-embedded tissue, body fluids, blood plasma, blood serum, whole blood, isolated blood cells, cells isolated from the blood and all possible combinations thereof. It is preferred that said sources of DNA are body fluids selected from the group consisting of nipple aspirate fluid, lymphatic fluid, ductal lavage fluid, fine needle aspirate, blood plasma, blood serum, whole blood, isolated blood cells and cells isolated from the blood.
Specifically, the present invention provides a method for detecting breast cell proliferative disorders, comprising: obtaining a biological sample comprising genomic nucleic acid(s); contacting the nucleic acid(s), or a fragment thereof, with one reagent or a plurality of reagents sufficient for distinguishing between methylated and non methylated CpG dinucleotide sequences within a target sequence of the subject nucleic acid, wherein the target sequence comprises, or hybridizes under stringent conditions to, a sequence comprising at least 16 contiguous nucleotides of SEQ ID NO: 1 to SEQ ID NO: 118, said contiguous nucleotides comprising at least one CpG dinucleotide sequence; and determining, based at least in part on said distinguishing, the methylation state of at least one target CpG dinucleotide sequence, or an average, or a value reflecting an average methylation state of a plurality of target CpG dinucleotide sequences. Preferably, distinguishing between methylated and non methylated CpG dinucleotide sequences within the target sequence comprises methylation state-dependent conversion or non-conversion of at least one such CpG dinucleotide sequence to the corresponding converted or non-converted dinucleotide sequence within a sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964, and contiguous regions thereof corresponding to the target sequence.
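One common way to obtain per-CpG and average methylation values of the kind referred to above is to compare bisulfite-converted, aligned reads against the genomic reference: at each CpG, a C in the read indicates a methylated (protected) cytosine and a T indicates an unmethylated (converted) one. The sketch below assumes reads that are already aligned, gap-free, the same length as the reference and from the same strand; the function names are hypothetical, and this is not presented as the analysis pipeline of the application.

```python
from statistics import mean


def cpg_positions(reference: str) -> list[int]:
    """0-based positions of the C in every CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_levels(reference: str, reads: list[str]) -> dict[int, float]:
    """Fraction of methylated calls (C) among informative calls (C or T) per CpG."""
    levels: dict[int, float] = {}
    for pos in cpg_positions(reference):
        calls = [read[pos].upper() for read in reads]
        methylated = calls.count("C")
        informative = methylated + calls.count("T")
        if informative:  # skip sites without any C/T call
            levels[pos] = methylated / informative
    return levels


reference = "ACGTTCGACCA"  # CpGs start at positions 1 and 5
reads = ["ACGTTTGATTA", "ACGTTCGATTA", "ATGTTTGATTA"]
levels = methylation_levels(reference, reads)
print(levels)                 # position 1: 2 of 3 reads methylated; position 5: 1 of 3
print(mean(levels.values()))  # average over both CpGs, about 0.5
```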
Additional embodiments provide a method for the detection of breast cell proliferative disorders comprising: obtaining a biological sample having subject genomic DNA; extracting the genomic DNA; treating the genomic DNA, or a fragment thereof, with one or more reagents to convert 5-position unmethylated cytosine bases to uracil or to another base that is detectably dissimilar to cytosine in terms of hybridization properties; contacting the treated genomic DNA, or the treated fragment thereof, with an amplification enzyme and at least two primers comprising, in each case, a contiguous sequence at least 9 nucleotides in length that is complementary to, or hybridizes under moderately stringent or stringent conditions to a sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964, and complements thereof, wherein the treated DNA or the fragment thereof is either amplified to produce an amplificate, or is not amplified; and determining, based on a presence or absence of, or on a property of said amplificate, the methylation state of at least one CpG dinucleotide sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118, or an average, or a value reflecting an average methylation state of a plurality of CpG dinucleotide sequences thereof. Preferably, at least one such hybridizing nucleic acid molecule or peptide nucleic acid molecule is bound to a solid phase.
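The conversion step can be illustrated in silico. This minimal sketch assumes complete conversion of a single strand, methylation only at the listed positions, and writes uracil as T, the base it is read as after amplification; `bisulfite_convert` is a hypothetical helper, not part of any cited method or library.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate complete bisulfite conversion of one DNA strand.

    Every C not listed in `methylated` (0-based positions) is deaminated to
    uracil, written here as T; a methylated C is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


genomic = "ACGTTCGACCA"  # CpGs at positions 1 and 5
print(bisulfite_convert(genomic, methylated={1, 5}))  # ACGTTCGATTA (both CpGs methylated)
print(bisulfite_convert(genomic, methylated={1}))     # ACGTTTGATTA
print(bisulfite_convert(genomic, methylated=set()))   # ATGTTTGATTA (unmethylated)
```

Because the methylated and unmethylated versions now differ in sequence, primers or probes that match only one version can discriminate between them; this is the principle exploited by methylation-specific amplification such as MSP.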
Preferably, determining comprises use of at least one method selected from the group consisting of: hybridizing at least one nucleic acid molecule comprising a contiguous sequence at least 9 nucleotides in length that is complementary to, or hybridizes under moderately stringent or stringent conditions to a sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964, and complements thereof; hybridizing at least one nucleic acid molecule, bound to a solid phase, comprising a contiguous sequence at least 9 nucleotides in length that is complementary to, or hybridizes under moderately stringent or stringent conditions to a sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964, and complements thereof; hybridizing at least one nucleic acid molecule comprising a contiguous sequence at least 9 nucleotides in length that is complementary to, or hybridizes under moderately stringent or stringent conditions to a sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964, and complements thereof, and extending at least one such hybridized nucleic acid molecule by at least one nucleotide base; and sequencing of the amplificate.
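A crude stand-in for the hybridization criterion above is an exact-match test: an oligonucleotide of at least 9 nucleotides is taken to bind a target if its reverse complement occurs in the target. This sketch ignores mismatches, melting temperature and stringency, all of which real assay design must account for; the function names and the example probe are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence (U pairs with A)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binds_perfectly(oligo: str, target: str, min_len: int = 9) -> bool:
    """True if `oligo` is exactly complementary to some stretch of `target`."""
    if len(oligo) < min_len:
        raise ValueError(f"oligo shorter than {min_len} nt")
    return reverse_complement(oligo) in target.upper()


# Bisulfite-converted versions of the same hypothetical region.
converted_methylated = "ACGTTCGATTA"
converted_unmethylated = "ATGTTTGATTA"
probe = "AATCGAACG"  # 9-mer complementary to the methylated version only

print(binds_perfectly(probe, converted_methylated))    # True
print(binds_perfectly(probe, converted_unmethylated))  # False
```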
Further embodiments provide a method for the analysis of breast cell proliferative disorders, comprising: obtaining a biological sample having subject genomic DNA; extracting the genomic DNA; contacting the genomic DNA, or a fragment thereof, comprising one or more sequences selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118 or a sequence that hybridizes under stringent conditions thereto, with one or more methylation-sensitive restriction enzymes, wherein the genomic DNA is either digested thereby to produce digestion fragments, or is not digested thereby; and determining, based on a presence or absence of, or on a property of, at least one such fragment, the methylation state of at least one CpG dinucleotide sequence of one or more sequences selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118, or an average, or a value reflecting an average methylation state, of a plurality of CpG dinucleotide sequences thereof. Preferably, the digested or undigested genomic DNA is amplified prior to said determining.
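The digestion step can be illustrated with a toy in-silico model. This is a minimal sketch under stated assumptions, not part of the described method: it assumes a HpaII-like enzyme (chosen here only as an example; the text names no specific enzyme) that recognizes 5′-CCGG-3′, cuts between the two cytosines, and is blocked when the internal (CpG) cytosine is methylated. Only the top strand is modeled, methylation is assumed to be symmetric, and the function name `hpaii_digest` is hypothetical.

```python
def hpaii_digest(seq: str, methylated_cpg: set) -> list:
    """Toy digestion of a top-strand sequence with a HpaII-like enzyme.

    `methylated_cpg` holds 0-based positions of the C of each methylated CpG.
    A CCGG site is cut (C^CGG) only if its internal CpG cytosine is unmethylated.
    """
    seq = seq.upper()
    cuts = [
        i + 1  # cut position: after the first C of CCGG
        for i in range(len(seq) - 3)
        if seq[i:i + 4] == "CCGG" and (i + 1) not in methylated_cpg
    ]
    fragments, start = [], 0
    for cut in cuts:
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


# Unmethylated sites are cut; a methylated site leaves the DNA intact there.
print(hpaii_digest("AACCGGTTCCGGAA", set()))  # ['AAC', 'CGGTTC', 'CGGAA']
print(hpaii_digest("AACCGGTTCCGGAA", {3}))    # ['AACCGGTTC', 'CGGAA']
```

A region that remains undigested across a recognition site therefore indicates methylation at that site, which is why a subsequent amplification spanning the site can report the methylation state.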
Additional embodiments provide novel genomic and chemically modified nucleic acid sequences, as well as oligonucleotides and/or PNA-oligomers for analysis of cytosine methylation patterns within sequences from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118.
In a further embodiment the present invention provides DNA markers associated with the presence of general cancers.
The term “Observed/Expected Ratio” (“O/E Ratio”) refers to the frequency of CpG dinucleotides within a particular DNA sequence, and corresponds to [number of CpG sites/(number of C bases×number of G bases)]×length (in bases) of each fragment.
The term “CpG island” refers to a contiguous region of genomic DNA that satisfies the criteria of (1) having a frequency of CpG dinucleotides corresponding to an “Observed/Expected Ratio”>0.6, and (2) having a “GC Content”>0.5. CpG islands are typically, but not always, between about 0.2 and about 1 kb, or up to about 2 kb, in length.
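The two island criteria can be checked with a short sketch. It assumes the O/E formula given above, with the multiplier taken to be the length of the analyzed sequence, and treats a sequence lacking C or G as having a ratio of 0. The function names are hypothetical; real island-finding tools additionally scan with sliding windows and apply a minimum length, which is omitted here.

```python
def obs_exp_cpg(seq: str) -> float:
    """O/E ratio = [CpG count / (C count x G count)] x sequence length."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # no C or no G means no CpG is possible; report 0
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    return s.count("CG") / (c * g) * len(s)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("C") + s.count("G")) / len(s) if s else 0.0


def meets_cpg_island_criteria(seq: str) -> bool:
    """Criterion (1) O/E ratio > 0.6 and (2) GC content > 0.5; length not checked."""
    return obs_exp_cpg(seq) > 0.6 and gc_content(seq) > 0.5


print(obs_exp_cpg("CGCGCGCG"), meets_cpg_island_criteria("CGCGCGCG"))  # 2.0 True
print(obs_exp_cpg("CCCCGGGG"), meets_cpg_island_criteria("CCCCGGGG"))  # 0.5 False
```

The second example is GC-rich but contains only one CpG, so it fails the O/E criterion; this is the distinction the two-part definition is designed to capture.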
The term “methylation state” or “methylation status” refers to the presence or absence of 5-methylcytosine (“5-mCyt”) at one or a plurality of CpG dinucleotides within a DNA sequence. Methylation states at one or more particular CpG methylation sites (each having two CpG dinucleotide sequences, one on each strand) within a DNA sequence include “unmethylated,” “fully-methylated” and “hemi-methylated.”
The term “hemi-methylation” or “hemimethylation” refers to the methylation state of a double-stranded nucleic acid wherein only the CpG positions of one strand are methylated (e.g., 5′-CMGG-3′ (top strand) paired with 3′-GGCC-5′ (bottom strand), where M denotes 5-methylcytosine).
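The three states defined above follow directly from the methylation of the two CpG dinucleotides that make up one double-stranded site. The minimal sketch below assumes per-strand methylation calls are already available as booleans; the function name is hypothetical.

```python
from collections import Counter


def cpg_site_state(top_methylated: bool, bottom_methylated: bool) -> str:
    """State of one double-stranded CpG site from its two strand-level calls."""
    if top_methylated and bottom_methylated:
        return "fully-methylated"
    if top_methylated or bottom_methylated:
        return "hemi-methylated"  # only one strand carries 5-mCyt
    return "unmethylated"


# Hypothetical (top, bottom) calls for four sites in one sequence.
calls = [(True, True), (True, False), (False, False), (False, True)]
print(Counter(cpg_site_state(top, bottom) for top, bottom in calls))
```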
The term “AUC” as used herein is an abbreviation for the area under a curve, in particular the area under a Receiver Operating Characteristic (ROC) curve. The ROC curve is a plot of the true positive rate against the false positive rate for the different possible cutpoints of a diagnostic test. It shows the trade-off between sensitivity and specificity depending on the selected cutpoint (an increase in sensitivity is generally accompanied by a decrease in specificity). The area under an ROC curve (AUC) is a measure of the accuracy of a diagnostic test: the larger the area, the better, with an optimum of 1, whereas a random test would have a ROC curve lying on the diagonal with an area of 0.5 (for reference: J. P. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York, 1975).
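For a finite set of samples, the AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties counting one half), which is the same as the trapezoidal area under the empirical ROC curve. The sketch below uses this pairwise formulation; the function name and the example scores (hypothetical methylation fractions for cancer and normal samples) are illustrative only.

```python
def roc_auc(positive_scores, negative_scores) -> float:
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical marker methylation fractions: cancer (positive) vs. normal (negative).
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

The pairwise loop is O(n·m) and is chosen for clarity; for large sample sets a rank-based computation gives the same value.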
The term “hypermethylation” refers to the average methylation state corresponding to an increased presence of 5-mCyt at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mCyt found at corresponding CpG dinucleotides within a normal control DNA sample.
The term “hypomethylation” refers to the average methylation state corresponding to a decreased presence of 5-mCyt at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mCyt found at corresponding CpG dinucleotides within a normal control DNA sample.
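The two definitions compare the average methylation of a test sample with that of a normal control at corresponding CpG positions. A minimal sketch follows, assuming per-CpG methylation levels expressed as fractions between 0 and 1 and listed in the same order for both samples. The `min_difference` threshold is a hypothetical parameter; the definitions above do not specify one.

```python
from statistics import mean


def methylation_call(test_levels, control_levels, min_difference=0.0) -> str:
    """Compare average 5-mCyt levels at position-matched CpGs (fractions 0..1)."""
    if not test_levels or len(test_levels) != len(control_levels):
        raise ValueError("expected non-empty, position-matched CpG levels")
    difference = mean(test_levels) - mean(control_levels)
    if difference > min_difference:
        return "hypermethylation"
    if difference < -min_difference:
        return "hypomethylation"
    return "no difference"


print(methylation_call([0.8, 0.9], [0.1, 0.2]))  # hypermethylation
```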
The term “microarray” refers broadly to both “DNA microarrays” and “DNA chip(s),” as recognized in the art, encompasses all art-recognized solid supports, and encompasses all methods for affixing nucleic acid molecules thereto or synthesizing nucleic acids thereon.
“Genetic parameters” are mutations and polymorphisms of genes and of sequences further required for their regulation. Mutations include, in particular, insertions, deletions, point mutations, inversions and polymorphisms, and particularly preferably SNPs (single nucleotide polymorphisms).
“Epigenetic parameters” are, in particular, cytosine methylations. Further epigenetic parameters include, for example, the acetylation of histones, which cannot be analyzed directly using the described method but which in turn correlates with DNA methylation.
The term “bisulfite reagent” refers to a reagent comprising bisulfite, disulfite, hydrogen sulfite or combinations thereof, useful as disclosed herein to distinguish between methylated and unmethylated CpG dinucleotide sequences.
The term “Methylation assay” refers to any assay for determining the methylation state of one or more CpG dinucleotide sequences within a sequence of DNA.
The term “MS.AP-PCR” (Methylation-Sensitive Arbitrarily-Primed Polymerase Chain Reaction) refers to the art-recognized technology that allows for a global scan of the genome using CG-rich primers to focus on the regions most likely to contain CpG dinucleotides, as described by Gonzalgo et al., Cancer Research 57:594-599, 1997.
The term “MethyLight™” refers to the art-recognized fluorescence-based real-time PCR technique described by Eads et al., Cancer Res. 59:2302-2306, 1999.
The term “HeavyMethyl™” assay, in the embodiment thereof implemented herein, refers to an assay wherein methylation-specific blocking probes (also referred to herein as blockers) covering CpG positions between, or covered by, the amplification primers enable methylation-specific selective amplification of a nucleic acid sample. The HeavyMethyl assay has previously been described in WO02/072880 and Cottrell et al. Nucleic Acids Res. 2004 Jan. 13; 32(1):e10.
The term “HeavyMethyl™ MethyLight™” assay, in the embodiment thereof implemented herein, refers to a variation of the MethyLight™ assay wherein the MethyLight™ assay is combined with methylation-specific blocking probes covering CpG positions between the amplification primers.
The term “Ms-SNuPE” (Methylation-sensitive Single Nucleotide Primer Extension) refers to the art-recognized assay described by Gonzalgo and Jones, Nucleic Acids Res. 25:2529-2531, 1997.
The term “MSP” (Methylation-specific PCR) refers to the art-recognized methylation assay described by Herman et al. Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996, and by U.S. Pat. No. 5,786,146.
The term “COBRA” (Combined Bisulfite Restriction Analysis) refers to the art-recognized methylation assay described by Xiong and Laird, Nucleic Acids Res. 25:2532-2534, 1997.
The term “MCA” (Methylated CpG Island Amplification) refers to the methylation assay described by Toyota et al., Cancer Res. 59:2307-12, 1999, and in WO 00/26401 A1.
The term “hybridization” is to be understood as the binding of an oligonucleotide to a complementary sequence in the sample DNA according to Watson-Crick base pairing, forming a duplex structure.
“Stringent hybridization conditions,” as defined herein, involve hybridizing at 68° C. in 5×SSC/5×Denhardt's solution/1.0% SDS, and washing in 0.2×SSC/0.1% SDS at room temperature, or involve the art-recognized equivalent thereof (e.g., conditions under which a hybrid formed at 60° C. in 2.5×SSC buffer remains stable through several washing steps at 37° C. in a low buffer concentration). Moderately stringent conditions, as defined herein, involve washing in 3×SSC at 42° C., or the art-recognized equivalent thereof. The parameters of salt concentration and temperature can be varied to achieve the optimal level of identity between the probe and the target nucleic acid. Guidance regarding such conditions is available in the art, for example, in Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, (John Wiley and Sons, N.Y.) at Unit 2.10.
The terms “array SEQ ID NO,” “composite array SEQ ID NO,” or “composite array sequence” refer to a sequence, hypothetical or otherwise, consisting of a head-to-tail (5′ to 3′) linear composite of all individual contiguous sequences of a subject array (e.g., a head-to-tail composite of SEQ ID NO: 1-118, in that order).
The terms “array SEQ ID NO node,” “composite array SEQ ID NO node,” or “composite array sequence node” refer to a junction between any two individual contiguous sequences of the “array SEQ ID NO,” the “composite array SEQ ID NO,” or the “composite array sequence.”
In reference to composite array sequences, the phrase “contiguous nucleotides” refers to a contiguous sequence region of any individual contiguous sequence of the composite array, but does not include a region of the composite array sequence that includes a “node,” as defined herein above.
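The composite-sequence definitions can be made concrete with a small sketch: the individual sequences are joined head to tail, the junction positions (nodes) are recorded, and a region counts as “contiguous nucleotides” only if it does not straddle a node. The toy strings below merely stand in for the SEQ ID NO sequences, and both function names are hypothetical.

```python
def build_composite(sequences: list) -> tuple:
    """Join sequences head to tail (5' to 3'); return (composite, node positions).

    A node at position n is the junction between composite[n - 1] and composite[n].
    """
    composite = "".join(sequences)
    nodes, position = [], 0
    for seq in sequences[:-1]:
        position += len(seq)
        nodes.append(position)
    return composite, nodes


def is_contiguous_region(start: int, length: int, nodes: list) -> bool:
    """True if composite[start:start + length] lies within one individual sequence."""
    end = start + length
    return not any(start < n < end for n in nodes)


composite, nodes = build_composite(["AAAA", "CCCC", "GG"])  # toy stand-ins
print(composite, nodes)                       # AAAACCCCGG [4, 8]
print(is_contiguous_region(2, 4, nodes))      # False: spans the node at 4
```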
The present invention provides for molecular genetic markers that have novel utility for the analysis of gene expression, most preferably as expressed in the methylation thereof, associated with the development of breast cell proliferative disorders. Said markers may be used for detecting and/or distinguishing between breast cell proliferative disorders, thereby providing improved means for the detection, classification and treatment of said disorders.
Bisulfite modification of DNA is an art-recognized tool used to assess CpG methylation status. 5-methylcytosine is the most frequent covalent base modification in the DNA of eukaryotic cells. It plays a role, for example, in the regulation of transcription, in genetic imprinting, and in tumorigenesis. Therefore, the identification of 5-methylcytosine as a component of genetic information is of considerable interest. However, 5-methylcytosine positions cannot be identified by conventional sequencing, because 5-methylcytosine has the same base pairing behavior as cytosine. Moreover, the epigenetic information carried by 5-methylcytosine is completely lost during, e.g., PCR amplification.
The most frequently used method for analyzing DNA for the presence of 5-methylcytosine is based upon the specific reaction of bisulfite with cytosine whereby, upon subsequent alkaline hydrolysis, cytosine is converted to uracil which corresponds to thymine in its base pairing behavior. Significantly, however, 5-methylcytosine remains unmodified under these conditions. Consequently, the original DNA is converted in such a manner that methylcytosine, which originally could not be distinguished from cytosine by its hybridization behavior, can now be detected as the only remaining cytosine using standard, art-recognized molecular biological techniques, for example, by amplification and hybridization, or by sequencing. All of these techniques are based on differential base pairing properties, which can now be fully exploited.
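The conversion described above can be modeled in silico for the top strand. This minimal sketch assumes complete conversion and represents uracil by its post-amplification equivalent, thymine; the function name and the example sequence are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set) -> str:
    """Top strand as read after bisulfite treatment and PCR.

    Unmethylated C becomes U, which is read as T after amplification;
    5-methylcytosine (0-based positions in `methylated_positions`) remains C.
    Complete conversion is assumed.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


genomic = "ACGTCCG"
print(bisulfite_convert(genomic, set()))   # ATGTTTG
print(bisulfite_convert(genomic, {1, 5}))  # ACGTTCG
```

The two outputs differ only at the originally methylated positions, which is what makes methylation detectable by hybridization, amplification or sequencing after treatment.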
The present invention provides for the use of the bisulfite technique, in combination with one or more methylation assays, for determination of the methylation status of CpG dinucleotide sequences within sequences from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118. According to the present invention, determination of the methylation status of CpG dinucleotide sequences within sequences from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118 has diagnostic and prognostic utility.
Methylation Assay Procedures. Various methylation assay procedures are known in the art, and can be used in conjunction with the present invention. These assays allow for determination of the methylation state of one or a plurality of CpG dinucleotides (e.g., CpG islands) within a DNA sequence. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), Southern blot analysis, and use of methylation-sensitive restriction enzymes.
For example, genomic sequencing has been simplified for analysis of DNA methylation patterns and 5-methylcytosine distribution by using bisulfite treatment (Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992). Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA is used, e.g., the method described by Sadri and Hornsby (Nucl. Acids Res. 24:5058-5059, 1996), or COBRA (Combined Bisulfite Restriction Analysis) (Xiong and Laird, Nucleic Acids Res. 25:2532-2534, 1997).
COBRA. COBRA analysis is a quantitative methylation assay useful for determining DNA methylation levels at specific gene loci in small amounts of genomic DNA (Xiong and Laird, Nucleic Acids Res. 25:2532-2534, 1997). Briefly, restriction enzyme digestion is used to reveal methylation-dependent sequence differences in PCR products of sodium bisulfite-treated DNA. Methylation-dependent sequence differences are first introduced into the genomic DNA by standard bisulfite treatment according to the procedure described by Frommer et al. (Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992). PCR amplification of the bisulfite converted DNA is then performed using primers specific for the CpG islands of interest, followed by restriction endonuclease digestion, gel electrophoresis, and detection using specific, labeled hybridization probes. Methylation levels in the original DNA sample are represented by the relative amounts of digested and undigested PCR product in a linearly quantitative fashion across a wide spectrum of DNA methylation levels. In addition, this technique can be reliably applied to DNA obtained from microdissected paraffin-embedded tissue samples. Typical reagents (e.g., as might be found in a typical COBRA-based kit) for COBRA analysis may include, but are not limited to: PCR primers for specific gene (or bisulfite treated DNA sequence or CpG island); restriction enzyme and appropriate buffer; gene-hybridization oligo; control hybridization oligo; kinase labeling kit for oligo probe; and labeled nucleotides. Additionally, bisulfite conversion reagents may include: DNA denaturation buffer; sulfonation buffer; DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity column); desulfonation buffer; and DNA recovery components.
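The quantitation step reduces to a ratio of band signals. The sketch below assumes an enzyme whose recognition site contains a CpG that survives bisulfite treatment only when methylated (for example a BstUI CGCG site), so that the digested product represents methylated template, and assumes that both band signals are proportional to molar amounts. The function name and the example signal values are hypothetical.

```python
def cobra_methylation_level(digested_signal: float, undigested_signal: float) -> float:
    """Methylated fraction = digested / (digested + undigested) PCR product.

    Assumes the restriction site is retained (and cut) only on templates that
    were methylated before bisulfite treatment.
    """
    total = digested_signal + undigested_signal
    if total <= 0:
        raise ValueError("no detectable PCR product")
    return digested_signal / total


print(cobra_methylation_level(30.0, 70.0))  # 0.3
```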
Preferably, assays such as “MethyLight™” (a fluorescence-based real-time PCR technique) (Eads et al., Cancer Res. 59:2302-2306, 1999), Ms-SNuPE (Methylation-sensitive Single Nucleotide Primer Extension) reactions (Gonzalgo and Jones, Nucleic Acids Res. 25:2529-2531, 1997) and methylation-specific PCR (“MSP”; Herman et al., Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996; U.S. Pat. No. 5,786,146) are used alone or in combination with other such methods.
MethyLight™. The MethyLight™ assay is a high-throughput quantitative methylation assay that utilizes fluorescence-based real-time PCR (TaqMan™) technology that requires no further manipulations after the PCR step (Eads et al., Cancer Res. 59:2302-2306, 1999). Briefly, the MethyLight™ process begins with a mixed sample of genomic DNA that is converted, in a sodium bisulfite reaction, to a mixed pool of methylation-dependent sequence differences according to standard procedures (the bisulfite process converts unmethylated cytosine residues to uracil). Fluorescence-based PCR is then performed either in an “unbiased” (with primers that do not overlap known CpG methylation sites) PCR reaction, or in a “biased” (with PCR primers that overlap known CpG dinucleotides) reaction. Sequence discrimination can occur either at the level of the amplification process or at the level of the fluorescence detection process, or both.
The MethyLight™ assay may be used as a quantitative test for methylation patterns in the genomic DNA sample, wherein sequence discrimination occurs at the level of probe hybridization. In this quantitative version, the PCR reaction provides for unbiased amplification in the presence of a fluorescent probe that overlaps a particular putative methylation site. An unbiased control for the amount of input DNA is provided by a reaction in which neither the primers, nor the probe overlie any CpG dinucleotides. Alternatively, a qualitative test for genomic methylation is achieved by probing of the biased PCR pool with either control oligonucleotides that do not “cover” known methylation sites (a fluorescence-based version of the “MSP” technique), or with oligonucleotides covering potential methylation sites.
The MethyLight™ process can be used with a “TaqMan®” probe in the amplification process. For example, double-stranded genomic DNA is treated with sodium bisulfite and subjected to one of two sets of PCR reactions using TaqMan® probes, e.g., with either biased primers and TaqMan® probe, or unbiased primers and TaqMan® probe. The TaqMan® probe is dual-labeled with fluorescent “reporter” and “quencher” molecules, and is designed to be specific for a region of relatively high GC content so that it melts out at a temperature about 10° C. higher in the PCR cycle than the forward or reverse primers. This allows the TaqMan® probe to remain fully hybridized during the PCR annealing/extension step. As the Taq polymerase enzymatically synthesizes a new strand during PCR, it will eventually reach the annealed TaqMan® probe. The 5′ to 3′ exonuclease activity of the Taq polymerase will then displace the TaqMan® probe by digesting it, releasing the fluorescent reporter molecule for quantitative detection of its now unquenched signal using a real-time fluorescent detection system.
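The design rule that the probe should melt about 10° C. above the primers can be screened roughly in silico. The sketch below uses the Wallace rule (Tm ≈ 2 °C per A/T plus 4 °C per G/C), a coarse approximation intended for short oligonucleotides that is not part of the described method; practical probe design would normally use nearest-neighbor thermodynamic models. The function names and the example sequences are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm estimate (deg C): 2 x (A + T) + 4 x (G + C)."""
    s = oligo.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * (s.count("G") + s.count("C"))


def probe_tm_margin(probe: str, forward: str, reverse: str) -> float:
    """Estimated probe Tm minus the higher of the two primer Tms."""
    return wallace_tm(probe) - max(wallace_tm(forward), wallace_tm(reverse))


# Hypothetical bisulfite-specific primers and a GC-rich probe.
fwd = "TTAGGTTTAGTTGATTTAGG"
rev = "AAACTAAACCTAAAACTCAA"
probe = "CGGCGCGTTCGGGTCGTTCGCGG"
print(probe_tm_margin(probe, fwd, rev) >= 10.0)  # True
```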
Typical reagents (e.g., as might be found in a typical MethyLight™-based kit) for MethyLight™ analysis may include, but are not limited to: PCR primers for specific gene (or bisulfite treated DNA sequence or CpG island); TaqMan® probes; optimized PCR buffers and deoxynucleotides; and Taq polymerase.
Ms-SNuPE. The Ms-SNuPE technique is a quantitative method for assessing methylation differences at specific CpG sites based on bisulfite treatment of DNA, followed by single-nucleotide primer extension (Gonzalgo and Jones, Nucleic Acids Res. 25:2529-2531, 1997). Briefly, genomic DNA is reacted with sodium bisulfite to convert unmethylated cytosine to uracil while leaving 5-methylcytosine unchanged. Amplification of the desired target sequence is then performed using PCR primers specific for bisulfite-converted DNA, and the resulting product is isolated and used as a template for methylation analysis at the CpG site(s) of interest. Small amounts of DNA can be analyzed (e.g., microdissected pathology sections), and the method avoids the use of restriction enzymes for determining the methylation status at CpG sites.
Typical reagents (e.g., as might be found in a typical Ms-SNuPE-based kit) for Ms-SNuPE analysis may include, but are not limited to: PCR primers for specific gene (or bisulfite treated DNA sequence or CpG island); optimized PCR buffers and deoxynucleotides; gel extraction kit; positive control primers; Ms-SNuPE primers for specific gene; reaction buffer (for the Ms-SNuPE reaction); and labeled nucleotides. Additionally, bisulfite conversion reagents may include: DNA denaturation buffer; sulfonation buffer; DNA recovery reagents or kit (e.g., precipitation, ultrafiltration, affinity column); desulfonation buffer; and DNA recovery components.
MSP. MSP (methylation-specific PCR) allows for assessing the methylation status of virtually any group of CpG sites within a CpG island, independent of the use of methylation-sensitive restriction enzymes (Herman et al. Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996; U.S. Pat. No. 5,786,146). Briefly, DNA is modified by sodium bisulfite, which converts all unmethylated, but not methylated, cytosines to uracil, and is subsequently amplified with primers specific for methylated versus unmethylated DNA. MSP requires only small quantities of DNA, is sensitive to 0.1% methylated alleles of a given CpG island locus, and can be performed on DNA extracted from paraffin-embedded samples. Typical reagents (e.g., as might be found in a typical MSP-based kit) for MSP analysis may include, but are not limited to: methylated and unmethylated PCR primers for specific gene (or bisulfite treated DNA sequence or CpG island), optimized PCR buffers and deoxynucleotides, and specific probes.
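In Ms-SNuPE, a primer whose extension reads the top-strand base at the interrogated position incorporates C where the cytosine was methylated and T where it was unmethylated, so the methylated fraction is the C share of the extension signal. The sketch below assumes the C and T signals (e.g., from labeled nucleotides) are proportional to the amounts incorporated; the function names and example values are hypothetical.

```python
from statistics import mean


def msnupe_site_methylation(c_signal: float, t_signal: float) -> float:
    """Methylated fraction at one CpG = C / (C + T) extension signal."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no extension signal")
    return c_signal / total


def msnupe_region_average(signals: list) -> float:
    """Average methylation over several interrogated CpGs; `signals` holds (C, T) pairs."""
    return mean(msnupe_site_methylation(c, t) for c, t in signals)


print(msnupe_site_methylation(25.0, 75.0))             # 0.25
print(msnupe_region_average([(1.0, 1.0), (3.0, 1.0)]))  # 0.625
```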
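Methylated- and unmethylated-specific MSP primers can be derived from the genomic sequence at the primer sites by converting it twice: once assuming all CpG cytosines are methylated (kept as C) and once assuming none are (converted to T). Non-CpG cytosines become T in both. The sketch below is a minimal illustration under these assumptions: complete conversion, top-strand primers, and input windows that include the base following any terminal C. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert(seq: str, cpg_methylated: bool) -> str:
    """In-silico bisulfite conversion of a top-strand window.

    Non-CpG C always becomes T; CpG C stays C only if `cpg_methylated`.
    A C at the last position is treated as non-CpG, so windows should
    include the following base.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        is_cpg = base == "C" and i + 1 < len(s) and s[i + 1] == "G"
        if base == "C" and not (is_cpg and cpg_methylated):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def msp_primer_pair(fwd_window: str, rev_window: str, methylated: bool) -> tuple:
    """Forward primer = converted top strand; reverse primer = its reverse complement."""
    return convert(fwd_window, methylated), reverse_complement(convert(rev_window, methylated))


print(msp_primer_pair("ACGTCA", "TTCGAA", True))   # ('ACGTTA', 'TTCGAA')
print(msp_primer_pair("ACGTCA", "TTCGAA", False))  # ('ATGTTA', 'TTCAAA')
```

The two primer pairs differ only at CpG positions, which is what allows each pair to amplify selectively from methylated or unmethylated template.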
Genomic sequences according to SEQ ID NO: 1 to SEQ ID NO: 118, and non-naturally occurring treated variants thereof according to SEQ ID NO: 493 to SEQ ID NO: 964, were determined to have utility for the detection, classification and/or treatment of breast cell proliferative disorders.
In one embodiment the invention provides a method for detecting and/or for detecting and distinguishing between or among breast cell proliferative disorders in a subject. Said method comprises the following steps:
i) contacting genomic DNA obtained from the subject with at least one reagent, or series of reagents, that distinguishes between methylated and non-methylated CpG dinucleotides within at least one target region of the genomic DNA, wherein said target region comprises at least one CpG dinucleotide sequence, and
ii) detecting, or detecting and distinguishing between or among breast cell proliferative disorders.
It is particularly preferred that said genomic DNA is isolated from body fluids of the subject. It is further preferred that said body fluid is selected from the group consisting of nipple aspirate fluid, lymphatic fluid, ductal lavage fluid, fine needle aspirate, blood plasma, blood serum, whole blood, isolated blood cells, and cells isolated from the blood.
Genomic DNA may be isolated by any means standard in the art, including the use of commercially available kits. Briefly, where the DNA of interest is enclosed by a cellular membrane, the biological sample must be disrupted and lysed by enzymatic, chemical or mechanical means. The DNA solution may then be cleared of proteins and other contaminants, e.g., by digestion with proteinase K. The genomic DNA is then recovered from the solution. This may be carried out by means of a variety of methods including salting out, organic extraction or binding of the DNA to a solid phase support. The choice of method will be affected by several factors including time, expense and required quantity of DNA.
The genomic DNA sample is then treated in such a manner that cytosine bases which are unmethylated at the 5′-position are converted to uracil, thymine, or another base which is dissimilar to cytosine in terms of hybridization behavior. This will be understood as ‘treatment’ herein.
This is preferably achieved by means of treatment with a bisulfite reagent as defined above. Methods of said treatment are known in the art (e.g. PCT/EP2004/011715, which is incorporated by reference in its entirety). It is preferred that the bisulfite treatment is conducted in the presence of denaturing solvents such as, but not limited to, n-alkylene glycols, particularly diethylene glycol dimethyl ether (DME), or in the presence of dioxane or dioxane derivatives. In a preferred embodiment the denaturing solvents are used in concentrations between 1% and 35% (v/v). It is also preferred that the bisulfite reaction is carried out in the presence of scavengers such as, but not limited to, chromane derivatives, e.g., 6-hydroxy-2,5,7,8-tetramethylchromane-2-carboxylic acid (see: PCT/EP2004/011715, which is incorporated by reference in its entirety). The bisulfite conversion is preferably carried out at a reaction temperature between 30° C. and 70° C., whereby the temperature is increased to over 85° C. for short periods of time during the reaction (see: PCT/EP2004/011715, which is incorporated by reference in its entirety). The bisulfite-treated DNA is preferably purified prior to further analysis. This may be conducted by any means known in the art, such as, but not limited to, ultrafiltration, preferably carried out by means of Microcon™ columns (manufactured by Millipore™). The purification is carried out according to a modified manufacturer's protocol (see: PCT/EP2004/011715, which is incorporated by reference in its entirety).
The treated DNA is then analyzed in order to determine the methylation state, prior to the treatment, of one or more target gene sequences associated with the development of breast carcinoma. It is particularly preferred that the target region comprises, or hybridizes under stringent conditions to, at least 16 contiguous nucleotides of at least one gene or genomic sequence selected from the group consisting of the genes and genomic sequences listed in Table 3. It is further preferred that the sequences of said genes in Table 3, as described in the accompanying sequence listing, are analyzed. The method of analysis may be selected from those known in the art, including those listed herein. Particularly preferred are MethyLight™, MSP and the use of blocking oligonucleotides as will be described herein. It is further preferred that any oligonucleotides used in such analysis (including primers, blocking oligonucleotides and detection probes) be reverse complementary or identical to, or hybridize under stringent or highly stringent conditions to, a segment at least 16 bases long of the base sequences of one or more of SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto.
Aberrant methylation, more particularly hypermethylation, of one or more genes or genomic sequences selected from those listed in Table 3 is associated with the presence of breast carcinoma. Analysis of one or a plurality of these sequences enables detecting, or detecting and distinguishing between or among, breast cell proliferative disorders.
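The oligonucleotide criterion (identical or reverse complementary to a segment of at least 16 bases of a treated sequence) can be screened with exact string matching. The sketch below checks every 16-base window of the oligonucleotide and of its reverse complement against the target; any longer match necessarily contains such a window. It does not model hybridization with mismatches under stringent conditions, the target string is a made-up stand-in for a treated SEQ ID NO sequence, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_segment(oligo: str, target: str, min_len: int = 16) -> bool:
    """True if some min_len-base segment of `oligo` is identical or reverse
    complementary to a segment of `target` (exact matches only)."""
    target = target.upper()
    for candidate in (oligo.upper(), reverse_complement(oligo)):
        for i in range(len(candidate) - min_len + 1):
            if candidate[i:i + min_len] in target:
                return True
    return False


treated = "TTGTTAGGTTTTAGTTAGGGATTTTG"  # hypothetical bisulfite-treated stand-in
print(shares_segment(treated[2:20], treated))                       # True (identical)
print(shares_segment(reverse_complement(treated[0:16]), treated))   # True (reverse complement)
print(shares_segment("TTGTTAGG", treated))                          # False (too short)
```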
In one embodiment, the method discloses the use of one or more genes or genomic sequences selected from the group consisting of APC, ARH1/NOEY2, BRCA2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, MLH1, PGR, SERPINB5, RARB, SFN, SOD2, TGFBR2, THRB, TIMP3, TP73, NME1, CDH13, THBS1, TMS1/ASC, ESR1, IL6, APAF1, CASP8, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, RARA, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, BRCA1, LOT1, PRSS8, SNCG, TPM1, GPC3, CLDN7, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, RAP2B, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 45, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 as markers for the detection of breast cancers.
The use of said genes and/or sequences may be enabled by means of any analysis of the expression of the gene, for example mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
It is also preferred that the expression level of only one gene or genomic sequence from the group consisting of PRDM2, PLAU, GSTP1, SLIT2, CCND2, HOXA5, RASSF1A, HS3ST2, ARH1/NOEY2, SCGB3A1, LIMK-1, SEQ ID NO: 6, SEQ ID NO: 3, SEQ ID NO: 18, SEQ ID NO: 7, SEQ ID NO: 41, SEQ ID NO: 22, SEQ ID NO: 46, SEQ ID NO: 13, and SEQ ID NO: 31 is analyzed. It is particularly preferred that this is carried out by means of methylation analysis.
In one embodiment the method discloses the use of one or more genes or genomic sequences selected from the group consisting of ARH1/NOEY2, CCND2, CDKN1A, CDKN2A, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, SERPINB5, TERT, TGFBR2, THRB, TIMP3, TP73, NME1, CDH13, THBS1, TMS1/ASC, IL6, APAF1, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, LOT1, GPC3, CLDN7, GJB2, SLIT2, IGSF4, MCT1, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), LIM DOMAIN KINASE 1, MGC34831, SEQ ID NO: 54, MSF, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, NR2E1, PCDH7, RTTN, SNAP25, SEQ ID NO: 26, SEQ ID NO: 28, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, MGC10561, LMX1A, SENP3, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 45, O60279, SEQ ID NO: 48 as markers for the differentiation of breast cancers from other cancers. Said use of the genes and/or sequences may be enabled by means of any analysis of the expression of the gene, by means of mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
In said embodiment it is further preferred that the expression level of only one gene or genomic sequence from the group consisting PRDM2, GSTP1, ALX4, HOXA5, PLAU, RASSF1A, IGSF4, SLIT2, DAPK1, CDKN1A, SEQ ID NO: 38, SEQ ID NO: 35, LIMK-1, SEQ ID NO: 39, SEQ ID NO: 10, SEQ ID NO: 26, SEQ ID NO: 8, SEQ ID NO: 22, SEQ ID NO: 18, and SEQ ID NO: 47 is analyzed. It is particularly preferred that this is carried out by means of methylation analysis.
In one embodiment the method discloses the use of one or more genes or genomic sequences selected from the group consisting of APC, ARH1/NOEY2, CCND2, CDH1, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, PGR, SERPINB5, RARB, SFN, SOD2, TERT, TGFBR2, THRB, TIMP3, NME1, CDH13, THBS1, TMS1/ASC, ESR1, IL6, APAF1, CASP8, SYK, HOXA5, FABP3, RASSFIA, SEQ ID NO: 3, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, BRCA1, LOT1, PRSS8, SNCG, GPC3, CLDN7, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, RAP2B, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, SEQ ID NO: 36, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 as markers for the detection of breast cancer cells within blood or blood derived fluids by differentiation of breast cancer cells from peripheral blood lymphocytes. Said use of the genes and/or sequences may be enabled by means of any analysis of the expression of the gene, by means of mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
In said embodiment it is further preferred that the expression level of only one gene or genomic sequence from the group consisting of FABP3, RASSF1A, MSF, PRDM6, LMX1A, SEQ ID NO: 4, SCGB3A1, SLIT2, NR2E1, EYA4, PRDM2, SERPINB5, TWIST, STAT1, ALX4, IGFBP7, DAPK1, THBS1, PLAU, SEQ ID NO: 20, SEQ ID NO: 38, SEQ ID NO: 8, SEQ ID NO: 37, SEQ ID NO: 10, SEQ ID NO: 35, SEQ ID NO: 47, SEQ ID NO: 6, LIMK-1 and SEQ ID NO: 46 is analyzed for the detection of breast cancer cells within blood or blood-derived fluids by differentiation of breast cancer cells from peripheral blood lymphocytes. Said use of the genes and/or sequences may be enabled by any analysis of gene expression, for example by mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
In one embodiment the method discloses the use of one or more genes or genomic sequences selected from the group consisting of APC, ARH1/NOEY2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, MLH1, PGR, SERPINB5, RARB, SOD2, TERT, TGFBR2, THRB, TIMP3, TP73, CDH13, THBS1, TMS1/ASC, ESR1, APAF1, CASP8, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, RARA, TWIST, ESR2, PLAU, SEQ ID NO: 4, SNCG, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, NR2E1, PCDH7, DKK3, RTTN, SNAP25, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 36, LMX1A, SENP3, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 as markers for the differentiation of DCIS from benign breast disorders. The term “benign breast disorders” as used herein shall be taken to include healthy breast tissue, fibroadenoma, fibrocystic disease and atypical ductal hyperplasia. Said use of the genes and/or sequences may be enabled by any analysis of gene expression, for example by mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
It is also preferred that the expression level of only one gene or genomic sequence from the group consisting of HS3ST2, SLIT2, RASSF1A, GSTP1, GJB2, IGFBP7, CDH13, ARH1/NOEY2, SCGB3A1, FHIT, SEQ ID NO: 27, LIMK-1, SEQ ID NO: 46, SEQ ID NO: 3, SEQ ID NO: 117, SEQ ID NO: 48, SEQ ID NO: 41, SEQ ID NO: 4, SEQ ID NO: 6, and SEQ ID NO: 24 is analyzed. It is particularly preferred that this is carried out by means of methylation analysis.
In one embodiment the method discloses the use of one or more genes or genomic sequences selected from the group consisting of ARH1/NOEY2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, SERPINB5, TERT, TGFBR2, THRB, TIMP3, TP73, NME1, CDH13, THBS1, TMS1/ASC, IL6, APAF1, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, LOT1, GPC3, CLDN7, GJB2, SLIT2, IGSF4, MCT1, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, SEQ ID NO: 36, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 45, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 as markers for the differentiation of breast cancer from benign breast disorders. The term “benign breast disorders” as used herein shall be taken to include healthy breast tissue, fibroadenoma, fibrocystic disease and atypical ductal hyperplasia. Said use of the genes and/or sequences may be enabled by any analysis of gene expression, for example by mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
It is also preferred that the expression level of only one gene or genomic sequence from the group consisting of SLIT2, HS3ST2, HOXA5, ARH1/NOEY2, IGFBP7, PLAU, CDH13, TIMP3, CCND2, GSTP1, SEQ ID NO: 117, LIMK-1, SEQ ID NO: 46, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 3, SEQ ID NO: 41, SEQ ID NO: 27, SEQ ID NO: 31, and SEQ ID NO: 4 is analyzed. It is particularly preferred that this is carried out by means of methylation analysis.
Aberrant levels of mRNA expression of the genes, genomic sequences or genes regulated by genomic sequences according to Table 3 are associated with breast cell proliferative disorders. Accordingly, decreased expression of all of said genes or sequences, with the exception of PRDM2 and S100A7, is associable with the development of breast cancers and other breast cell proliferative disorders. Conversely, increased expression of PRDM2 and S100A7 is associable with the development of breast cancers and other breast cell proliferative disorders.
To detect the presence of mRNA encoding a gene or genomic sequence in a detection system for breast cancer, a sample is obtained from a patient. The sample can be a tissue biopsy sample or a sample of blood, plasma, serum or the like. The sample may be treated to extract the nucleic acids contained therein. The resulting nucleic acid from the sample is subjected to gel electrophoresis or other separation techniques. Detection involves contacting the nucleic acids and in particular the mRNA of the sample with a DNA sequence serving as a probe to form hybrid duplexes. The stringency of hybridization is determined by a number of factors during hybridization and during the washing procedure, including temperature, ionic strength, length of time and concentration of formamide. These factors are outlined in, for example, Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2d ed., 1989). Detection of the resulting duplex is usually accomplished by the use of labeled probes. Alternatively, the probe may be unlabeled, but may be detectable by specific binding with a ligand which is labeled, either directly or indirectly. Suitable labels and methods for labeling probes and ligands are known in the art, and include, for example, radioactive labels which may be incorporated by known methods (e.g., nick translation or kinasing), biotin, fluorescent groups, chemiluminescent groups (e.g., dioxetanes, particularly triggered dioxetanes), enzymes, antibodies, and the like.
In order to increase the sensitivity of detection of mRNA transcribed from the gene or genomic sequence in a sample, the technique of reverse transcription/polymerase chain reaction (RT-PCR) can be used to amplify cDNA transcribed from the mRNA. The method of reverse transcription/PCR is well known in the art (for example, see Watson and Fleming, supra).
The reverse transcription/PCR method can be performed as follows. Total cellular RNA is isolated by, for example, the standard guanidinium isothiocyanate method, and the total RNA is reverse transcribed. The reverse transcription method involves synthesis of DNA on a template of RNA using a reverse transcriptase enzyme and a 3′ end primer. Typically, the primer contains an oligo(dT) sequence. The cDNA thus produced is then amplified using the PCR method and specific primers (Belyavsky et al., Nucl Acid Res 17:2919-2932, 1989; Krug and Berger, Methods in Enzymology, Academic Press, N.Y., Vol. 152, pp. 316-325, 1987, which are incorporated by reference).
The present invention may also be described in certain embodiments as a kit for use in detecting a breast cell proliferative disorder state through testing of a biological sample. A representative kit may comprise one or more nucleic acid segments that selectively hybridize to the mRNA and a container for each of the one or more nucleic acid segments. In certain embodiments the nucleic acid segments may be combined in a single tube. In further embodiments, the nucleic acid segments may also include a pair of primers for amplifying the target mRNA. Such kits may also include any buffers, solutions, solvents, enzymes, nucleotides, or other components for hybridization, amplification or detection reactions. Preferred kit components include reagents for reverse transcription-PCR, in-situ hybridization, Northern analysis and/or RNase protection assay (RPA).
The present invention further provides for methods to detect the presence of the polypeptide encoded by said genes or gene sequences in a sample obtained from a patient.
Aberrant levels of expression of the polypeptides encoded by the genes, genomic sequences or genes regulated by genomic sequences of the group consisting of all genes and genomic sequences of the genomic regions listed in Table 3 are associated with breast carcinoma. Accordingly, overexpression of said polypeptides, with the exception of the polypeptides encoded by the PRDM2 and S100A7 genes, is associable with the development of breast carcinoma and other breast cell proliferative disorders. Underexpression of PRDM2 and S100A7 is associable with the development of breast cancers and other breast cell proliferative disorders.
Any method known in the art for detecting proteins can be used. Such methods include, but are not limited to, immunodiffusion, immunoelectrophoresis, immunochemical methods, binder-ligand assays, immunohistochemical techniques, agglutination and complement assays (see, for example, Basic and Clinical Immunology, Sites and Terr, eds., Appleton and Lange, Norwalk, Conn., pp. 217-262, 1991, which is incorporated by reference). Preferred are binder-ligand immunoassay methods, including reacting antibodies with an epitope or epitopes and competitively displacing a labeled protein or derivative thereof.
Certain embodiments of the present invention comprise the use of antibodies specific to the polypeptide encoded by the genes or genomic sequences of the group consisting of all genes and genomic sequences of the genomic regions listed in Table 3, below.
Such antibodies may be useful for diagnostic and prognostic applications in detecting the disease state, by comparing a patient's levels of breast disease marker expression to expression of the same markers in normal individuals. In certain embodiments, production of monoclonal or polyclonal antibodies can be induced by the use of the coded polypeptide as antigen. Such antibodies may in turn be used to detect expressed proteins as markers for human disease states. The levels of such proteins present in the peripheral blood or tissue sample of a patient may be quantified by conventional methods. Antibody-protein binding may be detected and quantified by a variety of means known in the art, such as labeling with fluorescent or radioactive ligands. The invention further comprises kits for performing the above-mentioned procedures, wherein such kits contain antibodies specific for the investigated polypeptides.
Numerous competitive and non-competitive protein binding immunoassays are well known in the art. Antibodies employed in such assays may be unlabeled, for example as used in agglutination tests, or labeled for use in a wide variety of assay methods. Labels that can be used include radionuclides, enzymes, fluorescers, chemiluminescers, enzyme substrates or co-factors, enzyme inhibitors, particles, dyes and the like for use in radioimmunoassay (RIA), enzyme immunoassays, e.g., enzyme-linked immunosorbent assay (ELISA), fluorescent immunoassays and the like. Polyclonal or monoclonal antibodies or epitopes thereof can be made for use in immunoassays by any of a number of methods known in the art. One approach for preparing antibodies to a protein is the selection and preparation of an amino acid sequence of all or part of the protein, chemically synthesizing the sequence and injecting it into an appropriate animal, usually a rabbit or a mouse (Milstein and Kohler Nature 256:495-497, 1975; Galfre and Milstein, Methods in Enzymology: Immunochemical Techniques 73:1-46, Langone and Banatis eds., Academic Press, 1981, which are incorporated by reference). Methods for preparation of the polypeptides or epitopes thereof include, but are not limited to, chemical synthesis, recombinant DNA techniques or isolation from biological samples.
In a further embodiment, the present invention is based upon the analysis of methylation levels within two or more genes or genomic sequences taken from the group consisting of all genes and genomic sequences of the genomic regions listed in Table 3 and/or their regulatory sequences. It is further preferred that the sequences of said genes or genomic sequences are according to SEQ ID NO: 1 to SEQ ID NO: 118.
Particular embodiments of the present invention provide a novel application of the analysis of methylation status, levels and/or patterns within said sequences that enables a precise detection and/or characterization of breast cell proliferative disorders. Early detection of breast cell proliferative disorders is directly linked with disease prognosis, and the disclosed method thereby enables the physician and patient to make early and more informed treatment decisions.
The present invention provides novel uses for genomic sequences selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118. Additional embodiments provide modified variants, in particular chemically modified variants, of SEQ ID NO: 1 to SEQ ID NO: 118, as well as oligonucleotides and/or PNA-oligomers for analysis of cytosine methylation patterns within the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118.
An object of the invention is the analysis of the methylation state of one or more CpG dinucleotides within at least one of the genomic sequences selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 118 and sequences complementary thereto.
The disclosed invention provides treated nucleic acids, derived from genomic SEQ ID NO: 1 to SEQ ID NO: 118, wherein the treatment is suitable to convert at least one unmethylated cytosine base of the genomic DNA sequence to uracil or another base that is detectably dissimilar to cytosine in terms of hybridization. The genomic sequences in question may comprise one or more consecutive or random methylated CpG positions. Said treatment preferably comprises use of a reagent selected from the group consisting of bisulfite, hydrogen sulfite, disulfite, and combinations thereof. In a preferred embodiment the invention provides a non-naturally occurring modified nucleic acid comprising a sequence of at least 16 contiguous nucleotide bases in length of a sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964, wherein said sequence comprises at least one CpG, TpA or CpA dinucleotide, and sequences complementary thereto. The sequences of SEQ ID NO: 493 to SEQ ID NO: 964 provide non-naturally occurring modified versions of the nucleic acid according to SEQ ID NO: 1 to SEQ ID NO: 118, wherein the modification of each genomic sequence results in the synthesis of a nucleic acid having a sequence that is unique and distinct from said genomic sequence, as follows. Particularly preferred is a nucleic acid comprising a sequence that is not identical to a part of human genomic DNA, including human genomic DNA comprising one or more methylated CpG positions.
For each sense strand genomic DNA, e.g., SEQ ID NO: 1, four converted versions are disclosed. In a first version, “C” is converted to “T,” but “CpG” remains “CpG” (i.e., corresponding to the case where, for the genomic sequence, all “C” residues of CpG dinucleotide sequences are methylated and are thus not converted); a second version discloses the complement of the disclosed genomic DNA sequence (i.e., antisense strand), wherein “C” is converted to “T,” but “CpG” remains “CpG” (i.e., corresponding to the case where, for the antisense strand, all “C” residues of CpG dinucleotide sequences are methylated and are thus not converted). The ‘upmethylated’ converted sequences of SEQ ID NO: 1 to SEQ ID NO: 118 correspond to SEQ ID NO: 493 to SEQ ID NO: 728. A third chemically converted version of each genomic sequence is provided, wherein “C” is converted to “T” for all “C” residues, including those of “CpG” dinucleotide sequences (i.e., corresponding to the case where, for the genomic sequence, all “C” residues of CpG dinucleotide sequences are unmethylated); a final chemically converted version of each sequence discloses the complement of the disclosed genomic DNA sequence (i.e., antisense strand), wherein “C” is converted to “T” for all “C” residues, including those of “CpG” dinucleotide sequences (i.e., corresponding to the case where, for the complement (antisense strand) of each genomic sequence, all “C” residues of CpG dinucleotide sequences are unmethylated). The ‘downmethylated’ converted sequences of SEQ ID NO: 1 to SEQ ID NO: 118 correspond to SEQ ID NO: 729 to SEQ ID NO: 964.
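The four converted versions can be illustrated in silico. The following is a minimal Python sketch, not the procedure used to generate the sequence listing. It assumes the input contains only A, C, G and T, treats the antisense strand as the reverse complement of the sense strand written 5′ to 3′, and uses hypothetical function names (`reverse_complement`, `bisulfite_convert`, `four_converted_versions`).

```python
from typing import Dict

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the antisense strand (5'->3') of an A/C/G/T sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def bisulfite_convert(seq: str, cpg_methylated: bool) -> str:
    """Simulate bisulfite conversion of a single strand.

    Every C becomes T, except a C that is followed by G when cpg_methylated
    is True (methylated CpG cytosines are protected from conversion).
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if cpg_methylated and in_cpg else "T")
        else:
            converted.append(base)
    return "".join(converted)


def four_converted_versions(genomic: str) -> Dict[str, str]:
    """Hypothetical helper: the four converted versions of one sense sequence."""
    antisense = reverse_complement(genomic)
    return {
        "sense_methylated": bisulfite_convert(genomic, cpg_methylated=True),
        "antisense_methylated": bisulfite_convert(antisense, cpg_methylated=True),
        "sense_unmethylated": bisulfite_convert(genomic, cpg_methylated=False),
        "antisense_unmethylated": bisulfite_convert(antisense, cpg_methylated=False),
    }


if __name__ == "__main__":
    for name, seq in four_converted_versions("ACGCAT").items():
        print(name, seq)
```

For the toy input ACGCAT, the CpG-methylated versions are ACGTAT (sense) and ATGCGT (antisense), and the fully unmethylated versions are ATGTAT and ATGTGT; only the cytosine of each CpG differs between the methylated and unmethylated cases.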
In an alternative preferred embodiment, the method according to the invention comprises the use of an oligonucleotide or oligomer for detecting the cytosine methylation state within genomic or treated (chemically modified) DNA, according to SEQ ID NO: 1 to SEQ ID NO: 964. Said oligonucleotide or oligomer comprises a nucleic acid sequence having a length of at least nine (9) nucleotides which hybridizes, under moderately stringent or stringent conditions (as defined herein above), to a treated nucleic acid sequence according to SEQ ID NO: 493 to SEQ ID NO: 964 and/or sequences complementary thereto, or to a genomic sequence according to SEQ ID NO: 1 to SEQ ID NO: 118 and/or sequences complementary thereto.
Thus, the present invention provides nucleic acid molecules (e.g., oligonucleotides and peptide nucleic acid (PNA) molecules (PNA-oligomers)) having a length of at least nine (9) nucleotides that hybridize under moderately stringent and/or stringent hybridization conditions to all or a portion of the sequences SEQ ID NO: 1 to SEQ ID NO: 964, or to the complements thereof.
Particularly preferred are oligonucleotides and peptide nucleic acid (PNA) molecules that hybridize under moderately stringent and/or stringent hybridization conditions to all or a portion of the sequences SEQ ID NO: 493 to SEQ ID NO: 964, or to the complements thereof wherein said sequence comprises at least one CpG, TpA or CpA dinucleotide and furthermore wherein said sequence is not identical to a part of human genomic DNA, including human genomic DNA comprising one or more methylated CpG positions.
The hybridizing portion of the hybridizing nucleic acids is typically at least 9, 15, 20, 25, 30 or 35 nucleotides in length. However, longer molecules have inventive utility, and are thus within the scope of the present invention.
Preferably, the hybridizing portion of the inventive hybridizing nucleic acids is at least 95%, or at least 98%, or 100% identical to a sequence of SEQ ID NO: 1 to SEQ ID NO: 964 or to a portion thereof, or to the complements thereof.
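The percent identity criterion can be checked computationally. The sketch below is an illustration under stated assumptions: it uses a simple ungapped comparison of the probe against every same-length window of the target and of its reverse complement (standing in for "the complements thereof"), applies the nine-nucleotide minimum length mentioned above, and uses hypothetical function names. A practical probe design may also need gapped alignment and hybridization thermodynamics, which are not modelled here.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def best_percent_identity(probe: str, target: str) -> float:
    """Best ungapped percent identity of probe against any window of the
    target or its reverse complement (0.0 if the probe is longer)."""
    probe = probe.upper()
    k = len(probe)
    if k == 0:
        raise ValueError("probe must not be empty")
    best = 0.0
    for strand in (target.upper(), reverse_complement(target)):
        for start in range(len(strand) - k + 1):
            window = strand[start:start + k]
            matches = sum(1 for a, b in zip(probe, window) if a == b)
            best = max(best, 100.0 * matches / k)
    return best


def meets_identity(probe: str, target: str,
                   threshold: float = 95.0, min_len: int = 9) -> bool:
    """True if the probe is long enough and reaches the identity threshold."""
    return len(probe) >= min_len and best_percent_identity(probe, target) >= threshold
```

Raising `threshold` to 98.0 or 100.0 corresponds to the stricter identity levels named above.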
Hybridizing nucleic acids of the type described herein can be used, for example, as a primer (e.g., a PCR primer), or a diagnostic and/or prognostic probe or primer. Preferably, hybridization of the oligonucleotide probe to a nucleic acid sample is performed under stringent conditions and the probe is 100% identical to the target sequence. Nucleic acid duplex or hybrid stability is expressed as the melting temperature, or Tm, which is the temperature at which half of the probe-target duplexes have dissociated. This melting temperature is used to define the required stringency conditions.
For target sequences that are related and substantially identical to the corresponding sequence of SEQ ID NO: 1 to SEQ ID NO: 964 (such as allelic variants and SNPs), rather than identical, it is useful to first establish the lowest temperature at which only homologous hybridization occurs with a particular concentration of salt (e.g., SSC or SSPE). Then, assuming that 1% mismatching results in a 1° C. decrease in the Tm, the temperature of the final wash in the hybridization reaction is reduced accordingly (for example, if sequences having >95% identity with the probe are sought, the final wash temperature is decreased by 5° C.). In practice, the change in Tm can be between 0.5° C. and 1.5° C. per 1% mismatch.
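The wash-temperature adjustment described above is simple arithmetic. The following minimal sketch assumes the lowest homologous-only wash temperature has already been determined empirically and is supplied by the user; the function name and the 65 °C input in the usage note are hypothetical.

```python
def final_wash_temperature(
    homologous_wash_c: float,
    min_identity_pct: float,
    deg_c_per_pct_mismatch: float = 1.0,
) -> float:
    """Lower the homologous-only wash temperature to tolerate mismatches.

    homologous_wash_c: lowest temperature (deg C) at which only homologous
        hybridization occurs at the chosen salt concentration.
    min_identity_pct: lowest probe-target identity (%) that should still
        hybridize, e.g. 95.0.
    deg_c_per_pct_mismatch: assumed Tm drop per 1% mismatch; 1 deg C is the
        working assumption, with 0.5 to 1.5 deg C observed in practice.
    """
    if not 0.0 < min_identity_pct <= 100.0:
        raise ValueError("min_identity_pct must be in (0, 100]")
    if not 0.5 <= deg_c_per_pct_mismatch <= 1.5:
        raise ValueError("deg_c_per_pct_mismatch outside the 0.5-1.5 range")
    mismatch_pct = 100.0 - min_identity_pct
    return homologous_wash_c - deg_c_per_pct_mismatch * mismatch_pct
```

With a hypothetical homologous wash temperature of 65 °C and a 95% identity target, the default assumption gives 60 °C, i.e. the 5 °C reduction described above; the 0.5 and 1.5 °C per-percent bounds give 62.5 °C and 57.5 °C respectively.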
Examples of inventive oligonucleotides of length X (in nucleotides), as indicated by polynucleotide positions with reference to, e.g., SEQ ID NO: 1, include those corresponding to sets (sense and antisense sets) of consecutively overlapping oligonucleotides of length X, where the oligonucleotides within each consecutively overlapping set (corresponding to a given X value) are defined as the finite set of Z oligonucleotides from nucleotide positions:
n to (n+(X−1));
where n=1, 2, 3, . . . (Y−(X−1));
where Y equals the length (nucleotides or base pairs) of SEQ ID NO: 1 (6435);
where X equals the common length (in nucleotides) of each oligonucleotide in the set (e.g., X=20 for a set of consecutively overlapping 20-mers); and
where the number (Z) of consecutively overlapping oligomers of length X for a given SEQ ID NO of length Y is equal to Y−(X−1).
For example, Z=6435−19=6416 for either the sense or antisense set of SEQ ID NO: 1, where X=20.
Examples of inventive 20-mer oligonucleotides include the following set of 6416 oligomers (and the antisense set complementary thereto), indicated by polynucleotide positions with reference to SEQ ID NO: 1:1-20, 2-21, 3-22, 4-23, 5-24, etc. . . . 6416-6435.
Likewise, examples of inventive 25-mer oligonucleotides include the following set of 6411 oligomers (and the antisense set complementary thereto), indicated by polynucleotide positions with reference to SEQ ID NO: 1:1-25, 2-26, 3-27, 4-28, 5-29, etc. . . . 6409-6433, 6410-6434 and 6411-6435.
Preferably, the set is limited to those oligomers that comprise at least one CpG, TpG or CpA dinucleotide.
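The consecutively overlapping sets and the dinucleotide filter can be enumerated directly from the definitions above (positions n to n+(X−1), Z = Y−(X−1)). This is a minimal Python sketch with hypothetical function names; it enumerates the sense set only, and the antisense set could be obtained by applying the same functions to the reverse complement.

```python
from typing import Iterator, List, Tuple

INFORMATIVE_DINUCLEOTIDES = ("CG", "TG", "CA")


def window_count(y: int, x: int) -> int:
    """Z = Y - (X - 1): number of overlapping X-mers in a sequence of length Y."""
    if x < 1:
        raise ValueError("oligonucleotide length X must be at least 1")
    return max(0, y - (x - 1))


def overlapping_oligos(seq: str, x: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (n, n + X - 1, oligo) using 1-based, inclusive positions."""
    for n in range(1, window_count(len(seq), x) + 1):
        yield n, n + x - 1, seq[n - 1:n - 1 + x]


def has_informative_dinucleotide(oligo: str) -> bool:
    """True if the oligo contains at least one CpG, TpG or CpA dinucleotide."""
    s = oligo.upper()
    return any(d in s for d in INFORMATIVE_DINUCLEOTIDES)


def preferred_oligos(seq: str, x: int) -> List[Tuple[int, int, str]]:
    """Overlapping X-mers restricted to those with an informative dinucleotide."""
    return [w for w in overlapping_oligos(seq, x) if has_informative_dinucleotide(w[2])]
```

For Y = 6435 and X = 20, `window_count` returns 6416, matching the worked example for SEQ ID NO: 1.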
The present invention encompasses, for each of SEQ ID NO: 1 to SEQ ID NO: 964 (sense and antisense), multiple consecutively overlapping sets of oligonucleotides or modified oligonucleotides of length X, where, e.g., X=9, 10, 17, 20, 22, 23, 25, 27, 30 or 35 nucleotides.
The oligonucleotides or oligomers according to the present invention constitute effective tools useful to ascertain genetic and epigenetic parameters of the genomic sequence corresponding to SEQ ID NO: 1 to SEQ ID NO: 118. Preferred sets of such oligonucleotides or modified oligonucleotides of length X are those consecutively overlapping sets of oligomers corresponding to SEQ ID NO: 1 to SEQ ID NO: 964 (and to the complements thereof). Preferably, said oligomers comprise at least one CpG, TpG or CpA dinucleotide.
Particularly preferred oligonucleotides or oligomers according to the present invention are those in which the cytosine of the CpG dinucleotide (or of the corresponding converted TpG or CpA dinucleotide) sequences is within the middle third of the oligonucleotide; that is, where the oligonucleotide is, for example, 13 bases in length, the CpG, TpG or CpA dinucleotide is positioned within the fifth to ninth nucleotide from the 5′-end.
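The positional preference can be expressed as a check on the oligomer sequence. The sketch below assumes that "within the fifth to ninth nucleotide" in the 13-mer example means both bases of the dinucleotide fall in that window, and generalizes this to positions floor(L/3)+1 through ceil(2L/3) for an oligomer of length L. That generalization and the function names are assumptions, not part of the specification; for L = 13 the bounds evaluate to positions 5 and 9, reproducing the example.

```python
from typing import Sequence, Tuple


def middle_third_bounds(length: int) -> Tuple[int, int]:
    """Assumed 1-based inclusive middle-third window: floor(L/3)+1 .. ceil(2L/3)."""
    return length // 3 + 1, (2 * length + 2) // 3


def dinucleotide_in_middle_third(
    oligo: str, dinucleotides: Sequence[str] = ("CG", "TG", "CA")
) -> bool:
    """True if any listed dinucleotide lies entirely inside the middle third."""
    s = oligo.upper()
    lo, hi = middle_third_bounds(len(s))
    # p is the 1-based position of the dinucleotide's first base; p + 1 <= hi.
    for p in range(lo, hi):
        if s[p - 1:p + 1] in dinucleotides:
            return True
    return False
```

A CpG at positions 5-6 or 8-9 of a 13-mer passes the check, whereas one at positions 1-2 or 9-10 does not.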
The oligonucleotides of the invention can also be modified by chemically linking the oligonucleotide to one or more moieties or conjugates to enhance the activity, stability or detection of the oligonucleotide. Such moieties or conjugates include chromophores, fluorophores, lipids such as cholesterol, cholic acid, thioether, aliphatic chains, phospholipids, polyamines, polyethylene glycol (PEG), palmityl moieties, and others as disclosed in, for example, U.S. Pat. Nos. 5,514,758, 5,565,552, 5,567,810, 5,574,142, 5,585,481, 5,587,371, 5,597,696 and 5,958,773. The probes may also exist in the form of a PNA (peptide nucleic acid), which has particularly preferred pairing properties. Thus, the oligonucleotide may include other appended groups such as peptides, and may include hybridization-triggered cleavage agents (Krol et al., BioTechniques 6:958-976, 1988) or intercalating agents (Zon, Pharm. Res. 5:539-549, 1988). To this end, the oligonucleotide may be conjugated to another molecule, e.g., a chromophore, fluorophore, peptide, hybridization-triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.
The oligonucleotide may also comprise at least one art-recognized modified sugar and/or base moiety, or may comprise a modified backbone or non-natural internucleoside linkage. The oligonucleotides or oligomers according to particular embodiments of the present invention are typically used in ‘sets,’ which contain at least one oligomer for analysis of each of the CpG dinucleotides of genomic sequences SEQ ID NO: 1 to SEQ ID NO: 118 and sequences complementary thereto, or to the corresponding CpG, TpG or CpA dinucleotide within a sequence of the treated nucleic acids according to SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto. However, it is anticipated that for economic or other reasons it may be preferable to analyze a limited selection of the CpG dinucleotides within said sequences, and the content of the set of oligonucleotides is altered accordingly.
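A set containing at least one oligomer per CpG dinucleotide can be assembled mechanically. The following is a minimal sketch only: it works on a genomic (untreated) sequence, centres each CpG dinucleotide in a fixed-length window as far as the sequence ends allow, and uses hypothetical function names. A practical design may also need to account for the converted sequences, melting temperature and cross-hybridization, none of which is modelled here.

```python
from typing import Dict, List


def cpg_positions(seq: str) -> List[int]:
    """1-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def one_oligo_per_cpg(seq: str, x: int) -> Dict[int, str]:
    """Map each CpG position to one X-mer that contains the whole CpG.

    The dinucleotide is centred where possible; windows near either end of
    the sequence are shifted inward so they stay within the sequence.
    """
    if not 2 <= x <= len(seq):
        raise ValueError("oligo length must be between 2 and len(seq)")
    last_start = len(seq) - x + 1
    oligos = {}
    for p in cpg_positions(seq):
        # Start so that the C (p) and G (p + 1) sit in the centre of the window.
        start = min(max(1, p - (x - 2) // 2), last_start)
        oligos[p] = seq[start - 1:start - 1 + x]
    return oligos
```

Selecting only a subset of the returned positions corresponds to the reduced set of CpG dinucleotides contemplated above for economic or other reasons.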
Therefore, in particular embodiments, the present invention provides a set of at least two (2) oligonucleotides and/or PNA-oligomers useful for detecting the cytosine methylation state in treated genomic DNA (SEQ ID NO: 493 to SEQ ID NO: 964), or in genomic DNA (SEQ ID NO: 1 to SEQ ID NO: 118 and sequences complementary thereto). These probes enable diagnosis and/or classification of genetic and epigenetic parameters of breast cell proliferative disorders. The set of oligomers may also be used for detecting single nucleotide polymorphisms (SNPs) in treated genomic DNA (SEQ ID NO: 493 to SEQ ID NO: 964), or in genomic DNA (SEQ ID NO: 1 to SEQ ID NO: 118 and sequences complementary thereto).
In preferred embodiments, at least one, and more preferably all members of a set of oligonucleotides is/are bound to a solid phase.
In further embodiments, the present invention provides a set of at least two (2) oligonucleotides that are used as ‘primer’ oligonucleotides for amplifying DNA sequences of one of SEQ ID NO: 1-118 and SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto, or segments thereof.
It is anticipated that the oligonucleotides may constitute all or part of an “array” or “DNA chip” (i.e., an arrangement of different oligonucleotides and/or PNA-oligomers bound to a solid phase). Such an array of different oligonucleotide- and/or PNA-oligomer sequences can be characterized, for example, in that it is arranged on the solid phase in the form of a rectangular or hexagonal lattice. The solid-phase surface may be composed of silicon, glass, polystyrene, aluminum, steel, iron, copper, nickel, silver, or gold. Nitrocellulose as well as plastics such as nylon, which can exist in the form of pellets or also as resin matrices, may also be used. An overview of the prior art in oligomer array manufacturing can be gathered from a special edition of Nature Genetics (Nature Genetics Supplement, Volume 21, January 1999) and from the literature cited therein. Fluorescently labeled probes are often used for the scanning of immobilized DNA arrays. Cy3 and Cy5 dyes, which can simply be attached to the 5′-OH of the specific probe, are particularly suitable as fluorescence labels. The detection of the fluorescence of the hybridized probes may be carried out, for example, via a confocal microscope. Cy3 and Cy5 dyes, besides many others, are commercially available.
It is also anticipated that the oligonucleotides, or particular sequences thereof, may constitute all or part of a “virtual array” wherein the oligonucleotides, or particular sequences thereof, are used, for example, as ‘specifiers’ as part of, or in combination with, a diverse population of unique labeled probes to analyze a complex mixture of analytes. Such a method, for example, is described in US 2003/0013091 (U.S. Ser. No. 09/898,743, published 16 Jan. 2003). In such methods, enough labels are generated so that each nucleic acid in the complex mixture (i.e., each analyte) can be uniquely bound by a unique label and thus detected (each label is directly counted, resulting in a digital read-out of each molecular species in the mixture).
It is particularly preferred that the oligomers according to the invention are utilized for at least one of: detection of; detection and differentiation between or among subclasses of; diagnosis of; prognosis of; treatment of; monitoring of; and treatment and monitoring of breast cell proliferative disorders. This is enabled by use of said sets for the differentiation and/or detection of the tissue types according to Table 14. Particularly preferred are those sets of oligomers that comprise at least two oligonucleotides selected from one of the following sets of oligonucleotides.
In one embodiment of the method, breast cancer tissue is detected. This is achieved by analysis of the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to, at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of APC, ARH1/NOEY2, BRCA2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, MLH1, PGR, SERPINB5, RARB, SFN, SOD2, TGFBR2, THRB, TIMP3, TP73, NME1, CDH13, THBS1, TMS1/ASC, ESR1, IL6, APAF1, CASP8, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, RARA, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, BRCA1, LOT1, PRSS8, SNCG, TPM1, GPC3, CLDN7, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, RAP2B, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 45, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 and complements thereof.
In a further embodiment, the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to, at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of PRDM2, PLAU, GSTP1, SLIT2, CCND2, HOXA5, RASSF1A, HS3ST2, ARH1/NOEY2, SCGB3A1, LIMK-1, SEQ ID NO: 6, SEQ ID NO: 3, SEQ ID NO: 18, SEQ ID NO: 7, SEQ ID NO: 41, SEQ ID NO: 22, SEQ ID NO: 46, SEQ ID NO: 13, and SEQ ID NO: 31, and complements thereof, is analyzed.
In both cases this is preferably achieved by use of a set consisting of at least one oligonucleotide, and more preferably at least two selected from one of the groups consisting of 1475-1477, 1477, 1478, 1478, 1479, 1479, 1480, 1480-1497, 1497, 1498, 1498, 1499, 1499, 1500, 1500, 1501, 1501, 1502, 1502-1511, 1511, 1512, 1512-1527, 1527, 1528, 1528-1557, 1557, 1558, 1558-1567, 1567, 1568, 1568, 1569, 1569, 1570, 1570-1573, 1573, 1574, 1574-1607, 1607, 1608, 1608-1619, 1619, 1620, 1620-1623, 1623, 1624, 1624-1639, 1639, 1640, 1640-1655, 1655, 1656, 1656-1693, 1693, 1694, 1694, 1695, 1695, 1696, 1696, 1697, 1697-1702, 1702, 1703, 1703-1706, 1706, 1706, 1706, 1707-1720, 1720, 1721, 1721-1724, 1724, 1725, 1725-1728, 1728, 1729, 1729-1734, 1734, 1735, 1735-1738, 1738, 1739, 1739-1752, 1752, 1753, 1753-1758, 1758, 1759, 1759-1762, 1762, 1763, 1763-1770, 1770, 1771, 1771-1778, 1778, 1779, 1779, 1780, 1780, 1781, 1781-1790, 1790, 1791, 1791-1798, 1798, 1799, 1799-1838, 1838, 1839, 1839, 1840, 1840, 1841, 1841-1852, 1852, 1853, 1853, 1854, 1854, 1855, 1855-1870, 1870, 1871, 1871-1874, 1874, 1875, 1875-1878, 1878, 1879, 1879, 1880, 1880, 1881, 1881, 1882, 1882, 1883, 1883-1886, 1886, 1887, 1887-1890, 1890, 1891, 1891-1906, 1906, 1907, 1907-1940, 1940, 1941, 1941, 1942, 1942, 1943, 1943-1946, 1946, 1947, 1947-1952, 1952, 1953, 1953-1966, 1966, 1967, 1967, 1968, 1968, 1969, 1969, 1970, 1970, 1971, 1971-1978, 1978, 1979, 1979-1982, 1982, 1983, 1983-1992, 1992, 1993, 1993-1998, 1998, 1999, 1999-2006, 2006, 2007, 2007-2016, 2016, 2017, 2017-2022, 2022, 2023, 2023-2030, 2030, 2031, 2031-2036, 2036, 2037, 2037-2050, 2050, 2051, 2051-2060, 2060, 2061, 2061-2070, 2070, 2071, 2071-2076, 2076, 2077, 2077-2104, 2104, 2105, 2105, 2106, 2106, 2107, 2107-2114, 2114, 2115, 2115-2128, 2128, 2129, 2129, 2130, 2130, 2131, 2131-2134, 2134, 2135, 2135, 2136, 2136, 2137, 2137, 2138, 2138, 2139, and 2139.
In one embodiment of the method, breast cancer tissue is differentiated from other cancers. This is achieved by analysis of the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of APC, ARH1/NOEY2, BRCA2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, MLH1, PGR, SERPINB5, RARB, SFN, SOD2, TGFBR2, THRB, TIMP3, TP73, NME1, CDH13, THBS1, TMS1/ASC, ESR1, IL6, APAF1, CASP8, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, RARA, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, BRCA1, LOT1, PRSS8, SNCG, TPM1, GPC3, CLDN7, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, RAP2B, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 45, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 and complements thereof. 
In a further embodiment, the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of PRDM2, GSTP1, ALX4, HOXA5, PLAU, RASSF1A, IGSF4, SLIT2, DAPK1, CDKN1A, SEQ ID NO: 38, SEQ ID NO: 35, LIMK-1, SEQ ID NO: 39, SEQ ID NO: 10, SEQ ID NO: 26, SEQ ID NO: 8, SEQ ID NO: 22, SEQ ID NO: 18, and SEQ ID NO: 47, and complements thereof, is analyzed.
In both cases this is preferably achieved by use of a set consisting of at least one oligonucleotide, and more preferably at least two selected from one of the groups consisting of 1475, 1476, 1485-1497, 1497, 1498, 1498, 1499, 1499, 1500, 1500, 1501, 1501, 1502, 1502-1504, 1509-1511, 1511, 1512, 1512-1522, 1525-1527, 1527, 1528, 1528-1540, 1551-1554, 1569, 1569, 1570, 1570-1573, 1573, 1574, 1574-1580, 1585, 1586, 1591-1600, 1603, 1604, 1609-1619, 1619, 1620, 1620-1623, 1623, 1624, 1624-1626, 1629, 1630, 1633-1639, 1639, 1640, 1640-1642, 1649, 1650, 1667, 1668, 1675-1680, 1683-1688, 1710-1717, 1724, 1724, 1725, 1725-1728, 1728, 1729, 1729-1731, 1736-1738, 1738, 1739, 1739, 1748, 1749, 1752, 1752, 1753, 1753, 1760, 1761, 1764, 1765, 1774-1778, 1778, 1779, 1779, 1780, 1780, 1781, 1781-1783, 1788, 1789, 1792-1798, 1798, 1799, 1799-1801, 1804-1819, 1830, 1831, 1858-1865, 1870, 1870, 1871, 1871, 1874, 1874, 1875, 1875-1878, 1878, 1879, 1879, 1880, 1880, 1881, 1881, 1882, 1882, 1883, 1883, 1900, 1901, 1904-1906, 1906, 1907, 1907-1909, 1916, 1917, 1924-1931, 1942, 1942, 1943, 1943, 1948, 1949, 1952, 1952, 1953, 1953-1957, 1966, 1966, 1967, 1967, 1968, 1968, 1969, 1969, 1970, 1970, 1971, 1971, 1980-1982, 1982, 1983, 1983-1985, 1994-1998, 1998, 1999, 1999, 2004, 2005, 2010-2015, 2020, 2021, 2028-2030, 2030, 2031, 2031-2033, 2036, 2036, 2037, 2037-2049, 2054-2059, 2066-2070, 2070, 2071, 2071-2073, 2076, 2076, 2077, 2077, 2080-2089, 2092-2099, 2104, 2104, 2105, 2105, 2108, 2109, 2116-2119, 2130, 2130, 2131, 2131-2134, 2134, 2135, 2135, 2136, 2136, 2137, 2137, 2138, 2138, 2139, 2139, 2140, 2140, 2141, 2141-2144, 2144, 2145, 2145, 2146, 2146-2149, 2149, 2150, 2150-2154, 2163, 2164, 2169, 2170, and 2177-2180.
In one further embodiment of the method, breast cancer cells are detected against a background of blood or blood-derived fluids by the differentiation of breast cancer cells from peripheral blood lymphocytes. This is achieved by analysis of the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of APC, ARH1/NOEY2, CCND2, CDH1, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, PGR, SERPINB5, RARB, SFN, SOD2, TERT, TGFBR2, THRB, TIMP3, NME1, CDH13, THBS1, TMS1/ASC, ESR1, IL6, APAF1, CASP8, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, BRCA1, LOT1, PRSS8, SNCG, GPC3, CLDN7, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, RAP2B, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, SEQ ID NO: 36, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 and complements thereof.
In a further embodiment, the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of FABP3, RASSF1A, MSF, PRDM6, LMX1A, SEQ ID NO: 4, SCGB3A1, SLIT2, NR2E1, EYA4, PRDM2, SERPINB5, TWIST, STAT1, ALX4, IGFBP7, DAPK1, THBS1, PLAU, SEQ ID NO: 20, SEQ ID NO: 38, SEQ ID NO: 8, SEQ ID NO: 37, SEQ ID NO: 10, SEQ ID NO: 35, SEQ ID NO: 47, SEQ ID NO: 6, LIMK-1, and SEQ ID NO: 46, and complements thereof, is analyzed.
In both cases, this is preferably achieved by use of a set consisting of at least one oligonucleotide, and more preferably at least two selected from one of the groups consisting of 1475-1477, 1477, 1478, 1478, 1479, 1479, 1480, 1480, 1485-1497, 1497, 1498, 1498, 1501, 1501, 1502, 1502-1511, 1511, 1512, 1512-1522, 1525-1527, 1527, 1528, 1528-1540, 1543-1546, 1549-1557, 1557, 1558, 1558-1567, 1567, 1568, 1568, 1569, 1569, 1570, 1570-1572, 1583-1596, 1599, 1600, 1603, 1604, 1607, 1607, 1608, 1608-1619, 1619, 1620, 1620-1623, 1623, 1624, 1624-1626, 1631-1639, 1639, 1640, 1640, 1645, 1646, 1649-1655, 1655, 1656, 1656-1660, 1663-1668, 1671-1693, 1693, 1694, 1694, 1695, 1695, 1696, 1696, 1697, 1697-1699, 1704-1706, 1706, 1707, 1707-1720, 1720, 1721, 1721, 1724, 1724, 1725, 1725-1728, 1728, 1729, 1729-1733, 1738, 1738, 1739, 1739, 1742, 1743, 1746, 1747, 1750, 1751, 1758, 1758, 1759, 1759-1762, 1762, 1763, 1763-1770, 1770, 1771, 1771-1778, 1778, 1779, 1779, 1782-1789, 1792-1798, 1798, 1799, 1799-1838, 1838, 1839, 1839, 1840, 1840, 1841, 1841-1845, 1848-1852, 1852, 1853, 1853, 1854, 1854, 1855, 1855-1869, 1872, 1873, 1878, 1878, 1879, 1879, 1880, 1880, 1881, 1881, 1882, 1882, 1883, 1883-1885, 1888-1890, 1890, 1891, 1891-1906, 1906, 1907, 1907-1935, 1938-1940, 1940, 1941, 1941, 1942, 1942, 1943, 1943-1946, 1946, 1947, 1947-1952, 1952, 1953, 1953-1966, 1966, 1967, 1967, 1968, 1968, 1969, 1969, 1970, 1970, 1971, 1971-1973, 1976-1978, 1978, 1979, 1979-1982, 1982, 1983, 1983-1987, 1990-1992, 1992, 1993, 1993-1998, 1998, 1999, 1999-2006, 2006, 2007, 2007-2016, 2016, 2017, 2017-2022, 2022, 2023, 2023-2025, 2028-2030, 2030, 2031, 2031-2036, 2036, 2037, 2037-2049, 2052-2060, 2060, 2061, 2061-2070, 2070, 2071, 2071-2076, 2076, 2077, 2077-2104, 2104, 2105, 2105, 2106, 2106, 2107, 2107-2114, 2114, 2115, 2115-2117, 2120-2128, 2128, 2129, 2129, 2130, 2130, 2131, 2131-2134, 2134, 2135, 2135, 2136, 2136, 2137, 2137, 2138, 2138, 2139, 2139, 2140, 2140, 2141, 2141-2143, 2147-2149, 2149, 2150, 2150-2156, 2161, 2162, 2171, 2172, 2181, 2181, 2182, 2182-2187, 2187, 2188, 2188-2199, 2199, 2200, and 2200-2218.
In one embodiment of the method, DCIS is differentiated from benign breast cell proliferative disorders. This is achieved by analysis of the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of APC, ARH1/NOEY2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, MLH1, PGR, SERPINB5, RARB, SOD2, TERT, TGFBR2, THRB, TIMP3, TP73, CDH13, THBS1, TMS1/ASC, ESR1, APAF1, CASP8, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, RARA, TWIST, ESR2, PLAU, SEQ ID NO: 4, SNCG, SLC19A1, GJB2, SLIT2, IGSF4, MCT1, HS3ST2, PRDM2, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, SASH1, S100A7, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, NR2E1, PCDH7, DKK3, RTTN, SNAP25, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 36, LMX1A, SENP3, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 and complements thereof.
In a further embodiment, the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of HS3ST2, SLIT2, RASSF1A, GSTP1, GJB2, IGFBP7, CDH13, ARH1/NOEY2, SCGB3A1, FHIT, SEQ ID NO: 27, LIMK-1, SEQ ID NO: 46, SEQ ID NO: 3, SEQ ID NO: 117, SEQ ID NO: 48, SEQ ID NO: 41, SEQ ID NO: 4, SEQ ID NO: 6, and SEQ ID NO: 24, and complements thereof, is analyzed.
In both cases this is preferably achieved by use of a set consisting of at least one oligonucleotide, and more preferably at least two selected from one of the groups consisting of 1475-1477, 1477, 1478, 1478, 1479, 1479, 1480, 1480-1482, 1485-1497, 1497, 1498, 1498, 1501, 1501, 1502, 1502-1508, 1513, 1514, 1517, 1518, 1521-1527, 1527, 1528, 1528-1534, 1541-1544, 1547-1550, 1555-1557, 1557, 1558, 1558-1560, 1567, 1567, 1568, 1568, 1569, 1569, 1570, 1570-1573, 1573, 1574, 1574-1578, 1581-1592, 1595-1600, 1605-1607, 1607, 1608, 1608-1618, 1623, 1623, 1624, 1624-1626, 1631-1636, 1639, 1639, 1640, 1640, 1663-1668, 1671-1676, 1689, 1690, 1695, 1695, 1696, 1696, 1697, 1697-1701, 1706, 1706, 1707, 1707-1713, 1720, 1720, 1721, 1721, 1726-1728, 1728, 1729, 1729, 1732-1734, 1734, 1735, 1735, 1740, 1741, 1746-1752, 1752, 1753, 1753, 1756-1758, 1758, 1759, 1759, 1772-1778, 1778, 1779, 1779, 1780, 1780, 1781, 1781, 1784-1790, 1790, 1791, 1791-1798, 1798, 1799, 1799-1805, 1812-1835, 1842-1845, 1848, 1849, 1852, 1852, 1853, 1853, 1854, 1854, 1855, 1855-1869, 1872, 1873, 1876-1878, 1878, 1879, 1879, 1882, 1882, 1883, 1883-1886, 1886, 1887, 1887-1890, 1890, 1891, 1891-1895, 1902-1906, 1906, 1907, 1907, 1910, 1911, 1914-1923, 1932-1935, 1938-1940, 1940, 1941, 1941, 1952, 1952, 1953, 1953, 1958-1963, 1966, 1966, 1967, 1967, 1968, 1968, 1969, 1969, 1970, 1970, 1971, 1971, 1974-1977, 1980-1982, 1982, 1983, 1983-1987, 1992, 1992, 1993, 1993, 1996-1998, 1998, 1999, 1999-2005, 2010-2013, 2028-2030, 2030, 2031, 2031-2036, 2036, 2037, 2037-2039, 2042-2050, 2050, 2051, 2051, 2060, 2060, 2061, 2061-2070, 2070, 2071, 2071-2076, 2076, 2077, 2077, 2092, 2093, 2104, 2104, 2105, 2105, 2106, 2106, 2107, 2107, 2110, 2111, 2116, 2117, 2120-2127, 2130, 2130, 2131, 2131-2134, 2134, 2135, 2135, 2136, 2136, 2137, 2137, 2138, 2138, 2139, 2139, 2155, 2156, 2161, 2162, 2183, 2184, 2187, 2187, 2188, 2188, 2199, 2199, 2200, 2200-2202, and 2219-2222.
In one embodiment of the method, breast cancer is differentiated from benign breast cell proliferative disorders. This is achieved by analysis of the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of ARH1/NOEY2, CCND2, CDKN1A, CDKN2A, SEQ ID NO: 9, DAPK1, SEQ ID NO: 2, EYA4, FHIT, GSTP1, HIC1, IGFBP7, SERPINB5, TERT, TGFBR2, THRB, TIMP3, TP73, NME1, CDH13, THBS1, TMS1/ASC, IL6, APAF1, SYK, HOXA5, FABP3, RASSF1A, SEQ ID NO: 3, TWIST, ESR2, PLAU, STAT1, SEQ ID NO: 4, LOT1, GPC3, CLDN7, GJB2, SLIT2, IGSF4, MCT1, PRDM2, ALX4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SCGB3A1, SEQ ID NO: 1, PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE), ORPHAN NUCLEAR RECEPTOR NR5A2 (ALPHA-1-FETOPROTEIN TRANSCRIPTION FACTOR) (HEPATOCYTIC TRANSCRIPTION FACTOR) (B1-BINDING FACTOR) (HB1F) (CYP7A PROMOTER BINDING FACTOR), LIM DOMAIN KINASE 1, BCL11B, SEQ ID NO: 51, MGC34831, SEQ ID NO: 54, PDLIM1, MSF, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, PRDM6, NR2E1, PCDH7, DKK3, RTTN, SNAP25, SEQ ID NO: 26, GIRK2, SEQ ID NO: 28, SEQ ID NO: 29, ARL7, SEQ ID NO: 31, THH, HOXB13, SEQ ID NO: 35, SEQ ID NO: 36, MGC10561, LMX1A, SENP3, GS1, TITF1, SEQ ID NO: 42, DDX51, SEQ ID NO: 117, SEQ ID NO: 45, SEQ ID NO: 46, O60279, and SEQ ID NO: 48 and complements thereof.
In a further embodiment, the CpG methylation status of at least one target sequence comprising, or hybridizing under stringent conditions to at least 16 contiguous nucleotides of a gene or sequence selected from the group consisting of SLIT2, HS3ST2, HOXA5, ARH1/NOEY2, IGFBP7, PLAU, CDH13, TIMP3, CCND2, GSTP1, SEQ ID NO: 117, LIMK-1, SEQ ID NO: 46, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 3, SEQ ID NO: 41, SEQ ID NO: 27, SEQ ID NO: 31, and SEQ ID NO: 4, and complements thereof, is analyzed.
In both cases this is preferably achieved by use of a set consisting of at least one oligonucleotide, and more preferably at least two selected from one of the groups consisting of 1475, 1476, 1485-1497, 1497, 1498, 1498, 1499, 1499, 1500, 1500, 1501, 1501, 1502, 1502-1504, 1509-1511, 1511, 1512, 1512-1522, 1525-1527, 1527, 1528, 1528-1540, 1551-1554, 1569, 1569, 1570, 1570-1573, 1573, 1574, 1574-1580, 1585, 1586, 1591-1600, 1603, 1604, 1609-1619, 1619, 1620, 1620-1623, 1623, 1624, 1624-1626, 1629, 1630, 1633-1639, 1639, 1640, 1640-1642, 1649, 1650, 1667, 1668, 1675-1680, 1683-1688, 1710-1717, 1724, 1724, 1725, 1725-1728, 1728, 1729, 1729-1731, 1736-1738, 1738, 1739, 1739, 1748, 1749, 1752, 1752, 1753, 1753, 1760, 1761, 1764, 1765, 1774-1778, 1778, 1779, 1779, 1780, 1780, 1781, 1781-1783, 1788, 1789, 1792-1798, 1798, 1799, 1799-1835, 1842-1852, 1852, 1853, 1853, 1854, 1854, 1855, 1855-1870, 1870, 1871, 1871-1873, 1876-1878, 1878, 1879, 1879, 1880, 1880, 1881, 1881, 1882, 1882, 1883, 1883-1886, 1886, 1887, 1887-1890, 1890, 1891, 1891-1895, 1898, 1899, 1902-1906, 1906, 1907, 1907-1925, 1932-1940, 1940, 1941, 1941, 1942, 1942, 1943, 1943, 1958-1966, 1966, 1967, 1967, 1968, 1968, 1969, 1969, 1970, 1970, 1971, 1971-1977, 1980-1982, 1982, 1983, 1983-1987, 1992, 1992, 1993, 1993, 1998, 1998, 1999, 1999-2006, 2006, 2007, 2007, 2010-2015, 2020-2022, 2022, 2023, 2023, 2028-2030, 2030, 2031, 2031-2036, 2036, 2037, 2037-2045, 2048-2050, 2050, 2051, 2051-2057, 2060, 2060, 2061, 2061-2070, 2070, 2071, 2071-2076, 2076, 2077, 2077-2079, 2082, 2083, 2086, 2087, 2092-2101, 2104, 2104, 2105, 2105, 2106, 2106, 2107, 2107-2111, 2116, 2117, 2120-2128, 2128, 2129, 2129, 2130, 2130, 2131, 2131-2134, 2134, 2135, 2135, 2136, 2136, 2137, 2137, 2138, 2138, 2139, 2139, 2140, 2140, 2141, 2141-2144, 2144, 2145, 2145, 2146, 2146-2149, 2149, 2150, 2150-2167, 2167, 2168, 2168-2175, 2175, 2176, and 2176.
The present invention further provides a method for ascertaining genetic and/or epigenetic parameters of the genomic sequences according to SEQ ID NO: 1 to SEQ ID NO: 118 within a subject by analyzing cytosine methylation and single nucleotide polymorphisms. Said method comprises contacting a nucleic acid comprising one or more of SEQ ID NO: 1 to SEQ ID NO: 118 in a biological sample obtained from said subject with at least one reagent or a series of reagents, wherein said reagent or series of reagents distinguishes between methylated and non-methylated CpG dinucleotides within the target nucleic acid.
Preferably, said method comprises the following steps. In the first step, a sample of the tissue to be analyzed is obtained. The source may be any suitable source, such as cell lines, histological slides, biopsies, paraffin-embedded tissue, body fluids, blood plasma, blood serum, whole blood, isolated blood cells, cells isolated from the blood, and all possible combinations thereof. It is preferred that said sources of DNA are body fluids selected from the group consisting of nipple aspirate fluid, lymphatic fluid, ductal lavage fluid, fine needle aspirate, blood plasma, blood serum, whole blood, isolated blood cells, and cells isolated from the blood.
The genomic DNA is then isolated from the sample. Genomic DNA may be isolated by any means standard in the art, including the use of commercially available kits. Briefly, where the DNA of interest is encapsulated by a cellular membrane, the biological sample must be disrupted and lysed by enzymatic, chemical or mechanical means. The DNA solution may then be cleared of proteins and other contaminants, e.g., by digestion with proteinase K. The genomic DNA is then recovered from the solution. This may be carried out by a variety of methods, including salting out, organic extraction or binding of the DNA to a solid phase support. The choice of method will be affected by several factors, including time, expense and the required quantity of DNA. Once the nucleic acids have been extracted, the genomic double-stranded DNA is used in the analysis.
In the second step of the method, the genomic DNA sample is treated in such a manner that cytosine bases which are unmethylated at the 5′-position are converted to uracil, thymine, or another base which is dissimilar to cytosine in terms of hybridization behavior. This will be understood as ‘pretreatment’ or ‘treatment’ herein.
This is preferably achieved by means of treatment with a bisulfite reagent. The term “bisulfite reagent” refers to a reagent comprising bisulfite, disulfite, hydrogen sulfite or combinations thereof, useful as disclosed herein to distinguish between methylated and unmethylated CpG dinucleotide sequences. Methods of said treatment are known in the art (e.g. PCT/EP2004/011715, which is incorporated by reference in its entirety). It is preferred that the bisulfite treatment is conducted in the presence of denaturing solvents such as, but not limited to, n-alkylene glycol, particularly diethylene glycol dimethyl ether (DME), or in the presence of dioxane or dioxane derivatives. In a preferred embodiment, the denaturing solvents are used in concentrations between 1% and 35% (v/v). It is also preferred that the bisulfite reaction is carried out in the presence of scavengers such as, but not limited to, chromane derivatives, e.g., 6-hydroxy-2,5,7,8-tetramethylchromane 2-carboxylic acid (see: PCT/EP2004/011715 which is incorporated by reference in its entirety). The bisulfite conversion is preferably carried out at a reaction temperature between 30° C. and 70° C., whereby the temperature is increased to over 85° C. for short periods of time during the reaction (see: PCT/EP2004/011715 which is incorporated by reference in its entirety). The bisulfite-treated DNA is preferably purified prior to further analysis. This may be conducted by any means known in the art, such as, but not limited to, ultrafiltration, preferably carried out by means of Microcon™ columns (manufactured by Millipore™). The purification is carried out according to a modified manufacturer's protocol (see: PCT/EP2004/011715 which is incorporated by reference in its entirety).
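The effect of this treatment on a sequence can be illustrated in silico. The following Python sketch is a simplified model, not part of the described method: it assumes complete conversion, assumes that the caller supplies the positions of methylated cytosines, and models only the sense strand as it reads after amplification (uracil read as thymine). The function names `cpg_positions` and `bisulfite_convert` are hypothetical.

```python
def cpg_positions(seq: str) -> set[int]:
    """Return 0-based indices of every cytosine that is followed by a guanine."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i : i + 2] == "CG"}


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Model complete bisulfite conversion of the sense strand.

    Unmethylated cytosines are deaminated to uracil, which is read as thymine
    after amplification; cytosines listed in methylated_positions
    (5-methylcytosine) are protected and remain cytosine. Other bases are unchanged.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


genomic = "ACGTCCGA"  # hypothetical genomic fragment with two CpG sites
print(bisulfite_convert(genomic, cpg_positions(genomic)))  # fully methylated CpGs: ACGTTCGA
print(bisulfite_convert(genomic, set()))                   # fully unmethylated: ATGTTTGA
```

Comparing the two outputs shows why the treatment makes methylation visible at the sequence level: the cytosine outside a CpG is converted in both cases, so only the CpG positions differ, reading CG in the methylated version and TG in the unmethylated version.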
In the third step of the method, at least one target sequence of the treated DNA is amplified, using at least one pair of primer oligonucleotides according to the present invention, and an amplification enzyme. The amplification of several DNA segments can be carried out simultaneously in one and the same reaction vessel. Typically, the amplification is carried out using a polymerase chain reaction (PCR). The set of primer oligonucleotides includes at least two oligonucleotides whose sequences are reverse complementary, identical, or hybridize under stringent or highly stringent conditions to an at least 16-base-pair long segment of a base sequence selected from the group consisting of SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto. In alternative embodiments of the method the amplification may be carried out by means of methylation specific primers and/or in the presence of blocker oligonucleotides as will be discussed herein.
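The primer requirement above, a stretch of at least 16 bases that is identical or reverse complementary to the treated sequence, can be screened with a simple exact-match scan. This sketch deliberately does not model mismatch-tolerant hybridization under stringent conditions, which depends on reaction conditions not specified here; `reverse_complement` and `primer_matches_template` are hypothetical helper names, and the example sequence is invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (U is treated like T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_matches_template(primer: str, template: str, min_len: int = 16) -> bool:
    """True if some window of min_len bases in the primer is identical to the
    template or to its reverse complement (exact matches only)."""
    primer, template = primer.upper(), template.upper()
    rc_template = reverse_complement(template)
    for start in range(len(primer) - min_len + 1):
        window = primer[start : start + min_len]
        # Identical to the template strand, or reverse complementary to it.
        if window in template or window in rc_template:
            return True
    return False  # also reached when the primer is shorter than min_len


treated = "GATTTAGGTTGTTTGGAATTAGTTTGG"  # hypothetical bisulfite-treated segment
forward = treated[:18]
reverse = reverse_complement(treated[-18:])
print(primer_matches_template(forward, treated), primer_matches_template(reverse, treated))  # True True
```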
In a first alternate embodiment of the method, the methylation status of pre-selected CpG positions within the nucleic acid sequences comprising one or more of SEQ ID NO: 1 to SEQ ID NO: 118 may be determined by use of methylation-specific primer oligonucleotides. This technique (MSP) has been described in U.S. Pat. No. 6,265,171 to Herman. The use of methylation status specific primers for the amplification of bisulfite-treated DNA allows the differentiation between methylated and unmethylated nucleic acids. MSP primer pairs contain at least one primer which hybridizes to a bisulfite-treated CpG dinucleotide. Therefore, the sequence of said primers comprises at least one CpG dinucleotide. MSP primers specific for non-methylated DNA contain a “T” in place of the C of the CpG. Preferably, therefore, the base sequence of said primers is required to comprise a sequence having a length of at least 9 nucleotides which hybridizes to a treated nucleic acid sequence according to one of SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto, wherein the base sequence of said oligomers comprises at least one CpG dinucleotide.
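Because a methylation-specific primer is derived from the treated sequence, its methylated and unmethylated variants differ only at CpG positions. The sketch below derives both variants of a forward (sense-strand) primer from a genomic region. It assumes complete conversion, treats a C at the 3′ end of the region as non-CpG because the following base is not known, and does not address reverse primers (which would be taken from the reverse complement of a downstream treated region). The function name and example region are hypothetical.

```python
def msp_forward_primers(genomic_region: str) -> tuple[str, str]:
    """Return (methylated-specific, unmethylated-specific) forward primer sequences."""
    region = genomic_region.upper()
    if "CG" not in region:
        raise ValueError("an MSP primer must cover at least one CpG dinucleotide")
    methylated, unmethylated = [], []
    for i, base in enumerate(region):
        if base != "C":
            methylated.append(base)
            unmethylated.append(base)
            continue
        is_cpg = region[i + 1 : i + 2] == "G"
        # Methylated template: the CpG cytosine survives treatment; every other C reads as T.
        methylated.append("C" if is_cpg else "T")
        # Unmethylated template: every C, including the CpG cytosine, reads as T.
        unmethylated.append("T")
    return "".join(methylated), "".join(unmethylated)


m_primer, u_primer = msp_forward_primers("TCGGACGCTTC")  # hypothetical region
print(m_primer)  # TCGGACGTTTT
print(u_primer)  # TTGGATGTTTT
```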
A further preferred embodiment of the method comprises the use of blocker oligonucleotides. The use of such blocker oligonucleotides has been described by Yu et al., BioTechniques 23:714-720, 1997. Blocking probe oligonucleotides are hybridized to the bisulfite treated nucleic acid concurrently with the PCR primers. PCR amplification of the nucleic acid is terminated at the 5′ position of the blocking probe, such that amplification of a nucleic acid is suppressed where the complementary sequence to the blocking probe is present. The probes may be designed to hybridize to the bisulfite treated nucleic acid in a methylation status specific manner. For example, for detection of methylated nucleic acids within a population of unmethylated nucleic acids, suppression of the amplification of nucleic acids which are unmethylated at the position in question would be carried out by the use of blocking probes comprising a ‘CpA’ or ‘TpA’ at the position in question, as opposed to a ‘CpG’ if the suppression of amplification of methylated nucleic acids is desired.
For PCR methods using blocker oligonucleotides, efficient disruption of polymerase-mediated amplification requires that blocker oligonucleotides not be elongated by the polymerase. Preferably, this is achieved through the use of blockers that are 3′-deoxyoligonucleotides, or oligonucleotides derivatized at the 3′ position with other than a “free” hydroxyl group. For example, 3′-O-acetyl oligonucleotides are representative of a preferred class of blocker molecule.
Additionally, polymerase-mediated decomposition of the blocker oligonucleotides should be precluded. Preferably, such preclusion comprises either use of a polymerase lacking 5′-3′ exonuclease activity, or use of modified blocker oligonucleotides having, for example, thioate bridges at the 5′-termini thereof that render the blocker molecule nuclease-resistant. Particular applications may not require such 5′ modifications of the blocker. For example, if the blocker- and primer-binding sites overlap, thereby precluding binding of the primer (e.g., with excess blocker), degradation of the blocker oligonucleotide will be substantially precluded. This is because the polymerase will not extend the primer toward, and through (in the 5′-3′ direction) the blocker—a process that normally results in degradation of the hybridized blocker oligonucleotide.
A particularly preferred blocker/PCR embodiment, for purposes of the present invention and as implemented herein, comprises the use of peptide nucleic acid (PNA) oligomers as blocking oligonucleotides. Such PNA blocker oligomers are ideally suited, because they are neither decomposed nor extended by the polymerase.
Preferably, therefore, the base sequence of said blocking oligonucleotides is required to comprise a sequence having a length of at least 9 nucleotides which hybridizes to a treated nucleic acid sequence according to one of SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto, wherein the base sequence of said oligonucleotides comprises at least one CpG, TpG or CpA dinucleotide.
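The two conditions placed on the blocker sequence, a hybridizing stretch of at least 9 nucleotides and the presence of a CpG, TpG or CpA dinucleotide within it, can be screened as follows. As with any sequence-only check, exact identity or reverse complementarity stands in for hybridization here; the helper names are hypothetical, and the check does not model the 3′ modification or PNA chemistry discussed above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
REQUIRED_DINUCLEOTIDES = ("CG", "TG", "CA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def qualifies_as_blocker(blocker: str, treated_template: str, min_len: int = 9) -> bool:
    """True if a window of min_len bases in the blocker matches the treated template
    (identically or reverse complementarily) and contains CpG, TpG or CpA."""
    blocker, template = blocker.upper(), treated_template.upper()
    rc_template = reverse_complement(template)
    for start in range(len(blocker) - min_len + 1):
        window = blocker[start : start + min_len]
        if window not in template and window not in rc_template:
            continue  # this window would not hybridize to either strand
        if any(d in window for d in REQUIRED_DINUCLEOTIDES):
            return True
    return False


template = "TTAGGTCGTTTAGTTGAAG"  # hypothetical treated sequence
print(qualifies_as_blocker("AGGTCGTTTAG", template))
```

Checking windows of exactly the minimum length is sufficient: any longer matching stretch that contains one of the dinucleotides also contains a 9-base matching window that includes it.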
The fragments obtained by means of the amplification can carry a directly or indirectly detectable label. Preferred are labels in the form of fluorescence labels, radionuclides, or detachable molecule fragments having a typical mass which can be detected in a mass spectrometer. Where said labels are mass labels, it is preferred that the labeled amplificates have a single positive or negative net charge, allowing for better detectability in the mass spectrometer. The detection may be carried out and visualized by means of, e.g., matrix assisted laser desorption/ionization mass spectrometry (MALDI) or electrospray ionization mass spectrometry (ESI).
Matrix Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF) is a very efficient development for the analysis of biomolecules (Karas and Hillenkamp, Anal Chem., 60:2299-301, 1988). An analyte is embedded in a light-absorbing matrix. The matrix is evaporated by a short laser pulse, thus transporting the analyte molecule into the vapor phase in an unfragmented manner. The analyte is ionized by collisions with matrix molecules. An applied voltage accelerates the ions into a field-free flight tube. Due to their different masses, the ions are accelerated at different rates. Smaller ions reach the detector sooner than bigger ones. MALDI-TOF spectrometry is well suited to the analysis of peptides and proteins. The analysis of nucleic acids is somewhat more difficult (Gut and Beck, Current Innovations and Future Trends, 1:147-57, 1995). The sensitivity with respect to nucleic acid analysis is approximately 100-times less than for peptides, and decreases disproportionally with increasing fragment size. Moreover, for nucleic acids having a multiply negatively charged backbone, the ionization process via the matrix is considerably less efficient. In MALDI-TOF spectrometry, the selection of the matrix plays a critical role. For desorption of peptides, several very efficient matrices have been found which produce a very fine crystallization. Several responsive matrices for DNA are now available, but they have not reduced the difference in sensitivity between peptides and nucleic acids. This difference in sensitivity can be reduced, however, by chemically modifying the DNA in such a manner that it becomes more similar to a peptide. For example, phosphorothioate nucleic acids, in which the usual phosphates of the backbone are substituted with thiophosphates, can be converted into a charge-neutral DNA using simple alkylation chemistry (Gut and Beck, Nucleic Acids Res. 23: 1367-73, 1995).
The coupling of a charge tag to this modified DNA results in an increase in MALDI-TOF sensitivity to the same level as that found for peptides. A further advantage of charge tagging is the increased stability of the analysis against impurities, which considerably hamper the detection of unmodified substrates.
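The statement that smaller ions reach the detector sooner follows from the idealized linear time-of-flight relation: an ion of mass m and charge ze accelerated through a potential U acquires kinetic energy zeU, so its drift time over a field-free length L is t = L·sqrt(m / (2zeU)). The sketch below evaluates this relation. It ignores time spent in the acceleration region, initial ion velocities and reflectron optics, and the instrument parameters in the example are hypothetical.

```python
import math

DALTON_KG = 1.66053906660e-27          # unified atomic mass unit in kg
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def flight_time_s(mass_da: float, charge: int, accel_voltage_v: float, drift_length_m: float) -> float:
    """Idealized drift time (seconds) of an ion through a field-free flight tube."""
    mass_kg = mass_da * DALTON_KG
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * accel_voltage_v  # zeU
    velocity_m_s = math.sqrt(2 * kinetic_energy_j / mass_kg)           # from zeU = mv^2/2
    return drift_length_m / velocity_m_s


# Hypothetical instrument: 20 kV acceleration, 1 m flight tube, singly charged ions.
for mass in (1_000, 6_000):
    print(mass, "Da:", round(flight_time_s(mass, 1, 20_000, 1.0) * 1e6, 1), "microseconds")
```

Under these assumptions a 1000 Da ion arrives after roughly 16 microseconds and a 6000 Da ion after roughly 39 microseconds. Because t scales with sqrt(m/z), the relation also shows that the charge state of an ion, and not only its mass, determines where it appears in the spectrum.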
In the fourth step of the method, the amplificates obtained during the third step of the method are analyzed in order to determine the methylation status of the CpG dinucleotides prior to the treatment.
In embodiments where the amplificates were obtained by means of MSP amplification, the presence or absence of an amplificate is in itself indicative of the methylation state of the CpG positions covered by the primer, according to the base sequences of said primer.
Similarly, in embodiments where the amplificates were obtained by means of amplification in the presence of blocker oligonucleotides, the presence or absence of an amplificate is in itself indicative of the methylation state of the CpG positions covered by the blocker, according to the base sequences of said blocker.
In embodiments where the amplificates were obtained by means of MSP amplification in the presence of blocker oligonucleotides, the presence or absence of an amplificate is in itself indicative of the methylation state of the CpG positions covered by the primer and blocker oligonucleotides, according to the base sequences of said primer and blocker oligonucleotides.
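These interpretation rules reduce to a simple truth table if primers and blockers are assumed to be perfectly specific. The following sketch encodes that idealized logic only; real assays have finite specificity and detection limits, and the function and parameter names are hypothetical.

```python
from typing import Optional


def amplificate_expected(template_methylated: bool,
                         primer_specific_for: Optional[bool] = None,
                         blocker_specific_for: Optional[bool] = None) -> bool:
    """Predict whether an amplificate forms under idealized, fully specific binding.

    primer_specific_for: True for methylation-specific (MSP) primers, False for
        primers specific to unmethylated DNA, None for standard primers.
    blocker_specific_for: methylation state whose amplification the blocker
        suppresses, or None when no blocker is used.
    """
    primers_bind = primer_specific_for is None or primer_specific_for == template_methylated
    blocked = blocker_specific_for is not None and blocker_specific_for == template_methylated
    return primers_bind and not blocked


# Standard primers plus a blocker against unmethylated DNA: only methylated templates amplify.
for methylated in (True, False):
    print(methylated, amplificate_expected(methylated, None, False))
```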
Amplificates obtained by means of both standard and methylation specific PCR may be further analyzed by means of hybridization-based methods such as, but not limited to, array technology and probe based technologies as well as by means of techniques such as sequencing and template directed extension.
In one embodiment of the method, the amplificates synthesized in step three are subsequently hybridized to an array or a set of oligonucleotides and/or PNA probes. In this context, the hybridization takes place in the following manner: the set of probes used during the hybridization is preferably composed of at least 2 oligonucleotides or PNA-oligomers; in the process, the amplificates serve as probes which hybridize to oligonucleotides previously bonded to a solid phase; the non-hybridized fragments are subsequently removed; said oligonucleotides contain at least one base sequence having a length of at least 9 nucleotides which is reverse complementary or identical to a segment of the base sequences specified in the present Sequence Listing; and the segment comprises at least one CpG, TpG or CpA dinucleotide.
In a preferred embodiment, said dinucleotide is present in the central third of the oligomer. For example, where the oligomer comprises one CpG dinucleotide, said dinucleotide is preferably the fifth to ninth nucleotide from the 5′-end of a 13-mer. One oligonucleotide exists for the analysis of each CpG dinucleotide within a sequence selected from the group according to SEQ ID NO: 1 to SEQ ID NO: 118, and the equivalent positions within SEQ ID NO: 493 to SEQ ID NO: 964. Said oligonucleotides may also be present in the form of peptide nucleic acids. The non-hybridized amplificates are then removed. The hybridized amplificates are then detected. In this context, it is preferred that labels attached to the amplificates are identifiable at each position of the solid phase at which an oligonucleotide sequence is located.
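The placement rule for a single CpG in a 13-mer can be checked programmatically. The following is a minimal Python sketch, assuming 1-based positions counted from the 5′ end and reading "fifth to ninth nucleotide" as requiring both bases of the single CpG to fall inside that window; the function names are illustrative only and are not part of the described method.

```python
def cpg_positions(oligo: str) -> list[int]:
    """Return the 1-based position of the C of every CpG dinucleotide in the oligo."""
    seq = oligo.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_in_central_window(oligo: str, first: int = 5, last: int = 9) -> bool:
    """True if the oligo has exactly one CpG and both of its bases lie in [first, last]."""
    positions = cpg_positions(oligo)
    if len(positions) != 1:
        return False
    c_pos = positions[0]
    return first <= c_pos and c_pos + 1 <= last


# Hypothetical 13-mer with a single CpG at positions 6-7.
print(cpg_positions("TTAGTCGTTAGTT"))          # [6]
print(cpg_in_central_window("TTAGTCGTTAGTT"))  # True
print(cpg_in_central_window("CGTTAGTTAGTTA"))  # False: CpG at positions 1-2
```

For a TG detection oligonucleotide (the converted, unmethylated variant) the same window check would be applied to the TpG instead; that variant is omitted here for brevity.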
In yet a further embodiment of the method, the genomic methylation status of the CpG positions may be ascertained by means of oligonucleotide probes that are hybridized to the bisulfite treated DNA concurrently with the PCR amplification primers (wherein said primers may either be methylation specific or standard).
A particularly preferred embodiment of this method is the use of fluorescence-based Real Time Quantitative PCR (Heid et al., Genome Res. 6:986-994, 1996; also see U.S. Pat. No. 6,331,393) employing a dual-labeled fluorescent oligonucleotide probe (TaqMan™ PCR, using an ABI Prism 7700 Sequence Detection System, Perkin Elmer Applied Biosystems, Foster City, Calif.). The TaqMan™ PCR reaction employs a non-extendible interrogating oligonucleotide, called a TaqMan™ probe, which, in preferred embodiments, is designed to hybridize to a CpG-rich sequence located between the forward and reverse amplification primers. The TaqMan™ probe further comprises a fluorescent “reporter moiety” and a “quencher moiety” covalently bound to linker moieties (e.g., phosphoramidites) attached to the nucleotides of the TaqMan™ oligonucleotide. For analysis of methylation within nucleic acids subsequent to bisulfite treatment, the probe must be methylation specific, as described in U.S. Pat. No. 6,331,393 (hereby incorporated by reference in its entirety); this approach is also known as the MethyLight™ assay. Variations on the TaqMan™ detection methodology that are also suitable for use with the described invention include the use of dual-probe technology (Lightcycler™) or fluorescent amplification primers (Sunrise™ technology). Both of these techniques may be adapted for use with bisulfite treated DNA, and moreover for methylation analysis within CpG dinucleotides.
A further suitable method for the use of probe oligonucleotides for the assessment of methylation by analysis of bisulfite treated nucleic acids is template-directed oligonucleotide extension. Accordingly, in a further preferred embodiment of the method, the fifth step of the method comprises the use of template-directed oligonucleotide extension, such as MS-SNuPE as described by Gonzalgo and Jones, Nucleic Acids Res. 25:2529-2531, 1997.
In yet a further embodiment of the method, the fifth step of the method comprises sequencing and subsequent sequence analysis of the amplificate generated in the third step of the method (Sanger F., et al., Proc Natl Acad Sci USA 74:5463-5467, 1977).
In the most preferred embodiment of the method the genomic nucleic acids are isolated and treated according to the first three steps of the method outlined above, namely:
a) obtaining, from a subject, a biological sample having subject genomic DNA;
b) extracting or otherwise isolating the genomic DNA;
c) treating the genomic DNA of b), or a fragment thereof, with one or more reagents to convert cytosine bases that are unmethylated in the 5-position thereof to uracil or to another base that is detectably dissimilar to cytosine in terms of hybridization properties; and wherein
d) amplifying subsequent to treatment in c) is carried out in a methylation specific manner, namely by use of methylation specific primers and/or blocking oligonucleotides, and further, wherein
e) detecting of the amplificates is carried out by means of a real-time detection probe, as described above.
Preferably, where the subsequent amplification of d) is carried out by means of methylation specific primers, as described above, said methylation specific primers comprise a sequence having a length of at least 9 nucleotides which hybridizes to a treated nucleic acid sequence according to one of SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto, wherein the base sequence of said oligomers comprises at least one CpG dinucleotide.
In an alternative, and most preferred, embodiment of the method, the subsequent amplification of d) is carried out in the presence of blocking oligonucleotides, as described above. Said blocking oligonucleotides comprise a sequence having a length of at least 9 nucleotides which hybridizes to a treated nucleic acid sequence according to one of SEQ ID NO: 493 to SEQ ID NO: 964 and sequences complementary thereto, wherein the base sequence of said oligomers comprises at least one CpG, TpG or CpA dinucleotide. Step e) of the method, namely the detection of the specific amplificates indicative of the methylation status of one or more CpG positions according to SEQ ID NO: 1 to SEQ ID NO: 118, is carried out by means of real-time detection methods as described above.
Additional embodiments of the invention provide a method for the analysis of the methylation status of genomic DNA according to the invention (SEQ ID NO: 1 to SEQ ID NO: 118, and complements thereof) without the need for pretreatment.
In the first step of such additional embodiments, the genomic DNA sample is isolated from tissue or cellular sources. Preferably, such sources include cell lines, histological slides, body fluids, or tissue embedded in paraffin. In the second step, the genomic DNA is extracted. Extraction may be by means that are standard to one skilled in the art, including but not limited to the use of detergent lysates, sonication and vortexing with glass beads. Once the nucleic acids have been extracted, the genomic double-stranded DNA is used in the analysis.
In a preferred embodiment, the DNA may be cleaved prior to the treatment, and this may be by any means standard in the state of the art, in particular with methylation-sensitive restriction endonucleases.
In the second step, the DNA is then digested with one or more methylation sensitive restriction enzymes. The digestion is carried out such that hydrolysis of the DNA at the restriction site is informative of the methylation status of a specific CpG dinucleotide.
In the third step, which is optional but a preferred embodiment, the restriction fragments are amplified. This is preferably carried out using a polymerase chain reaction, and said amplificates may carry suitable detectable labels as discussed above, namely fluorophore labels, radionuclides and mass labels.
In the fourth step the amplificates are detected. The detection may be by any means standard in the art, for example, but not limited to, gel electrophoresis analysis, hybridization analysis, incorporation of detectable tags within the PCR products, DNA array analysis, MALDI or ESI analysis.
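As an illustration of why cleavage at a restriction site reports methylation, the sketch below performs an in-silico digest with HpaII, an enzyme the text does not name and which is used here only as an example: HpaII recognizes CCGG, cuts between the two cytosines, and is blocked when the internal CpG cytosine is methylated. The sketch treats a single strand and ignores overhangs; the function name is hypothetical.

```python
def hpaii_fragments(seq: str, methylated: set[int]) -> list[int]:
    """Fragment lengths after an in-silico HpaII digest of a single strand.

    HpaII recognizes CCGG and cuts C^CGG. Cleavage is blocked when the
    internal CpG cytosine (0-based index i + 1 for a site starting at i)
    is in `methylated`, the set of 0-based indices of methylated cytosines.
    """
    seq = seq.upper()
    cuts = [
        i + 1  # cut between the first and second C of CCGG
        for i in range(len(seq) - 3)
        if seq[i:i + 4] == "CCGG" and (i + 1) not in methylated
    ]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


site = "AAAACCGGAAAA"
print(hpaii_fragments(site, set()))  # [5, 7]: unmethylated site is cut
print(hpaii_fragments(site, {5}))    # [12]: methylated site is protected
```

Amplification across the site after digestion then succeeds only if the site was protected, which is how a PCR readout in the third and fourth steps can report the methylation of that CpG.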
Subsequent to the determination of the methylation state of the genomic nucleic acids, the presence, absence or subclass of breast cell proliferative disorder is deduced based upon the methylation state of at least one CpG dinucleotide sequence of SEQ ID NO: 1 to SEQ ID NO: 118, or an average, or a value reflecting an average methylation state, of a plurality of CpG dinucleotide sequences of SEQ ID NO: 1 to SEQ ID NO: 118. Methylation of CpG positions within any of the genes of Table 3, with the exception of PRDM2 and S100A7, is indicative of the presence of breast cell proliferative disorders. Conversely, absence of methylation of CpG positions within the genes PRDM2 and S100A7 is indicative of the presence of breast cell proliferative disorders.
In alternative embodiments of the method (e.g. real-time analysis), it is possible to quantify the level of methylation at the analyzed CpG positions (or an average thereof, or a value reflecting an average thereof). In such embodiments it is preferred that a cut-off point is pre-determined, above which measured methylation levels are classified as “methylated” and below which they are classified as “unmethylated”. A cut-off point between 0% and 10% methylation or equivalent values is particularly preferred; a cut-off point between 3% and 7% methylation or equivalent values is also preferred, and a cut-off point between 4% and 6% methylation or equivalent values is further preferred.
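The cut-off logic can be sketched as follows, assuming per-CpG methylation is available as fractions between 0 and 1, a cut-off of 5% (within the preferred 4% to 6% range), and a simple arithmetic mean as the averaged value; values exactly at the cut-off are called unmethylated here, a choice the text leaves open. Following the preceding paragraph, PRDM2 and S100A7 are treated as markers for which absence of methylation is the disease-associated state. All function names and the example gene name are hypothetical.

```python
from statistics import mean

# Genes for which the text treats absence of methylation as disease-associated.
INVERSE_MARKERS = {"PRDM2", "S100A7"}


def call_methylation(levels: list[float], cutoff: float = 0.05) -> str:
    """Average per-CpG methylation fractions and classify against the cut-off."""
    return "methylated" if mean(levels) > cutoff else "unmethylated"


def marker_positive(gene: str, levels: list[float], cutoff: float = 0.05) -> bool:
    """True if the marker's call matches its disease-associated state."""
    expected = "unmethylated" if gene in INVERSE_MARKERS else "methylated"
    return call_methylation(levels, cutoff) == expected


print(call_methylation([0.02, 0.04, 0.12]))     # methylated (mean 0.06)
print(marker_positive("PRDM2", [0.01, 0.02]))   # True
print(marker_positive("GENE_X", [0.01, 0.02]))  # False
```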
Moreover, an additional aspect of the present invention is a kit comprising, for example: a bisulfite-containing reagent; a set of primer oligonucleotides containing at least two oligonucleotides whose sequences in each case correspond, are complementary, or hybridize under stringent or highly stringent conditions to a 16-base long segment of the sequences SEQ ID NO: 1 to SEQ ID NO: 964 (most preferably SEQ ID NO: 493 to SEQ ID NO: 964); oligonucleotides and/or PNA-oligomers; as well as instructions for carrying out and evaluating the described method. In a further preferred embodiment, said kit may further comprise standard reagents for performing a CpG position-specific methylation analysis, wherein said analysis comprises one or more of the following techniques: MS-SNuPE, MSP, MethyLight™, HeavyMethyl™, COBRA, and nucleic acid sequencing. However, a kit along the lines of the present invention can also contain only part of the aforementioned components.
In a further embodiment the present invention provides for molecular genetic markers selected from the group consisting of APC, BCL11B, CASP8, CDKN2A, DAPK1, DDX51, DKK3, ESR1, ESR2, FABP3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 36, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 51, SEQ ID NO: 117, GIRK2, GJB2, GS1, HS3ST2, MCT1, MGC34831, MLH1, NME1, ORPHAN NUCLEAR RECEPTOR NR5A2, PGR, PRDM6, RARA, RARB, S100A7, SASH1, SEQ ID NO: 42, SERPINB5, SLC19A1, SNCG, SOD2, TERT, TGFBR2 and TP73 according to Table 26 that have novel utility for the analysis of methylation patterns associated with the development of cancer, in particular lung, breast and colon cancer. Said markers may be used for detecting cancer, in particular lung, breast and colon cancer, and thereby provide improved means for the detection and early treatment of said disorders. As used herein, the term cancer shall be understood as a generic term for all classes of malignant neoplasms characterized by uncontrolled cell proliferation and capable of invading other tissues by direct growth into adjacent tissue (invasion) and/or by migration of cells to distant sites (metastasis).
The use of said genes and/or sequences may be enabled by means of any analysis of the expression of the gene, such as mRNA expression analysis or protein expression analysis. However, in the most preferred embodiment of the invention, the detection of breast cell proliferative disorders is enabled by means of analysis of the methylation status of said genes or genomic sequences and their promoter or regulatory elements. Methods for the methylation analysis of genes are described herein.
It is further preferred that the sequences of said genes in Table 26 according to SEQ ID NO: 65, 50, 71, 57, 98, 43, 24, 75, 91, 77, 4, 7, 9, 14, 16, 17, 19, 28, 29, 36, 46, 48, 51, 117, 27, 111, 40, 113, 101, 52, 89, 107, 11, 83, 20, 108, 88, 96, 102, 42, 68, 116, 73, 105, 92, 93 and 86 as described in the accompanying sequence listing are analyzed.
In a preferred embodiment the invention provides a method for detecting cancer in a subject. Said method comprises the following steps:
i) contacting genomic DNA obtained from the subject with at least one reagent, or series of reagents, that distinguishes between methylated and non-methylated CpG dinucleotides within at least one target region of the genomic DNA, wherein said target region comprises at least one CpG dinucleotide sequence, and
ii) detecting, or detecting and distinguishing between or among breast cell proliferative disorders.
It is particularly preferred that said genomic DNA is isolated from body fluids of the subject.
Genomic DNA may be isolated by any means standard in the art, including the use of commercially available kits. Briefly, where the DNA of interest is encapsulated by a cellular membrane, the biological sample must be disrupted and lysed by enzymatic, chemical or mechanical means. The DNA solution may then be cleared of proteins and other contaminants, e.g. by digestion with proteinase K. The genomic DNA is then recovered from the solution. This may be carried out by means of a variety of methods including salting out, organic extraction or binding of the DNA to a solid phase support. The choice of method will be affected by several factors including time, expense and required quantity of DNA. Body fluids are the preferred source of the DNA; particularly preferred are blood plasma, blood serum, whole blood, isolated blood cells and cells isolated from the blood.
The genomic DNA sample is then treated in such a manner that cytosine bases which are unmethylated at the 5-position are converted to uracil, thymine, or another base which is dissimilar to cytosine in terms of hybridization behavior. This will be understood as ‘treatment’ herein.
This is preferably achieved by means of treatment with a bisulfite reagent. The term “bisulfite reagent” refers to a reagent comprising bisulfite, disulfite, hydrogen sulfite or combinations thereof, useful as disclosed herein to distinguish between methylated and unmethylated CpG dinucleotide sequences. Methods of said treatment are known in the art (e.g. PCT/EP2004/011715, which is incorporated by reference in its entirety). It is preferred that the bisulfite treatment is conducted in the presence of denaturing solvents such as but not limited to n-alkylene glycol, particularly diethylene glycol dimethyl ether (DME), or in the presence of dioxane or dioxane derivatives. In a preferred embodiment the denaturing solvents are used in concentrations between 1% and 35% (v/v). It is also preferred that the bisulfite reaction is carried out in the presence of scavengers such as but not limited to chromane derivatives, e.g., 6-hydroxy-2,5,7,8-tetramethylchromane 2-carboxylic acid (see: PCT/EP2004/011715, which is incorporated by reference in its entirety). The bisulfite conversion is preferably carried out at a reaction temperature between 30° C. and 70° C., whereby the temperature is increased to over 85° C. for short periods of time during the reaction (see: PCT/EP2004/011715, which is incorporated by reference in its entirety). The bisulfite treated DNA is preferably purified prior to further analysis. This may be conducted by any means known in the art, such as but not limited to ultrafiltration, preferably carried out by means of Microcon™ columns (manufactured by Millipore™). The purification is carried out according to a modified manufacturer's protocol (see: PCT/EP2004/011715, which is incorporated by reference in its entirety).
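The effect of the treatment on sequence can be modeled in silico, which is useful when designing primers, blockers and probes against the converted sequences. A minimal sketch, assuming complete conversion and considering only the sense strand: every unmethylated cytosine is read as thymine after treatment and amplification, while methylated cytosines remain cytosine. The helper names are illustrative.

```python
from typing import AbstractSet


def cpg_cytosines(seq: str) -> set[int]:
    """0-based indices of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def bisulfite_convert(seq: str, methylated: AbstractSet[int] = frozenset()) -> str:
    """Sense-strand sequence after complete bisulfite conversion and PCR.

    Unmethylated C -> U -> read as T; cytosines whose index is in `methylated` stay C.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


dna = "ACGTCCGA"
print(bisulfite_convert(dna))                      # ATGTTTGA (fully unmethylated)
print(bisulfite_convert(dna, cpg_cytosines(dna)))  # ACGTTCGA (all CpGs methylated)
```

The two outputs correspond to the TG and CG variants that methylation specific primers, blockers and probes are designed to tell apart.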
The treated DNA is then analyzed in order to determine the methylation state of one or more target gene sequences (prior to the treatment) associated with the development of cancer. It is particularly preferred that the target region comprises, or hybridizes under stringent conditions to, at least 16 contiguous nucleotides of at least one gene or genomic sequence selected from the group consisting of the genes and genomic sequences as listed in Table 26. It is further preferred that the sequences of said genes in Table 26 according to SEQ ID NO: 65, 621, 622, 857, 858, 50, 591, 592, 827, 828, 71, 633, 634, 869, 870, 57, 605, 606, 841, 842, 98, 687, 688, 923, 924, 43, 577, 578, 813, 814, 24, 539, 540, 775, 776, 75, 641, 642, 877, 878, 91, 673, 674, 909, 910, 77, 645, 646, 881, 882, 4, 499, 500, 735, 736, 7, 505, 506, 741, 742, 9, 509, 510, 745, 746, 14, 519, 520, 755, 756, 16, 523, 524, 759, 760, 17, 525, 526, 761, 762, 19, 529, 530, 765, 766, 28, 547, 548, 783, 784, 29, 549, 550, 785, 786, 36, 563, 564, 799, 800, 46, 583, 584, 819, 820, 48, 587, 588, 823, 824, 51, 593, 594, 829, 830, 117, 725, 726, 961, 962, 27, 545, 546, 781, 782, 111, 713, 714, 949, 950, 40, 571, 572, 807, 808, 113, 717, 718, 953, 954, 101, 693, 694, 929, 930, 52, 595, 596, 831, 832, 89, 669, 670, 905, 906, 107, 705, 706, 941, 942, 11, 513, 514, 749, 750, 83, 657, 658, 893, 894, 20, 531, 532, 767, 768, 108, 707, 708, 943, 944, 88, 667, 668, 903, 904, 96, 683, 684, 919, 920, 102, 695, 696, 931, 932, 42, 575, 576, 811, 812, 68, 627, 628, 863, 864, 116, 723, 724, 959, 960, 73, 637, 638, 873, 874, 105, 701, 702, 937, 938, 92, 675, 676, 911, 912, 93, 677, 678, 913, 914, 86, 663, 664, 899 and 900 as described in the accompanying sequence listing are analyzed. The method of analysis may be selected from those known in the art, including those listed herein. Particularly preferred are MethyLight™, MSP and the use of blocking oligonucleotides as previously described herein.
It is further preferred that any oligonucleotides used in such analysis (including primers, blocking oligonucleotides and detection probes) should be reverse complementary, identical, or hybridize under stringent or highly stringent conditions to an at least 16-base-pair long segment of the base sequences of one or more of the converted sequences selected from the group consisting of SEQ ID NO: 65, 621, 622, 857, 858, 50, 591, 592, 827, 828, 71, 633, 634, 869, 870, 57, 605, 606, 841, 842, 98, 687, 688, 923, 924, 43, 577, 578, 813, 814, 24, 539, 540, 775, 776, 75, 641, 642, 877, 878, 91, 673, 674, 909, 910, 77, 645, 646, 881, 882, 4, 499, 500, 735, 736, 7, 505, 506, 741, 742, 9, 509, 510, 745, 746, 14, 519, 520, 755, 756, 16, 523, 524, 759, 760, 17, 525, 526, 761, 762, 19, 529, 530, 765, 766, 28, 547, 548, 783, 784, 29, 549, 550, 785, 786, 36, 563, 564, 799, 800, 46, 583, 584, 819, 820, 48, 587, 588, 823, 824, 51, 593, 594, 829, 830, 117, 725, 726, 961, 962, 27, 545, 546, 781, 782, 111, 713, 714, 949, 950, 40, 571, 572, 807, 808, 113, 717, 718, 953, 954, 101, 693, 694, 929, 930, 52, 595, 596, 831, 832, 89, 669, 670, 905, 906, 107, 705, 706, 941, 942, 11, 513, 514, 749, 750, 83, 657, 658, 893, 894, 20, 531, 532, 767, 768, 108, 707, 708, 943, 944, 88, 667, 668, 903, 904, 96, 683, 684, 919, 920, 102, 695, 696, 931, 932, 42, 575, 576, 811, 812, 68, 627, 628, 863, 864, 116, 723, 724, 959, 960, 73, 637, 638, 873, 874, 105, 701, 702, 937, 938, 92, 675, 676, 911, 912, 93, 677, 678, 913, 914, 86, 663, 664, 899 and 900 according to Table 26 and sequences complementary thereto.
It is particularly preferred that the CpG methylation analysis of the genes according to Table 26 is carried out in the form of a gene panel, wherein said gene panel consists of one or more markers selected from Table 26 and one or more tissue specific methylation markers. The term tissue specific methylation marker shall be taken to mean a DNA sequence comprising at least one CpG position, the methylation status thereof being a unique characteristic of a specific tissue type, thereby enabling the identification of biological matter of said tissue origin. It is further preferred that said gene panel consists of one or more markers selected from Table 26 and one or more tissue specific methylation markers selected from the group consisting of tissue specific markers for each of breast, colon and lung. In an alternative embodiment said gene panel consists of one or more markers selected from Table 26 and one or more tissue specific methylation markers selected from the group consisting of tissue specific markers for each of bladder, breast, colon, lung, rectal, pancreatic, endometrium, prostate, kidney and skin tissues.
It will be appreciated by one skilled in the art that the nucleic acids, oligonucleotides and kits as described herein for detection and/or characterization will have an additional utility for the detection of all cancers, more preferably breast, lung and colon cancer.
More specifically, a preferred embodiment of the invention comprises the use of an oligonucleotide or oligomer for detecting the cytosine methylation state within genomic or treated (chemically modified) DNA, according to Table 26. Said oligonucleotide or oligomer comprises a nucleic acid sequence having a length of at least nine (9) nucleotides which hybridizes, under moderately stringent or stringent conditions (as defined herein above), to a treated nucleic acid sequence according to Table 26 and/or sequences complementary thereto, or to a genomic sequence according to Table 26 and/or sequences complementary thereto.
Furthermore, the present invention also provides kits for the detection of cancer comprising, for example: a bisulfite-containing reagent; a set of primer oligonucleotides containing at least two oligonucleotides whose sequences in each case correspond, are complementary, or hybridize under stringent or highly stringent conditions to a 16-base long segment of the sequences according to Table 26 (most preferably the converted sequences as shown in Table 26); oligonucleotides and/or PNA-oligomers; as well as instructions for carrying out and evaluating the described method. In a further preferred embodiment, said kit may further comprise standard reagents for performing a CpG position-specific methylation analysis, wherein said analysis comprises one or more of the following techniques: MS-SNuPE, MSP, MethyLight, HeavyMethyl, COBRA, and nucleic acid sequencing. However, a kit along the lines of the present invention can also contain only part of the aforementioned components.
While the present invention has been described with specificity in accordance with certain of its preferred embodiments, the following examples serve only to illustrate the invention and are not intended to limit the invention within the principles and scope of the broadest interpretations and equivalent configurations thereof.
To evaluate marker candidates a significant number of patient and control samples was analyzed using the applicant's proprietary methylation sensitive Microarray technology. For the Microarray study two gene panels were analyzed on a collection of 475 samples.
An overview of patient samples collected for the microarray study is provided in Table 25.
Early stage breast cancer samples were obtained from Erasmus Medical Centre, Rotterdam, The Netherlands. The tumor cell proportion was estimated to be on average 60%, ranging from 30% to 90%. Infiltrating ductal carcinomas (IDC) represented the largest histological subtype. Among the IDC patient samples, estrogen receptor (ER) positive and negative, pre- and postmenopausal, as well as aggressive (node positive) and less aggressive (node negative) tumors were included. In addition, infiltrating lobular carcinomas (ILC) and ductal carcinomas in situ (DCIS) were analyzed. For all DCIS samples, the diagnosis was confirmed by pathology review and estimates of tumor cell content were provided. Finally, tumors with confirmed BRCA1 mutations were analyzed to assess whether hereditary breast cancer samples, which represent about 5-10% of all breast cancer cases, would show different DNA methylation patterns.
Normal breast samples were obtained primarily from breast reductions or from patient samples that were originally diagnosed with DCIS but were classified as normal during a pathology review. To analyze DNA methylation patterns in breast epithelial cells, from which breast cancer is almost exclusively derived, epithelial cells were sorted using epithelial cell surface markers. In addition, samples with the diagnosis fibroadenoma, fibrocystic disease and atypical ductal hyperplasia were included in the group of benign breast conditions.
To evaluate the marker performance on other cancer samples, tumor DNA from several other cancer types was analyzed. Emphasis was put on the most frequently occurring cancer classes in women (lung, colon) and tumors of the female reproductive system (endometrium, ovary).
Age matched female lymphocyte samples were used to assess the methylation level of candidate markers in blood cells.
For control purposes, additional samples were included in both Microarray studies. In order to control the quality and the functionality of detection oligos, artificially up- and downmethylated DNA (Promega) was used. Eight male lymphocyte samples were included to allow for a positive control of the overall Microarray process by comparing differential methylation between male and female lymphocyte samples.
An initial selection of 63 candidate genes or sequences was identified. In addition, one gene fragment, ELK1, which was known to differentiate DNA of male from female origin, was included as a positive control. The gene panel for the second microarray study consisted of 59 sequences initially identified using AP-PCR, MCA or DMH and confirmed by sequencing, plus the ELK1 gene as a positive control. It should be noted that not all sequences of the second panel currently map to known genes. Candidate markers from both chips are fully listed in Table 3.
Samples were received from external collaborators or commercial providers either as frozen tissues, cell nuclei pellets, or extracted genomic DNA. DNA from tissue samples and cell nuclei pellets was isolated using the QIAamp Mini Kit (Qiagen, Hilden, Germany; No: 51306).
The DNA quality of all delivered and extracted samples was first assessed by photometric measurements. Absorbance at 260 nm and 280 nm was measured, A260/280 ratios were determined and the resulting DNA concentrations were calculated.
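The concentration calculation can be sketched with the conventional factor for double-stranded DNA (an absorbance of 1.0 at 260 nm over a 1 cm path corresponds to about 50 µg/mL). The factor, the dilution handling and the function names are assumptions for illustration; the document does not state which conversion was used. An A260/A280 ratio of about 1.8 is commonly taken to indicate DNA largely free of protein.

```python
# Conventional conversion for double-stranded DNA: A260 = 1.0 (1 cm path) ~ 50 ug/mL.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity ratio; values near 1.8 are usually read as protein-free DNA."""
    return a260 / a280


# Hypothetical reading of a sample diluted 1:10 before measurement.
print(dna_concentration_ug_per_ml(0.5, dilution_factor=10))  # 250.0
print(round(a260_a280_ratio(0.5, 0.27), 2))                  # 1.85
```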
After photometric measurements, each genomic DNA sample was analyzed by gel electrophoresis to assess the integrity of the DNA. Only minor signs of degradation were observed, indicating good overall quality of the extracted DNA.
To amplify all gene fragments, PCR assays were designed to match bisulfite treated DNA and to allow amplification independent of the methylation status of the respective fragment. A standardized primer design workflow optimized by the applicant for bisulfite treated DNA was employed. Individual PCR assays were considered established when successful amplification on bisulfite treated lymphocyte DNA was reproduced in triplicate and no background amplification of genomic DNA was detectable, ensuring bisulfite DNA specific amplification. Primers are listed in Table 1.
To allow efficient amplification, individual PCR assays were combined into multiplex PCR (mPCR) assays, usually combining up to 8 primer pairs into one mPCR assay. Several multiplex PCR sets were calculated based on the primer sequences of the individual PCR amplificates and tested on lymphocyte DNA. Based on ALF express analyses, the best performing combinations of multiplex PCR sets were chosen.
Total genomic DNA of all samples and controls was bisulfite treated converting unmethylated cytosines to uracil. Methylated cytosines are conserved. Bisulfite treatment was performed according to the applicant's optimized proprietary bisulfite treatment procedure. In order to avoid a potential process bias, the samples were randomized into processing batches. The patient samples were first grouped according to the major diagnosis classes:
Batches of 50 samples were created. Each batch contained the same proportion of samples from the major diagnosis classes. Two independent bisulfite reactions were performed for each DNA sample. 10 ng of bisulfite treated DNA was used for each multiplex PCR (mPCR) reaction. In order to monitor the mPCR results, two methods were used: ALF analysis and gel electrophoresis.
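One way to build batches with the same proportion of diagnosis classes is stratified round-robin assignment: shuffle samples within each class, concatenate the classes, and deal the samples across batches in turn. This is a hedged sketch of such a scheme, not the applicant's procedure; the function name and the fixed random seed are assumptions.

```python
import math
import random
from collections import defaultdict


def stratified_batches(samples, batch_size=50, seed=0):
    """Split (sample_id, diagnosis_class) pairs into batches with a similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, diagnosis in samples:
        by_class[diagnosis].append(sample_id)

    # Shuffle within each class, then concatenate the classes in a fixed order.
    ordered = []
    for diagnosis in sorted(by_class):
        ids = by_class[diagnosis][:]
        rng.shuffle(ids)
        ordered.extend((sample_id, diagnosis) for sample_id in ids)

    # Deal samples round-robin so each class is spread evenly over the batches.
    n_batches = math.ceil(len(ordered) / batch_size)
    batches = [[] for _ in range(n_batches)]
    for i, item in enumerate(ordered):
        batches[i % n_batches].append(item)
    return batches


# Hypothetical cohort: 60 cancer and 40 normal samples -> 2 batches of 50.
cohort = [(f"s{i}", "cancer" if i < 60 else "normal") for i in range(100)]
for batch in stratified_batches(cohort):
    print(len(batch), sum(d == "cancer" for _, d in batch))  # 50 30
```

Because each class occupies a contiguous run that is dealt round-robin, the per-batch count of any class differs by at most one between batches.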
All PCR products from each individual sample were then hybridized to glass slides carrying a pair of immobilized oligonucleotides for each CpG position under analysis. For hybridizations, the samples were grouped into processing batches in order to avoid a potential process bias. The samples were processed in batches of 80 samples randomized for bisulfite batches. Each detection oligonucleotide was designed to hybridize to the bisulfite converted sequence around one CpG site which was either originally unmethylated (TG) or methylated (CG). See Table 2 for further details of all hybridization oligonucleotides used (both informative and non-informative). Hybridization conditions were selected to allow the detection of the single nucleotide differences between the TG and CG variants.
Fluorescent signals from each hybridized oligonucleotide were detected using a GenePix scanner and software. Ratios of the two signals (from the CG oligonucleotide and the TG oligonucleotide used to analyze each CpG position) were calculated by comparing the intensities of the fluorescent signals.
The samples were processed in batches of 80 samples randomized for sex, diagnosis, tissue, and bisulfite batch. For each bisulfite treated DNA sample, 2 hybridizations were performed. Because two independent bisulfite reactions were performed for each DNA sample, a total of 4 chips was processed per sample.
For the analysis of chip data, Epigenomics' proprietary software (‘EpiScape’) was used. EpiScape contains a data warehouse that supports queries to sample, genome and laboratory management databases. It encompasses a variety of statistical tools for analyzing and visualizing methylation array data. In the following sections we summarize the most important data analysis techniques that were applied for analyzing the data.
The log methylation ratio (log(CG/TG)) at each CpG position was determined according to a standardized preprocessing pipeline that includes the following steps:
This log ratio has the property that the hybridization noise has approximately constant variance over the full range of possible methylation rates (see e.g. Huber W, Von Heydebreck A, Sultmann H, Poustka A, Vingron M. 2002. Variance stabilization applied to Microarray data calibration and to the quantification of differential expression. Bioinformatics. 18 Suppl 1: S96-S104.)
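A minimal sketch of the per-position log ratio, assuming the CG and TG intensities have already passed the (unspecified) preprocessing steps and using the natural logarithm, since the base is not stated in the text. The inverse transform recovers CG/(CG+TG), which can be read as an apparent methylation fraction only if both oligonucleotides hybridize with similar efficiency. Function names are illustrative.

```python
import math


def log_methylation_ratio(cg_signal: float, tg_signal: float) -> float:
    """Natural log of CG/TG for one CpG position (signals must be positive)."""
    if cg_signal <= 0 or tg_signal <= 0:
        raise ValueError("intensities must be positive after preprocessing")
    return math.log(cg_signal / tg_signal)


def apparent_methylation(log_ratio: float) -> float:
    """Invert log(CG/TG) to CG / (CG + TG)."""
    ratio = math.exp(log_ratio)
    return ratio / (1.0 + ratio)


lr = log_methylation_ratio(3000.0, 1000.0)
print(round(lr, 3), round(apparent_methylation(lr), 3))  # 1.099 0.75
```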
Principal component analysis (PCA) projects measurement vectors (e.g. chip data, methylation profiles on several CpG sites etc.) onto a new coordinate system. The new coordinate axes are referred to as principal components. The first principal component spans the direction of largest variance of the data. Subsequent components are ordered by decreasing variance and are orthogonal to each other. Different CpG positions contribute with different weights to the extension of the data cloud along different components. PCA is an unsupervised technique, i.e. it does not take into account any group or label information of the data points (for further details see e.g. Ripley, B. D. 1996. Pattern Recognition and Neural Networks, Cambridge, UK, Cambridge University Press).
PCA is typically used to project high dimensional data (in our case methylation-array data) onto lower dimensional subspaces in order to visualize or extract features with high variance from the data. In the present report we used 2 dimensional projections for statistical quality control of the data. We investigated the effect of different process parameters on the chip data in order to rule out that changing process parameters caused large alterations in the measurement values.
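A plain (non-robust) PCA projection onto two components can be sketched with NumPy via the singular value decomposition of the mean-centered data matrix, with chips as rows and CpG positions as columns. This illustrates standard PCA only, not the robust variant used for outlier detection or EpiScape's implementation; the function name and example data are hypothetical.

```python
import numpy as np


def pca_project(data: np.ndarray, n_components: int = 2):
    """Project rows (chips) of `data` onto the leading principal components.

    Columns are CpG positions. Returns the component scores and the variance
    captured by each returned component.
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal principal axes sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    variances = singular_values[:n_components] ** 2 / (data.shape[0] - 1)
    return scores, variances


# Hypothetical data: 20 chips x 5 CpG log ratios.
rng = np.random.default_rng(0)
chips = rng.normal(size=(20, 5))
scores, variances = pca_project(chips)
print(scores.shape, variances[0] >= variances[1])  # (20, 2) True
```

Plotting the two score columns against each other, colored by a process parameter such as bisulfite batch, gives the kind of 2 dimensional quality-control view described above.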
A robust version of PCA was used to detect single outlier chips and exclude them from further analysis (Model F, Koenig T, Piepenbrock C, Adorjan P. 2002. Statistical process control for large scale Microarray experiments. Bioinformatics. 18 Suppl 1:S155-163).
To control the general stability of the chip production process we use methods from the field of multivariate statistical process control (MVSPC). Our major tool is the T² control chart, which is used to detect significant deviations of the chip process from normal working conditions (Model F, Koenig T, Piepenbrock C, Adorjan P. 2002. Statistical process control for large scale Microarray experiments. Bioinformatics. 18 Suppl 1:S155-163).
Use of T² charts to monitor the chip production process allows us to efficiently detect and eliminate most systematic error sources.
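A minimal sketch of a Hotelling T² check, assuming that each chip batch is summarised by two numeric features (a hypothetical choice; the features monitored in the actual process are not stated here). With two features and a large in-control reference set, T² is approximately chi-squared with 2 degrees of freedom, so the control limit has the closed form -2 ln(alpha); the exact finite-sample limit, which is based on the F distribution, is not shown. All names and data are hypothetical.

```python
import numpy as np

def hotelling_t2(reference, new):
    """Hotelling T^2 of each row of `new` relative to in-control reference data."""
    reference = np.asarray(reference, dtype=float)
    new = np.atleast_2d(np.asarray(new, dtype=float))
    mu = reference.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    d = new - mu
    # T^2_i = (x_i - mu)^T S^-1 (x_i - mu), evaluated for every row i
    return np.einsum("ij,jk,ik->i", d, s_inv, d)

def chi2_limit_2d(alpha=0.01):
    """Approximate upper control limit for p = 2 monitored features (assumption).

    For a large reference set T^2 is roughly chi-squared with 2 degrees of
    freedom, whose upper-alpha quantile is exactly -2 ln(alpha).
    """
    return -2.0 * np.log(alpha)

# Hypothetical per-batch summaries (two quality features per chip batch)
rng = np.random.default_rng(1)
reference = rng.normal(size=(200, 2))              # normal working conditions
new_batches = np.array([[0.1, -0.2], [6.0, 6.0]])  # second batch has drifted
t2 = hotelling_t2(reference, new_batches)
out_of_control = t2 > chi2_limit_2d(0.01)
```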
Wilcoxon rank sum tests are used to compare groups (e.g. male vs. female lymphocytes) in terms of measurement values of single CpG sites. A significant test result (p<0.05) indicates a shift between the distributions of the respective methylation log-ratios, i.e. log(CG/TG).
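The rank-sum comparison of a single CpG site can be sketched as follows, assuming hypothetical CG and TG signal intensities for two groups. The sketch uses the large-sample normal approximation without tie or continuity correction; for groups as small as in this toy example an exact test would be more appropriate, so the p-value is illustrative only. Function names are hypothetical.

```python
import math
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test, normal approximation (no tie correction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    w = average_ranks(np.concatenate([a, b]))[:n1].sum()   # rank sum of group a
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))                # two-sided p-value
    return z, p

# Hypothetical CG and TG intensities of one CpG site in two groups
cg_a = np.array([900, 1100, 1000, 1200, 950]); tg_a = np.array([300, 280, 310, 250, 320])
cg_b = np.array([400, 350, 500, 450, 380]); tg_b = np.array([600, 700, 550, 650, 620])
z, p = rank_sum_test(np.log(cg_a / tg_a), np.log(cg_b / tg_b))   # log(CG/TG) per sample
```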
For the comparison of up- vs. down-methylated chips hybridized with Promega DNA, Fisher scores are used to rank single CpG sites according to their discriminatory power. For each methylation ratio y=CG/TG, the Fisher score is calculated as
where
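The Fisher score formula itself is not reproduced in this text. A commonly used form, assumed here rather than taken from the source, is F = (mean_up - mean_down)^2 / (var_up + var_down), evaluated per CpG site; the sketch additionally assumes that the score is computed on log(CG/TG). The ratios and the function name `fisher_scores` are hypothetical.

```python
import numpy as np

def fisher_scores(up, down):
    """Fisher score for every CpG site (column) given two groups of chips (rows).

    ASSUMED form (the document's own formula is not reproduced here):
        F = (mean_up - mean_down)^2 / (var_up + var_down),
    computed on the log of the methylation ratio y = CG/TG.
    """
    up = np.log(np.asarray(up, dtype=float))
    down = np.log(np.asarray(down, dtype=float))
    between = (up.mean(axis=0) - down.mean(axis=0)) ** 2    # squared mean difference
    within = up.var(axis=0, ddof=1) + down.var(axis=0, ddof=1)
    return between / within

# Hypothetical CG/TG ratios: 3 chips per group, 2 CpG sites (columns)
up = [[4.0, 1.0], [5.0, 1.2], [4.5, 0.9]]
down = [[0.5, 1.1], [0.6, 0.95], [0.4, 1.05]]
scores = fisher_scores(up, down)
ranking = np.argsort(-scores)   # most discriminative CpG site first
```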
As referred to herein, a marker (sometimes also simply referred to as gene or amplicon) is a genomic region of interest (also referred to herein using the abbreviation ROI). The ROI usually comprises several CpG positions. For testing the null hypothesis that a marker has no predictive power, we use the likelihood ratio test for logistic regression models (see Venables, W. N. and Ripley, B. D. Modern Applied Statistics with S-PLUS, 3rd edition. New York: Springer, 2002). The logistic regression model for a single marker is a linear combination of methylation measurements from all CpG positions in the respective ROI. The fitted logistic regression model is compared to a constant probability model that is independent of methylation and represents the null hypothesis. The p-value of the marker is computed via the likelihood ratio test.
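A minimal numpy-only sketch of the marker test: a logistic regression on all CpG columns of one ROI is fitted by Newton-Raphson, its maximised log-likelihood is compared with that of the constant-probability model, and twice the difference is referred to a chi-squared distribution with one degree of freedom per CpG column. The analysis described here followed Venables and Ripley; this Python version, its function names and the simulated data are hypothetical, and it does not handle perfectly separable classes.

```python
import math
import numpy as np

def logistic_loglik(X, y, n_iter=50):
    """Maximised log-likelihood of a logistic regression (intercept + columns of X),
    fitted by Newton-Raphson; assumes the classes are not perfectly separable."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X1 @ beta)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def chi2_sf(x, k):
    """Upper-tail probability of a chi-squared(k) variable for integer k (closed form)."""
    h = x / 2.0
    if k % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(k // 2))
    series = sum(h ** (i + 0.5) / math.gamma(i + 1.5) for i in range((k - 1) // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

def marker_p_value(X, y):
    """Likelihood ratio test of the ROI model against a constant-probability model."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    ll_full = logistic_loglik(X, y)
    p0 = y.mean()                                  # null: class probability ignores methylation
    ll_null = float(np.sum(y * np.log(p0) + (1.0 - y) * np.log(1.0 - p0)))
    statistic = 2.0 * (ll_full - ll_null)          # ~ chi^2 with one df per CpG column
    return chi2_sf(max(statistic, 0.0), X.shape[1])

# Hypothetical ROI with 3 CpG positions; only the first carries class information
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))).astype(float)
p_marker = marker_p_value(X, y)
```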
A significant p-value for a marker means that the methylation of this ROI has some systematic correlation to the question of interest as given by the sample classes. In general, a significant p-value does not necessarily imply a good classification performance. However, because with logistic regression we use a linear predictor as the basis of our test statistic, small p-values will be indicative of a good clinical performance.
Performing a large number of tests at the 5% level will lead to a large number of false positive test results. If there are no differences between groups, the probability of rejecting at least one hypothesis of equality is nearly 1 when about 200 tests are performed (for 200 independent tests it is 1 - 0.95^200, i.e. about 0.99996). Correction for multiplicity is therefore necessary to reliably conclude that a test result is really significant. A conservative but simple method is the Bonferroni correction, which multiplies all p-values by the number of tests performed; corrected values greater than 1 are censored to 1.0.
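The Bonferroni rule and the "nearly 1" figure can be checked with a few lines of Python. The family-wise error calculation assumes independent tests; the function names and p-values are hypothetical.

```python
import numpy as np

def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, censor at 1.0."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * len(p), 1.0)

def familywise_error(m, alpha=0.05):
    """P(at least one false rejection) for m independent tests of true null hypotheses."""
    return 1.0 - (1.0 - alpha) ** m

fwer_200 = familywise_error(200)                   # about 0.99996
corrected = bonferroni([0.0001, 0.01, 0.3])        # 0.0003, 0.03 and 0.9
```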
Bonferroni corrections are used for all analyses. The correction helps to avoid spurious findings; however, it is a very conservative method, and false negative results (“missed markers”) are a frequent consequence. Therefore, results corrected by the less conservative False Discovery Rate (FDR) methods are also given.
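The text does not say which FDR method was used. As one common choice, the Benjamini-Hochberg step-up procedure is sketched below (an assumption, not necessarily the method applied here), together with Bonferroni-corrected values for the same hypothetical p-values to show the difference in conservativeness.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values) controlling the FDR."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)            # p_(i) * m / i
    # enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)                 # back to input order
    return out

p_values = np.array([0.01, 0.02, 0.03, 0.04, 0.5])       # hypothetical
q_values = benjamini_hochberg(p_values)                  # 0.05, 0.05, 0.05, 0.05, 0.5
bonferroni_values = np.minimum(p_values * len(p_values), 1.0)   # 0.05, 0.1, 0.15, 0.2, 1.0
```

At a threshold of 0.05 (inclusive), all four small p-values survive the FDR adjustment, while only the first survives Bonferroni.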
In order to give a reliable estimate of how well the CpG ensemble of a selected marker can differentiate between different tissue classes, we can determine its prediction accuracy by classification. For that purpose we calculate a methylation profile-based prediction function using a certain set of tissue samples with a specific class label. This step is called training, and it exploits the prior knowledge represented by the data labels. The prediction accuracy of that function is then tested on a set of independent samples. As a method of choice, we use the support vector machine (SVM) algorithm (see e.g. Cristianini, N. and Shawe-Taylor, J. An introduction to support vector machines. Cambridge, UK: Cambridge University Press, 2000; Duda, R. O., Hart, P. E., and Stork, D. G. Pattern Classification. New York: John Wiley & Sons, 2001) to learn the prediction function. For this report, sensitivity and specificity are weighted equally. This is achieved by setting the risk associated with false positive and false negative classifications to be inversely proportional to the respective class sizes. Therefore, sensitivity and specificity of the resulting classifier can be expected to be approximately equal. Note that this weighting can be adapted according to the clinical requirements.
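The class-size weighting can be illustrated with a small linear SVM. This is a hypothetical sketch, not the implementation used in the analysis: it trains a soft-margin linear SVM by Pegasos-style stochastic subgradient descent and weights each sample's hinge loss by n / (2 n_class), which is inversely proportional to its class size. Libraries offer the same idea directly, for example scikit-learn's `SVC` with `class_weight="balanced"`. The data and all function names are hypothetical, and performance is measured on the training data only.

```python
import numpy as np

def train_balanced_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM fitted by stochastic subgradient descent (Pegasos-style).

    Labels y must be -1 or +1. Each sample's hinge loss is weighted by
    n / (2 * n_class), so both classes carry the same total weight.
    For brevity the bias is appended as a constant feature (and so is regularised).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    n_pos, n_neg = np.sum(y > 0), np.sum(y < 0)
    weight = np.where(y > 0, n / (2.0 * n_pos), n / (2.0 * n_neg))
    Xb = np.column_stack([X, np.ones(n)])
    w = np.zeros(Xb.shape[1])
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1.0   # margin test with the current w
            w *= 1.0 - eta * lam                  # shrinkage from the L2 penalty
            if violates:
                w += eta * weight[i] * y[i] * Xb[i]
    return w

def predict(w, X):
    X = np.asarray(X, dtype=float)
    return np.where(np.column_stack([X, np.ones(len(X))]) @ w >= 0.0, 1, -1)

def sensitivity_specificity(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_pred[y_true == 1] == 1), np.mean(y_pred[y_true == -1] == -1)

# Hypothetical, unbalanced data: 80 tumour and 20 control samples, 2 CpG features
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(1.5, 1.0, size=(80, 2)), rng.normal(-1.5, 1.0, size=(20, 2))])
y = np.array([1] * 80 + [-1] * 20)
w = train_balanced_linear_svm(X, y)
sens, spec = sensitivity_specificity(y, predict(w, X))
```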
With limited sample sizes, the cross-validation method provides an effective and reliable estimate of the prediction accuracy of a discriminator function; therefore, in addition to the significance of the markers, we provide cross-validation estimates of accuracy, sensitivity and specificity. For each classification task, the samples were partitioned into 5 groups of approximately equal size. The learning algorithm was then trained on 4 of these 5 sample groups, and the resulting predictor was tested on the remaining group of independent test samples. The numbers of correct positive and negative classifications were counted over 10 runs of the learning algorithm for all possible choices of the independent test group, without using any knowledge obtained from the previous runs. This procedure was repeated on 10 random permutations of the sample set, giving a better estimate of the prediction performance than simply splitting the samples into one training set and one independent test set.
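The repeated 5-fold scheme can be sketched as below. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM (a hypothetical substitute); any pair of `fit`/`predict` functions can be passed instead. Confusion counts are accumulated over all folds of all permutations before accuracy, sensitivity and specificity are computed. Data and names are hypothetical.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Hypothetical stand-in for the SVM: one centroid per class label (-1, +1)
    return {c: X[y == c].mean(axis=0) for c in (-1, 1)}

def nearest_centroid_predict(model, X):
    d_pos = np.linalg.norm(X - model[1], axis=1)
    d_neg = np.linalg.norm(X - model[-1], axis=1)
    return np.where(d_pos <= d_neg, 1, -1)

def repeated_cv(X, y, fit, predict, n_folds=5, n_repeats=10, seed=0):
    """Confusion counts over n_repeats random partitions into n_folds groups."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    tp = tn = fp = fn = 0
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = fit(X[train], y[train])        # never sees the held-out group
            pred, truth = predict(model, X[test]), y[test]
            tp += np.sum((pred == 1) & (truth == 1))
            tn += np.sum((pred == -1) & (truth == -1))
            fp += np.sum((pred == 1) & (truth == -1))
            fn += np.sum((pred == -1) & (truth == 1))
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Hypothetical data: 60 positive and 40 negative samples, 3 CpG features
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(2.0, 1.0, size=(60, 3)), rng.normal(-2.0, 1.0, size=(40, 3))])
y = np.array([1] * 60 + [-1] * 40)
result = repeated_cv(X, y, nearest_centroid_fit, nearest_centroid_predict)
```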
The first step in the analysis of the array data was to identify discriminatory markers when comparing breast cancer samples with benign breast conditions. Since the preferred aim of the test is to detect early lesions, differentiating between DCIS and benign breast conditions was the primary focus. To meet the requirements of a blood-based screening test, the performance of the marker candidate genes in differentiating breast cancer from lymphocytes and from cancers of other origins, respectively, was analyzed.
Finally, all breast cancer samples were compared against all other control samples and it was determined that a large number of markers met the specified statistical criteria.
For every comparison, two data analysis methods were used. Multivariate logistic regression analyses were performed, and the resulting p-values were corrected for multiple testing according to the Bonferroni method. Separate models were fitted for each amplificate, comprising methylation ratios measured by 2-6 detection oligo pairs. A marker was preferred if a p-value below 0.05 was observed after Bonferroni correction for multiple testing. In addition, linear support vector machines (SVMs) were trained, and their accuracy, sensitivity and specificity in distinguishing the respective classes were estimated by cross-validation.
In addition, co-methylation of CpG sites was assessed by analyzing individual detection oligos by means of univariate analyses. Co-methylation was assumed if at least two detection oligos of the same gene fragment differentiated the respective classes with statistical significance (p<0.05 after Bonferroni correction for multiple testing).
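The co-methylation criterion reduces to a simple count. In this sketch, `n_tests` (the number of univariate tests over which the Bonferroni correction is taken) is a hypothetical parameter, since the text does not state it, and the p-values are invented for illustration.

```python
import numpy as np

def is_comethylated(oligo_p_values, n_tests, alpha=0.05, min_oligos=2):
    """True if at least `min_oligos` detection oligos of one gene fragment remain
    significant after Bonferroni correction over `n_tests` (hypothetical) tests."""
    corrected = np.minimum(np.asarray(oligo_p_values, dtype=float) * n_tests, 1.0)
    return int(np.sum(corrected < alpha)) >= min_oligos

# Hypothetical univariate p-values of the detection oligos of two fragments
fragment_a = is_comethylated([1e-6, 2e-5, 0.3], n_tests=1000)   # two oligos pass
fragment_b = is_comethylated([1e-6, 1e-3], n_tests=1000)        # only one passes
```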
Breast Cancer vs. Benign Breast Conditions
In this comparison the positive class was 263 breast cancer samples consisting of 197 IDC, 24 ILC, 22 DCIS and 20 BRCA1 samples. The negative class consisted of 28 normal breast samples and 46 breast samples with a benign breast condition, including fibroadenoma, fibrocystic disease and atypical ductal hyperplasia. Based on multivariate logistic regression analysis, 49 markers differentiate between breast cancer and benign breast conditions with statistical significance. Of these, 28 markers fulfilled the co-methylation criterion with at least two detection oligos based on univariate analysis.
In this comparison the positive class was 256 breast cancer samples consisting of 191 IDC, 22 ILC, 21 DCIS and 22 BRCA1 samples. The negative class consisted of 26 normal breast samples and 42 breast samples with a benign breast condition, including fibroadenoma, fibrocystic disease and atypical ductal hyperplasia. Based on multivariate logistic regression, 48 markers differentiate between breast cancer and benign breast conditions with statistical significance. Of these, 32 markers fulfilled the co-methylation criterion with at least two detection oligos based on univariate analysis.
Results are also presented in graphical form in
DCIS vs. Benign Breast Conditions
In this comparison the positive class was 263 breast cancer samples consisting of 197 IDC, 24 ILC, 22 DCIS and 20 BRCA1 samples. The negative class consisted of 28 normal breast samples and 46 breast samples with a benign breast condition, including fibroadenoma, fibrocystic disease and atypical ductal hyperplasia. Based on multivariate logistic regression analysis, 31 markers differentiate between breast cancer and benign breast conditions with statistical significance. Of these, 13 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
In this comparison the positive class was 256 breast cancer samples consisting of 191 IDC, 22 ILC, 21 DCIS and 22 BRCA1 samples. The negative class consisted of 26 normal breast samples and 42 breast samples with a benign breast condition, including fibroadenoma, fibrocystic disease and atypical ductal hyperplasia. Based on multivariate logistic regression, 35 markers differentiate between breast cancer and benign breast conditions with statistical significance. Of these, 17 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
Results are also presented in graphical form in
Breast Cancer vs. Lymphocytes
In this comparison the positive class was 263 breast cancer samples consisting of 197 IDC, 24 ILC, 22 DCIS and 20 BRCA1 samples. The negative class consisted of 28 lymphocyte samples. Based on multivariate logistic regression analysis, 49 markers differentiate between breast cancer and lymphocytes with statistical significance. Of these, 30 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
In this comparison the positive class was 256 breast cancer samples consisting of 191 IDC, 22 ILC, 21 DCIS and 22 BRCA1 samples. The negative class consisted of 34 lymphocyte samples. Based on multivariate logistic regression, 47 markers differentiate between breast cancer and lymphocytes with statistical significance. Of these, 37 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
Results are also presented in graphical form in
Breast Cancer vs. Other Cancers
In this comparison the positive class was 263 breast cancer samples consisting of 197 IDC, 24 ILC, 22 DCIS and 20 BRCA1 samples. The negative class consisted of 71 other cancer samples, comprising 12 colon samples, 19 lung samples, 16 ovary samples and 28 further samples of smaller tissue groups. Based on multivariate logistic regression analysis, 28 markers differentiate between breast cancer and other cancers with statistical significance. Of these, 16 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
In this comparison the positive class was 256 breast cancer samples consisting of 191 IDC, 22 ILC, 21 DCIS and 22 BRCA1 samples. The negative class consisted of 73 other cancer samples, comprising 18 colon samples, 18 lung samples, 16 ovary samples and 21 further samples of smaller tissue groups. Based on multivariate logistic regression, 25 markers differentiate between breast cancer and other cancers with statistical significance. Of these, 15 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
Results are also presented in graphical form in
Breast Cancer vs. All Other Controls
In this comparison the positive class was 263 breast cancer samples consisting of 197 IDC, 24 ILC, 22 DCIS and 20 BRCA1 samples. The negative class consisted of 74 breast samples (normal and benign disease), 71 other cancer samples and 28 lymphocyte samples. Based on multivariate logistic regression analysis, 50 markers differentiate between breast cancer and all other control samples with statistical significance. Of these, 29 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
In this comparison the positive class was 256 breast cancer samples consisting of 191 IDC, 22 ILC, 21 DCIS and 22 BRCA1 samples. The negative class consisted of 68 breast samples (normal and benign disease), 73 other cancer samples and 34 lymphocyte samples. Based on multivariate logistic regression, 48 markers differentiate between breast cancer and all other control samples with statistical significance. Of these, 32 markers fulfilled the co-methylation criterion with at least two detection oligos statistically significant based on univariate analysis.
Results are also presented in graphical form in
Markers from both microarrays that were significant in differentiating breast cancer from benign breast conditions, but not in differentiating breast cancer from other cancers, are listed in Table 27; these markers were determined to have utility as general cancer markers.
In the following example, a variety of real-time assays were developed for the methylation analysis of:
The assays were designed to be run on the LightCycler platform (Roche Diagnostics), but other such instruments commonly used in the art are also suitable. The assays were MSP and HeavyMethyl assays for the analysis of bisulfite-treated DNA.
The MSP assay comprises one pair of methylation-specific primers suitable for the amplification of a bisulfite-converted target sequence, each primer comprising at least one CpG position. Accordingly, only target DNA that was methylated at the relevant CpG positions (prior to bisulfite treatment) is amplified. The amplificate is then detected by means of TaqMan-style fluorescently labelled detection probes.
In the HeavyMethyl assay, the target DNA is amplified by means of a pair of primers that are specific for amplification of a bisulfite-converted target sequence, wherein said primers do not hybridise to positions that comprised CpG dinucleotides prior to bisulfite treatment. Amplification is carried out in the presence of a blocker oligonucleotide that comprises at least one ApC dinucleotide and that hybridises to bisulfite-converted non-methylated CpG positions situated between the two primers. The blocker oligonucleotides are suitably modified to suppress amplification of target sequences by the primers. Accordingly, only target DNA that was methylated at the relevant CpG positions (prior to bisulfite treatment) is amplified. Amplificates are detected by means of LightCycler-style fluorescently labelled dual detection probes.
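Why methylation-specific primers and blockers discriminate between methylated and unmethylated templates can be illustrated with an in-silico bisulfite conversion. The sketch is a simplification under stated assumptions: only CpG methylation is modelled, a single strand is shown as read after PCR (converted C read as T), and primer or blocker binding is reduced to exact matching of a hypothetical site on the converted strand. The target sequence and sites are hypothetical and are not from Table 22.

```python
def bisulfite_convert(seq, methylated_cpg_positions=()):
    """In-silico bisulfite conversion of one strand, as read after PCR.

    Every C becomes T except the C of a CpG dinucleotide whose 0-based
    position is listed as methylated (only CpG methylation is modelled).
    """
    seq = seq.upper()
    methylated = set(methylated_cpg_positions)
    converted = []
    for i, base in enumerate(seq):
        if base != "C":
            converted.append(base)
            continue
        in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
        converted.append("C" if in_cpg and i in methylated else "T")
    return "".join(converted)

target = "ACGTTCGACC"                            # hypothetical target, CpGs at 1 and 5
methylated = bisulfite_convert(target, {1, 5})   # "ACGTTCGATT"
unmethylated = bisulfite_convert(target)         # "ATGTTTGATT"

msp_primer_site = "CGTTCG"   # hypothetical MSP primer site: keeps both CpGs as CG
blocker_site = "TGTTTG"      # hypothetical HeavyMethyl blocker site: converted, unmethylated CpGs
print(msp_primer_site in methylated, msp_primer_site in unmethylated)   # True False
print(blocker_site in methylated, blocker_site in unmethylated)         # False True
```

The MSP primer site exists only on the converted methylated template, whereas the blocker site exists only on the converted unmethylated template, so the blocker suppresses amplification of unmethylated DNA.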
Primer, probe and, where relevant, blocker oligonucleotides according to Table 22 were used. The following reagents and reaction temperatures were used:
Reagents:
The assays were tested in two different settings. In a first setting, the following breast cancer markers, as confirmed in the above microarray experiment, were analysed in both blood and breast cancer samples in order to determine their utility as diagnostic markers suitable for use in a blood-based screening test.
Ten whole blood samples and 14 breast cancer samples were analysed. In a second setting, the following breast cancer markers, as confirmed in the above microarray experiment, were analysed in order to determine whether they were general cancer markers (i.e. would need to be combined with more cancer-type-specific markers in a blood-based screening test) or whether they were specifically methylated in breast cancer only (as opposed to other cancers):
All markers were analysed in 24 breast cancer samples and an “other cancers” sample group consisting of 12 lung cancer and 12 colon cancer samples, with the exception of 17378, for which data are presented only with respect to an “other cancers” group consisting of the 12 lung cancer samples.
The amount of methylated DNA measured by each assay was quantified by comparing the amplification curve of a test sample with a standard curve. The standard curve was generated from a dilution series used as a reference.
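A minimal sketch of standard-curve quantification, assuming that each amplification curve is summarised by the cycle at which it crosses a detection threshold (threshold cycle or crossing point) and that this cycle is linear in log10 of the input amount; the text does not specify the curve model. The dilution series, cycle values, units and function names are hypothetical.

```python
import numpy as np

def fit_standard_curve(quantities, cycles):
    """Least-squares line  cycle = slope * log10(quantity) + intercept."""
    slope, intercept = np.polyfit(np.log10(quantities), cycles, 1)
    return slope, intercept

def quantify(cycle, slope, intercept):
    """Invert the standard curve: estimated input amount for a sample's cycle value."""
    return 10.0 ** ((np.asarray(cycle, dtype=float) - intercept) / slope)

# Hypothetical 10-fold dilution series of methylated reference DNA
standards = np.array([10000.0, 1000.0, 100.0, 10.0])    # units hypothetical (e.g. pg)
cycles = np.array([20.0, 23.32, 26.64, 29.96])           # ideal slope of about -3.32
slope, intercept = fit_standard_curve(standards, cycles)
efficiency = 10.0 ** (-1.0 / slope) - 1.0                # about 1.0, i.e. ~100 % per cycle
sample_amount = quantify(25.0, slope, intercept)         # test sample between two standards
```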
On the right-hand side of each figure is an ROC plot of sensitivity against specificity. The ROC curve is a plot of the true positive rate against the false positive rate for the different possible cutpoints of a diagnostic test. It shows the trade-off between sensitivity and specificity depending on the selected cutpoint (any increase in sensitivity will be accompanied by a decrease in specificity). The area under an ROC curve (AUC) is a measure of the accuracy of a diagnostic test (the larger the area the better; the optimum is 1, and a random test would have an ROC curve lying on the diagonal with an area of 0.5; for reference: J. P. Egan. Signal Detection Theory and ROC Analysis, Academic Press, New York, 1975).
Each detected other cancer sample was considered a false positive and each undetected breast cancer was considered a false negative.
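The ROC curve and AUC can be computed directly from the measured values by sweeping the cutpoint, counting a breast cancer sample at or above the cutpoint as a true positive and an other-cancer sample at or above it as a false positive. This is a sketch with hypothetical values and function names; the trapezoidal AUC it returns equals the fraction of (breast cancer, other cancer) pairs that are ordered correctly, with ties counted as one half.

```python
import numpy as np

def roc_curve(scores_pos, scores_neg):
    """ROC points for every cutpoint; a sample is called positive if score >= cut."""
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    cuts = np.unique(np.concatenate([scores_pos, scores_neg]))[::-1]  # high to low
    tpr = np.array([0.0] + [np.mean(scores_pos >= c) for c in cuts])  # sensitivity
    fpr = np.array([0.0] + [np.mean(scores_neg >= c) for c in cuts])  # 1 - specificity
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical measured amounts of methylated DNA
breast_cancer = [0.9, 0.4]     # positives: should be detected
other_cancers = [0.6, 0.2]     # a detected sample here counts as a false positive
fpr, tpr = roc_curve(breast_cancer, other_cancers)
area = auc(fpr, tpr)           # 0.75: 3 of the 4 (positive, negative) pairs are ordered correctly
```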
Refer to Tables 23 and 24 to determine which figure corresponds to which assay.
Table 23 shows the AUC and particularly preferred sensitivities and specificities of each assayed gene according to the blood vs. breast cancer comparison.
Table 24 shows the AUC and particularly preferred sensitivities and specificities of each assayed gene according to the other cancer vs. breast cancer comparison.
Number | Date | Country | Kind |
---|---|---|---|
04016926.0 | Jul 2004 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP05/07830 | 7/18/2005 | WO | 00 | 1/19/2007 |