The present invention relates generally to the area of molecular subtyping of cancer to distinguish a subtype that is unlikely to metastasize.
The 5-year survival rate for patients with oral squamous cell carcinoma (SCC), at 40%, is among the worst of all sites in the body and has not improved over the past 40 years. In the United States, more people die from oral cancer than melanoma, cervical cancer, or ovarian cancer. For patients with oral SCC, neck (cervical) metastasis is the primary determinant for prognosis, and once the neck lymph nodes are involved, the survival rate is reduced by one-half. Treatment for oral cancer is primarily surgical. Patients are assessed prior to surgery for lymph node metastasis by palpation of the lymph nodes in the neck and by imaging (CT, MRI, PET scan). If the neck is clinically positive, the treatment decision is straightforward, and the cervical lymph nodes and associated structures are removed during surgical resection of the tumor. Management of patients with clinically negative (N0) necks is less clear, given the unpredictable propensity of oral SCC for occult neck metastasis and the associated grave prognosis. Occult metastatic rates for oral SCC are high and range from 20-45% for T1 tongue SCCs. Treatment options include a “wait and see” approach and elective neck dissection. On the one hand, salvage rates of patients developing neck metastasis following the initial surgery are poor, while on the other hand, elective neck dissection may subject the patient to unnecessary major surgery with its associated risks and morbidity. Currently, tumor thickness is considered the best predictor of metastasis; however, it is difficult to assess this parameter from the incisional biopsy prior to surgery. Thus, the current standard of care is the American Joint Committee on Cancer (AJCC) TNM staging protocol, which is based on the surface diameter of the tumor.
There are currently no reliable molecular biomarkers for discriminating patients with and without oral SCC metastases prior to surgery.
In some embodiments, the invention provides a first method of determining the presence of oral squamous cell carcinoma that is unlikely to metastasize by analyzing a biological sample, e.g., an oral sample, from a subject. In various embodiments, the method entails determining relative copy numbers in sample DNA for the following chromosomal regions: 3q, 8p, 8q, and 20, wherein no gain of chromosomal regions 3q, 8q, and 20, and no loss of chromosomal region 8p is indicative of oral squamous cell carcinoma that is unlikely to metastasize. In various embodiments, the method entails determining relative copy numbers in sample DNA for the following chromosomal regions: 3q, 8p, 8q, and 20, wherein a gain of one or more (e.g., two or three) of chromosomal regions 3q, 8q, and 20, and/or a loss of chromosomal region 8p is indicative of oral squamous cell carcinoma having a substantial likelihood of metastasis. In certain embodiments, the method entails determining relative copy numbers in sample DNA for the following chromosomal regions: 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter, wherein no gain of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and no loss of chromosomal region 8pter-p23.1 is indicative of oral squamous cell carcinoma that is unlikely to metastasize. In certain embodiments, the method entails determining relative copy numbers in sample DNA for the following chromosomal regions: 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter, wherein a gain of one or more (e.g., two or three) of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and/or a loss of chromosomal region 8pter-p23.1 is indicative of oral squamous cell carcinoma having a substantial likelihood of metastasis.
In illustrative embodiments, chromosomal region 3q24-qter extends from SEQ ID NO:1 to the q terminus of chromosome 3, chromosomal region 8pter-p23.1 extends from the p terminus of chromosome 8 to SEQ ID NO:7, and chromosomal region 8q12-q24.2 extends from SEQ ID NO:11 to SEQ ID NO:4.
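By way of non-limiting illustration, the four-region decision rule of the first method can be sketched as follows. This is a minimal sketch, assuming that a gain/loss/normal call has already been made for each region by any of the assays described herein; the function name, the call strings, and the returned labels are hypothetical conventions chosen for the example and are not part of the described method.

```python
from typing import Mapping

# Regions whose GAIN counts toward the higher-risk result, and the
# region whose LOSS counts toward it, as stated in the text.
GAIN_REGIONS = ("3q", "8q", "20")
LOSS_REGIONS = ("8p",)


def classify_oral_scc(calls: Mapping[str, str]) -> str:
    """Apply the four-region rule to per-region copy-number calls.

    `calls` maps each region name to "gain", "loss", or "normal".
    These strings and the returned labels are illustrative assumptions.
    """
    required = GAIN_REGIONS + LOSS_REGIONS
    missing = [region for region in required if region not in calls]
    if missing:
        raise ValueError(f"missing copy-number call for: {missing}")

    gained = [r for r in GAIN_REGIONS if calls[r] == "gain"]
    lost = [r for r in LOSS_REGIONS if calls[r] == "loss"]

    # A gain of one or more of 3q, 8q, 20 and/or a loss of 8p indicates
    # a substantial likelihood of metastasis; otherwise unlikely.
    if gained or lost:
        return "substantial likelihood of metastasis"
    return "unlikely to metastasize"
```

The sketch reads the rule literally: only gains of 3q, 8q, and 20 and a loss of 8p are considered, so a call such as a gain of 8p does not change the result.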
In illustrative embodiments, the first method is carried out by contacting sample DNA with a combination of probes for chromosomal regions 3q, 8p, 8q, and 20, incubating the probes with the sample under conditions in which each probe binds selectively with a nucleic acid sequence in its target chromosomal region to form a stable hybridization complex, and detecting hybridization of the probes to determine copy number for each chromosomal region. For example, the method can be carried out by hybridization of sample nucleic acids to said combination of probes, which are immobilized on a substrate. In some embodiments, the method is carried out by array comparative genomic hybridization (aCGH). The combination of probes can, in some embodiments, include a plurality of probes for each chromosomal region. In certain embodiments, the combination of probes includes a plurality of probes for each of one or more control chromosomal regions. In another embodiment, the method is carried out by in situ hybridization, and each probe in the probe combination is labeled with a different label. In some embodiments, the probe combination includes at least 4, but not more than about 10^12 probes, for example, not more than about 10^11 probes, 10^10 probes, 10^9 probes, 10^8 probes, 10^7 probes, 10^6 probes, or 10^5 probes. In some embodiments, the probe combination includes at least 4, but not more than 10,000 probes. In some embodiments, the probe combination includes at least 4, but not more than 1000 probes. In various embodiments, the probe combination includes at least 4, but not more than 100 probes. In particular embodiments, the probe combination includes at least 4, but not more than 10 probes.
In certain embodiments, the first method entails amplification of target nucleic acids in chromosomal regions 3q, 8p, 8q, and 20, for example, by polymerase chain reaction (PCR) or multiplex ligation-dependent probe amplification (MLPA). In some embodiments, the method includes producing a plurality of amplicons from a plurality of target nucleic acids in each chromosomal region. In certain embodiments, the method includes producing a plurality of amplicons from a plurality of target nucleic acids in each of one or more control chromosomal regions.
In particular embodiments, the first method entails high-throughput DNA sequencing. The method can, in some embodiments, include sequencing a plurality of target nucleic acids in each chromosomal region. In certain embodiments, the method includes sequencing a plurality of target nucleic acids in each of one or more control chromosomal regions.
In some embodiments, the invention provides a second method of determining the presence of oral squamous cell carcinoma that is unlikely to metastasize in an oral sample from a subject. The method entails determining fraction of genome gained, wherein if the fraction of genome gained is below 0.065, the oral squamous cell carcinoma is unlikely to metastasize. In some embodiments, the invention provides a second method of determining the presence of oral squamous cell carcinoma that has a substantial likelihood of metastasis in an oral sample from a subject. The method entails determining fraction of genome gained, wherein if the fraction of genome gained is greater than 0.065, the oral squamous cell carcinoma has a substantial likelihood of metastasis. In embodiments where it is determined that the oral SCC has a substantial likelihood of metastasis, the method can further comprise evaluating a lymph node sample, e.g., from a cervical lymph node. In particular embodiments, the method entails determining relative copy numbers for a plurality of target nucleic acids.
In some embodiments, the invention provides a third method of determining the presence of oral squamous cell carcinoma that is unlikely to metastasize in an oral sample from a subject. The method entails determining fraction of genome altered, wherein if the fraction of genome altered is below 0.095, the oral squamous cell carcinoma is unlikely to metastasize. In some embodiments, the invention provides a third method of determining the presence of oral squamous cell carcinoma that has a substantial likelihood of metastasis in an oral sample from a subject. The method entails determining fraction of genome altered, wherein if the fraction of genome altered is greater than 0.095, the oral squamous cell carcinoma has a substantial likelihood of metastasis. In embodiments where it is determined that the oral SCC has a substantial likelihood of metastasis, the method can further comprise evaluating a lymph node sample, e.g., from a cervical lymph node. In particular embodiments, the method entails determining relative copy numbers for a plurality of target nucleic acids.
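By way of non-limiting illustration, the threshold comparisons of the second and third methods can be sketched as follows. The cutoffs 0.065 and 0.095 are taken from the text above. Everything else is an assumption made for the example: this passage does not define how the fractions are computed, so the sketch takes them as the length-weighted fraction of segments called gained (FGG) or gained or lost (FGA); a fraction of probes or clones could be substituted. The segment layout and function names are hypothetical, and because the text addresses only values below or greater than each cutoff, a value exactly equal to the cutoff is reported here as indeterminate.

```python
from typing import Iterable, Tuple

FGG_THRESHOLD = 0.065  # cutoff for fraction of genome gained (from the text)
FGA_THRESHOLD = 0.095  # cutoff for fraction of genome altered (from the text)

# Assumed layout: (segment length in base pairs, call), where call is
# "gain", "loss", or "normal".
Segment = Tuple[int, str]


def genome_fractions(segments: Iterable[Segment]) -> Tuple[float, float]:
    """Return (FGG, FGA) as length-weighted fractions of the assayed genome."""
    total = gained = altered = 0
    for length, call in segments:
        total += length
        if call == "gain":
            gained += length
            altered += length
        elif call == "loss":
            altered += length
    if total == 0:
        raise ValueError("no segments supplied")
    return gained / total, altered / total


def interpret(value: float, threshold: float) -> str:
    """Compare a fraction against its cutoff as described in the text."""
    if value < threshold:
        return "unlikely to metastasize"
    if value > threshold:
        return "substantial likelihood of metastasis"
    return "indeterminate"  # equality is not addressed by the text
```

For example, `fgg, fga = genome_fractions(segments)` followed by `interpret(fgg, FGG_THRESHOLD)` and `interpret(fga, FGA_THRESHOLD)` would apply the second and third methods, respectively, to the same set of segment calls.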
The second and third methods can, in certain embodiments, be carried out by hybridization of sample nucleic acids to a combination of probes, which are immobilized on a substrate, e.g., as in array comparative genomic hybridization (aCGH). In particular embodiments, the combination of probes can include a plurality of probes for each of one or more control chromosomal regions.
In certain embodiments, the second and third methods entail amplification of target nucleic acids, for example, by polymerase chain reaction (PCR) or multiplex ligation-dependent probe amplification (MLPA). In some embodiments, the methods include producing a plurality of amplicons from a plurality of target nucleic acids in each of one or more control chromosomal regions.
In particular embodiments, the second and third methods entail high-throughput DNA sequencing. The methods can, in some embodiments, include sequencing a plurality of target nucleic acids in each of one or more control chromosomal regions.
In any of the above-described embodiments, relative copy numbers can be determined by analyzing genomic DNA. In other embodiments, relative copy numbers can be determined by analyzing RNA, cDNA, or DNA amplified from RNA.
Any of the above-described methods can, in certain embodiments, additionally entail querying the copy number(s) of one or more control chromosomal regions.
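By way of non-limiting illustration, one way in which control chromosomal regions might be used to express a relative copy number is sketched below. This is only a sketch under stated assumptions: the text does not prescribe a normalization, so the example computes a log2 ratio of the median target signal to the median control signal, and the gain and loss cutoffs of +0.3 and -0.3 are hypothetical placeholders rather than values taken from this disclosure. The function and parameter names are likewise illustrative.

```python
import math
from statistics import median
from typing import Sequence


def relative_log2_ratio(target_signals: Sequence[float],
                        control_signals: Sequence[float]) -> float:
    """log2 of (median target signal / median control signal)."""
    if not target_signals or not control_signals:
        raise ValueError("both target and control signals are required")
    target = median(target_signals)
    control = median(control_signals)
    if target <= 0 or control <= 0:
        raise ValueError("signals must be positive")
    return math.log2(target / control)


def call_from_ratio(log2_ratio: float,
                    gain_cutoff: float = 0.3,
                    loss_cutoff: float = -0.3) -> str:
    """Turn a log2 ratio into a call; the cutoffs are hypothetical."""
    if log2_ratio >= gain_cutoff:
        return "gain"
    if log2_ratio <= loss_cutoff:
        return "loss"
    return "normal"
```

Using the median rather than the mean is a design choice that limits the influence of a single outlying probe or amplicon within a region; any suitable summary statistic and any assay-appropriate cutoffs could be used instead.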
In various embodiments, where there is an indication that the oral squamous cell carcinoma has a substantial likelihood of metastasis, the method can further comprise determining the presence of one or more genetic alterations selected from the group consisting of: fraction of genome gained (FGG), fraction of genome altered (FGA), altered methylation status, TP53 mutation(s), and the presence of relative copy number alterations at one or more loci other than 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter, wherein the presence of one or more of said genetic alterations indicates an increased likelihood that metastasis will occur or has occurred. In various embodiments, where there is an indication that the oral squamous cell carcinoma has a substantial likelihood of metastasis, the method can further comprise determining one or more clinical parameters selected from the group consisting of tumor size, tumor thickness, tumor stage, and the presence of metastasis (e.g., as assessed by radiographic imaging and palpation of the neck).
In any of the above-described embodiments, the biological sample can include an oral sample, a sample of the primary tumor, and a sample at the margin of the tumor. In some embodiments, the biological sample is an oral sample. In any of the above-described embodiments, the oral sample can include saliva, an oral washing sample, an oral swab or brush sample, or an oral tissue sample from a site selected from the group consisting of: tongue, gingiva, floor of mouth, retromolar trigone, buccal mucosa, and lip.
If the results of any of these methods indicate the presence of oral squamous cell carcinoma that is unlikely to metastasize, the method can, in some embodiments, additionally include treating the subject for oral squamous cell carcinoma without removing the cervical lymph nodes. In various embodiments, when the results of the method indicate the presence of oral squamous cell carcinoma having a substantial likelihood of metastasis, the method additionally comprises determining relative copy numbers in sample DNA from one or more cervical lymph nodes for one or more (e.g., two, three or four) of the following chromosomal regions: 3q, 8p, 8q, and 20. In various embodiments, when the results of the method indicate the presence of oral squamous cell carcinoma having a substantial likelihood of metastasis, the method additionally comprises removing one or more cervical lymph nodes from the subject.
In a further aspect, the invention provides a method of assessing the risk that, if an oral epithelial dysplasia progresses, the oral epithelial dysplasia will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis, the method comprising determining relative DNA copy numbers in a biological sample from a subject for the following chromosomal regions: 3q, 8p, 8q, and 20, wherein no gain of chromosomal regions 3q, 8q, and 20, and no loss of chromosomal region 8p is indicative of oral epithelial dysplasia that, if it progresses, is unlikely to progress to metastatic oral squamous cell carcinoma, and wherein a gain of one or more (e.g., two or three) of chromosomal regions 3q, 8q, and 20, and/or a loss of chromosomal region 8p is indicative of oral epithelial dysplasia that, if it progresses, will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis.
In various embodiments, the method comprises determining relative copy numbers in sample DNA for the following chromosomal regions: 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter, wherein no gain of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and no loss of chromosomal region 8pter-p23.1 is indicative of oral epithelial dysplasia that, if it progresses, is unlikely to progress to metastatic oral squamous cell carcinoma, and wherein a gain of one or more (e.g., two or three) chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and/or a loss of chromosomal region 8pter-p23.1 is indicative of oral epithelial dysplasia that, if it progresses, will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis.
In some embodiments, the method comprises additionally monitoring the oral dysplasia for evidence of progression to oral squamous cell carcinoma.
In some embodiments, if the biological sample has a gain or loss of one or more (e.g., two, three or four) of said chromosomal regions, the method further comprises determining the presence of one or more genetic alterations selected from the group consisting of: fraction of genome gained (FGG), fraction of genome altered (FGA), methylation status, TP53 mutation(s), and the presence of relative copy number alterations at one or more loci other than 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter, wherein the presence of one or more of said genetic alterations indicates an increased risk that the oral epithelial dysplasia is progressing, has progressed, or has a substantial likelihood of progressing.
In some embodiments, if the biological sample has a gain or loss of one or more (e.g., two, three or four) of said chromosomal regions, the method further comprises determining the presence of relative copy number alterations at one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or all) loci selected from the group consisting of 3pter-p14.1, 4p15.3-p15.2, 4q33-4-q35, 5pter-p13.2, 5q12-q23, 7p11.2-p12.1, 8p23.3-p21.2, 8p12, 8q11.1-qter, 9pter-p21.1, 11q13-q13.4, 18q22-qter, 20pter-p13, 20p12.2 and 21q21.3, wherein the presence of one or more of said copy number alterations indicates an increased risk that the oral epithelial dysplasia is progressing, has progressed, or has a substantial likelihood of progressing.
In some embodiments, if the biological sample has a gain or loss of one or more (e.g., two, three or four) of said chromosomal regions, the method further comprises treating the oral dysplasia more aggressively than if the results of the method indicated that the oral dysplasia was unlikely to progress to metastatic oral squamous cell carcinoma.
In some embodiments of the method for assessing oral epithelial dysplasia, if the biological sample has a gain or loss of one or more (e.g., two, three or four) of said chromosomal regions, the method further comprises determining one or more clinical parameters selected from the group consisting of dysplasia grade, presence of erythroplakia, toluidine blue staining, presence of ulcer (i.e., ulcerated lesion), and pain.
In some embodiments of the method for assessing oral epithelial dysplasia, chromosomal region:
3q24-qter extends from SEQ ID NO:1 to the q terminus of chromosome 3;
8pter-p23.1 extends from the p terminus of chromosome 8 to SEQ ID NO:7; and
8q12-q24.2 extends from SEQ ID NO:11 to SEQ ID NO:4.
In various embodiments of the method for assessing oral epithelial dysplasia, the relative copy numbers are determined by analyzing genomic DNA. In various embodiments of the method for assessing oral epithelial dysplasia, the relative copy numbers are determined by analyzing RNA, cDNA, or DNA amplified from RNA. In various embodiments, the method for assessing oral epithelial dysplasia additionally comprises querying the copy number(s) of one or more control chromosomal regions.
In various embodiments of the method for assessing oral epithelial dysplasia, the method comprises:
contacting sample DNA with a combination of probes for chromosomal regions 3q, 8p, 8q, and 20;
incubating the probes with the sample under conditions in which each probe binds selectively with a nucleic acid sequence in its target chromosomal region to form a stable hybridization complex; and
detecting hybridization of the probes to determine copy number for each chromosomal region.
In some embodiments of the method for assessing oral epithelial dysplasia, the method is carried out by hybridization of sample nucleic acids to said combination of probes, which are immobilized on a substrate. In various embodiments, this method is carried out by array comparative genomic hybridization (aCGH). In various embodiments of the method for assessing oral epithelial dysplasia, the combination of probes comprises a plurality of probes for each chromosomal region. In various embodiments of this method, the combination of probes comprises a plurality of probes for each of one or more control chromosomal regions. In some embodiments, the probe combination includes at least 4, but not more than about 10^12 probes, for example, not more than about 10^11 probes, 10^10 probes, 10^9 probes, 10^8 probes, 10^7 probes, 10^6 probes, or 10^5 probes. In various embodiments of this method, the probe combination comprises at least 4, but not more than 10,000 probes. In various embodiments of this method, the probe combination comprises at least 4, but not more than 1000 probes. In various embodiments of this method, the probe combination comprises at least 4, but not more than 100 probes. In various embodiments of this method, the probe combination comprises at least 4, but not more than 10 probes.
In some embodiments of the method for assessing oral epithelial dysplasia, the method is carried out by in situ hybridization, and each probe in the probe combination is labeled with a different label. In various embodiments of this method, the probe combination comprises at least 4, but not more than 1000 probes. In various embodiments of this method, the probe combination comprises at least 4, but not more than 100 probes. In various embodiments of this method, the probe combination comprises at least 4, but not more than 10 probes.
In some embodiments of the method for assessing oral epithelial dysplasia, the method comprises amplification of target nucleic acids in chromosomal regions 3q, 8p, 8q, and 20. In various embodiments, this method comprises polymerase chain reaction (PCR) or multiplex ligation-dependent probe amplification (MLPA). In various embodiments, this method comprises producing a plurality of amplicons from a plurality of target nucleic acids in each chromosomal region. In various embodiments, this method comprises producing a plurality of amplicons from a plurality of target nucleic acids in each of one or more control chromosomal regions.
In some embodiments of the method for assessing oral epithelial dysplasia, the method comprises high-throughput DNA sequencing. In various embodiments, this method comprises sequencing a plurality of target nucleic acids in each chromosomal region. In various embodiments, this method comprises sequencing a plurality of target nucleic acids in each of one or more control chromosomal regions.
In any of the above-described embodiments of the method for assessing oral epithelial dysplasia, the biological sample can include an oral sample, a sample of the primary dysplasia, and a sample at the margin of the dysplasia. In some embodiments of this method, the biological sample is an oral sample. In some embodiments of this method, the oral sample comprises saliva, an oral washing sample, an oral swab or brush sample, or an oral tissue sample from a site selected from the group consisting of: tongue, gingiva, floor of mouth, retromolar trigone, buccal mucosa, and lip.
In some embodiments, when the results of the method indicate the oral dysplasia is likely to progress to metastatic oral squamous cell carcinoma, the method additionally comprises treating the oral dysplasia more aggressively than if the results of the method indicated that the oral dysplasia was unlikely to progress to metastatic oral squamous cell carcinoma.
In a related aspect, the invention provides a method of determining the presence of metastatic oral squamous cell carcinoma in a lymph node sample from a subject, the method comprising determining relative copy numbers in sample DNA for the following chromosomal regions: 3q, 8p, 8q, and 20, wherein a gain of one or more (e.g., two or three) of chromosomal regions 3q, 8q, and 20, and/or a loss of chromosomal region 8p is indicative of metastatic oral squamous cell carcinoma.
In some embodiments, the method of determining the presence of metastatic oral squamous cell carcinoma comprises determining relative copy numbers in sample DNA for the following chromosomal regions: 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter, wherein a gain of one or more (e.g., two or three) of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and/or a loss of chromosomal region 8pter-p23.1 is indicative of metastatic oral squamous cell carcinoma.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method further comprises determining the presence of one or more genetic alterations selected from the group consisting of: fraction of genome gained (FGG), fraction of genome altered (FGA), methylation status, TP53 mutation(s), and the presence of relative copy number alterations at one or more loci other than 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the chromosomal region:
3q24-qter extends from SEQ ID NO:1 to the q terminus of chromosome 3;
8pter-p23.1 extends from the p terminus of chromosome 8 to SEQ ID NO:7; and
8q12-q24.2 extends from SEQ ID NO:11 to SEQ ID NO:4.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the relative copy numbers are determined by analyzing genomic DNA. In some embodiments, relative copy numbers are determined by analyzing RNA, cDNA, or DNA amplified from RNA.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method additionally comprises querying the copy number(s) of one or more control chromosomal regions.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method comprises:
contacting sample DNA with a combination of probes for chromosomal regions 3q, 8p, 8q, and 20;
incubating the probes with the sample under conditions in which each probe binds selectively with a nucleic acid sequence in its target chromosomal region to form a stable hybridization complex; and
detecting hybridization of the probes to determine copy number for each chromosomal region.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method is carried out by hybridization of sample nucleic acids to said combination of probes, which are immobilized on a substrate. In various embodiments, this method is carried out by array comparative genomic hybridization (aCGH). In various embodiments of this method, the combination of probes comprises a plurality of probes for each chromosomal region. In some embodiments of this method, the combination of probes comprises a plurality of probes for each of one or more control chromosomal regions. In some embodiments, the probe combination includes at least 4, but not more than about 10^12 probes, for example, not more than about 10^11 probes, 10^10 probes, 10^9 probes, 10^8 probes, 10^7 probes, 10^6 probes, or 10^5 probes. In some embodiments of this method, the probe combination comprises at least 4, but not more than 10,000 probes. In some embodiments of this method, the probe combination comprises at least 4, but not more than 1000 probes. In some embodiments of this method, the probe combination comprises at least 4, but not more than 100 probes. In some embodiments of this method, the probe combination comprises at least 4, but not more than 10 probes.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method is carried out by in situ hybridization, and each probe in the probe combination is labeled with a different label. In some embodiments of this method, the probe combination comprises at least 4, but not more than 1000 probes. In some embodiments of this method, the probe combination comprises at least 4, but not more than 100 probes. In some embodiments of this method, the probe combination comprises at least 4, but not more than 10 probes.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method comprises amplification of target nucleic acids in chromosomal regions 3q, 8p, 8q, and 20. In some embodiments, this method comprises polymerase chain reaction (PCR) or multiplex ligation-dependent probe amplification (MLPA). In some embodiments, this method comprises producing a plurality of amplicons from a plurality of target nucleic acids in each chromosomal region. In some embodiments, this method comprises producing a plurality of amplicons from a plurality of target nucleic acids in each of one or more control chromosomal regions.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma, the method comprises high-throughput DNA sequencing. In some embodiments, this method comprises sequencing a plurality of target nucleic acids in each chromosomal region. In some embodiments, this method comprises sequencing a plurality of target nucleic acids in each of one or more control chromosomal regions.
In another aspect, the invention provides a method of determining the presence of metastatic oral squamous cell carcinoma in a lymph node sample from a subject, the method comprising determining fraction of genome gained (FGG) and/or the fraction of genome altered (FGA) in the sample.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma based on FGG or FGA, the method entails determining relative copy numbers for a plurality of target nucleic acids. In some embodiments of this method, the relative copy numbers are determined by analyzing genomic DNA. In some embodiments of this method, the relative copy numbers are determined by analyzing RNA, cDNA, or DNA amplified from RNA.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma based on FGG or FGA, the method additionally comprises querying the copy number(s) of one or more control chromosomal regions.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma based on FGG or FGA, the method is carried out by hybridization of sample nucleic acids to a combination of probes, which are immobilized on a substrate. In some embodiments, this method is carried out by array comparative genomic hybridization (aCGH). In some embodiments of this method, the combination of probes comprises a plurality of probes for each of one or more control chromosomal regions.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma based on FGG or FGA, the method comprises amplification of target nucleic acids. In some embodiments, this method comprises polymerase chain reaction (PCR) or multiplex ligation-dependent probe amplification (MLPA). In some embodiments, this method comprises producing a plurality of amplicons from a plurality of target nucleic acids in each of one or more control chromosomal regions.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma based on FGG or FGA, the method comprises high-throughput DNA sequencing. In some embodiments, this method comprises sequencing a plurality of target nucleic acids in each of one or more control chromosomal regions.
In some embodiments of the method of determining the presence of metastatic oral squamous cell carcinoma based on FGG or FGA, when the results of the method indicate the presence of metastatic oral squamous cell carcinoma, e.g., in a fine needle aspirate of a lymph node or a sentinel lymph node biopsy, the method additionally comprises removing one or more cervical lymph nodes from the subject. In cases of evaluating FGG and/or FGA in a lymph node, if the fraction of genome gained is above zero (0) and/or if the fraction of genome altered is above zero (0), metastatic oral squamous cell carcinoma is present in the sample.
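By way of non-limiting illustration, the lymph node rule stated above (a fraction of genome gained above zero and/or a fraction of genome altered above zero indicates metastatic oral squamous cell carcinoma in the sample) can be sketched as follows. The function name and the choice to accept either or both values are assumptions made for the example. In practice a measured fraction would be subject to assay noise, which this sketch does not model.

```python
from typing import Optional


def lymph_node_has_metastatic_scc(fgg: Optional[float] = None,
                                  fga: Optional[float] = None) -> bool:
    """Return True if FGG and/or FGA in the lymph node sample is above zero.

    Either value may be omitted, but at least one must be supplied.
    """
    values = [v for v in (fgg, fga) if v is not None]
    if not values:
        raise ValueError("supply FGG, FGA, or both")
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return any(v > 0.0 for v in values)
```

Because any gained portion of the genome is also an altered portion, an FGG above zero implies an FGA above zero when both are computed from the same calls; the sketch nonetheless checks each supplied value independently, mirroring the "and/or" language of the text.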
Another aspect of the invention is a combination of probes or primers, wherein the probes or primers hybridize or anneal, respectively, to chromosomal regions 3q, 8p, 8q, and 20. The combination of probes or primers is capable of distinguishing samples including oral squamous cell carcinoma that is unlikely to metastasize, e.g., from samples that include oral squamous cell carcinoma that is likely to metastasize and/or that have a substantial likelihood of metastasis. In certain embodiments, the probes or primers hybridize or anneal, respectively, to chromosomal regions 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter. In illustrative embodiments, chromosomal region 3q24-qter extends from SEQ ID NO:1 to the q terminus of chromosome 3, chromosomal region 8pter-p23.1 extends from the p terminus of chromosome 8 to SEQ ID NO:7, and chromosomal region 8q12-q24.2 extends from SEQ ID NO:11 to SEQ ID NO:4. In some embodiments, the combination includes one or more probes or primers that hybridize or anneal, respectively, to one or more control chromosomal regions. In certain embodiments, the combination of probes includes a plurality of probes for each chromosomal region. In variations of such embodiments, the combination of probes can include a plurality of probes for each of one or more control chromosomal regions. In some embodiments, the probe combination includes at least 4, but not more than about 10^12 probes, for example, not more than about 10^11 probes, 10^10 probes, 10^9 probes, 10^8 probes, 10^7 probes, 10^6 probes, or 10^5 probes. In illustrative embodiments, the combination includes at least 4, but not more than 10,000 probes or primers. In illustrative embodiments, the combination includes at least 4, but not more than 1000 probes or primers. In illustrative embodiments, the combination includes at least 4, but not more than 100 probes or primers. In some embodiments, the combination includes at least 4, but not more than 10 probes or primers.
The combination of probes or primers can be provided in a kit for distinguishing, identifying and/or diagnosing oral squamous cell carcinoma that is unlikely to metastasize. In some embodiments, the invention provides kits for distinguishing oral squamous cell carcinoma that is unlikely to metastasize from oral squamous cell carcinoma having a substantial likelihood of metastasis, comprising a combination of probes or primers that hybridize or anneal, respectively, to the chromosomal regions 3q, 8p, 8q, and 20. In various embodiments, the probes are immobilized on a substrate, or the probes or primers are labeled with different labels. In some embodiments, the kit further comprises one or more control probes or primers.
The present invention provides a molecular biomarker for the identification of tumors unlikely to metastasize. Tumor cells from an incisional biopsy or other source such as saliva or brushing of the tumor can be evaluated for the presence/absence of the molecular biomarker prior to surgical resection of the tumor, allowing the surgeon to determine whether the tumor is of the subtype that is unlikely to metastasize. This information can then be used in planning the surgical treatment, e.g., whether an elective neck dissection would be advised for a patient with a clinically N0 neck, i.e., where there is no evidence of regional lymph node involvement.
Oral epithelial dysplasia precedes and unpredictably transforms to oral squamous cell carcinoma (SCC). The present invention is based, in part, on the discovery that DNA copy number aberrations in chromosomal regions +3q24-qter, −8pter-p23.1, +8q12-q24.2 and +20 are early genomic events identifying two subgroups of dysplasia and cancers. One or more (e.g., two, three or four) of these aberrations is present in the major subgroup (termed 3q8pq20 subtype, comprising 70-80% of lesions) that develops with chromosomal instability, while they are absent from the more chromosomally stable non-3q8pq20 subgroup (20-30% of lesions). The 3q8pq20 subtype can be further subdivided according to level of genomic instability. The most chromosomally unstable 3q8pq20 tumors also display differential methylation compared to all other tumors and normal oral tissues. Little difference in methylation was detected when comparing the low instability 3q8pq20 and non-3q8pq20 tumors, suggesting that extensive epigenetic alterations do not contribute to formation of the non-3q8pq20 tumors. The 3q8pq20 and non-3q8pq20 cases, however, differ significantly in clinical outcome with risk for cervical (neck) lymph node metastasis almost exclusively associated with the 3q8pq20 subtype in two independent oral SCC cohorts. Thus, lack of +3q, −8p, +8q and +20 is a biomarker for low risk for oral SCC metastasis that can significantly alter clinical practice by identifying patients who do not require additional surgery to remove the cervical lymph nodes at the time of tumor resection. Moreover, while increased numbers of genomic alterations can be harbingers of progression to cancer, dysplastic lesions lacking copy number changes cannot be considered benign as they are potential precursors to non-3q8pq20 locally invasive, yet not metastatic oral SCC.
In particular, it has been discovered that oral SCC can be subdivided into those that harbor one or more (e.g., two, three or four) of the following: gains of regions on chromosome 3q and/or 8q, and/or loss of a region of 8p, and/or gain of chromosome 20; and those that do not have any of these aberrations. Tumors with one or more (e.g., two, three or four) of these aberrations are termed “3q8pq20,” and those lacking any of these aberrations, “non-3q8pq20.” The non-3q8pq20 group represents the minority of cases (20-30%). Non-3q8pq20 tumors are not associated with metastasis to the lymph nodes of the neck, compared with the 3q8pq20 tumors (p<0.006, Fisher test). This observation provides physicians with the capability to determine which patients require additional extensive surgery to remove the cervical (neck) lymph nodes at the time of the surgery to remove the tumor, and which patients could be spared this additional major surgery.
In addition to predicting substantial risk of metastasis of oral SCC, evaluation of relative copy number at chromosomal regions 3q8pq20 is useful for evaluating margins after tumor removal, for identifying dysplasias that, upon progression, are likely to progress to oral SCC that has a substantial risk of metastasis, for identifying dysplasias that could be monitored for possible progression, and for determining the presence of metastatic oral SCC (e.g., detecting micrometastases) in lymph nodes. With respect to evaluating tumor margins or dysplasias, a determination that a tumor or dysplasia is of the 3q8pq20 positive subtype (i.e., gains of regions on chromosome 3q and/or 8q, and/or loss of a region of 8p, and/or gain of chromosome 20), indicates that the tumor or dysplasia is more likely to have and/or acquire copy number alterations. Accordingly, monitoring margins and/or tumor recurrence and/or dysplasia progression by testing for copy number changes (e.g., by FISH) is useful for these cases.
With respect to evaluation of cancer cells in lymph nodes, there is currently interest in the use of sentinel lymph nodes to identify metastasis. Evaluation of copy number of 3q, 8p, 8q and/or 20 can aid the identification of tumor cells in the lymph nodes. Addition of molecular tests can increase sensitivity to detect micrometastases. Currently, immunohistochemistry for cytokeratins or RT-PCR for specific cancer-associated transcripts is used. Since FISH can be carried out on routinely fixed clinical specimens, there could be advantages over the use of RT-PCR, which requires that a portion of the node be frozen and not fixed. The studies described herein indicate that oral SCC metastases will have one or more (e.g., two, three or four) of the copy number changes, +3q, −8p, +8q and +20. Accordingly, tumor cells metastatic to the lymph node would also have one or more (e.g., two, three or four) of these aberrations. Small numbers of such cells can be identified in the lymph nodes, e.g., by FISH or any other appropriate method, with probes to these regions. Adding FISH to the analysis of the dissected lymph nodes improves the accuracy of the pathological assessment of nodal status.
In certain embodiments, the methods described herein are based, in part, on the identification of chromosomal regions that can be used to subtype oral SCC to determine whether an oral sample contains an SCC subtype that is substantially likely or unlikely to metastasize. The method entails obtaining an oral sample and analyzing it to determine nucleic acid copy number for regions of chromosomes 3q, 8p, 8q, and 20 relative to that for the rest of the genome (i.e., the “relative copy number”). For example, copy numbers for these regions can be compared to copy numbers for one or more other regions of the genome (e.g., one or more selected control regions) and/or compared to the average, median, or other representative copy number characteristic of the genome as a whole to determine copy number differences (i.e., gains or losses). In certain embodiments, copy numbers relative to one or more other regions and/or the average, median, or other representative copy number characteristic of the genome as a whole are determined for the following chromosomal regions: 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter (i.e., the entire chromosome 20). Such comparisons can be carried out within a single cell, within pre-selected cells, or by bulk analysis.
Relative copy number can be determined by any available method, including in situ hybridization, array-based hybridization assays, amplification-based assays, and high-throughput DNA sequencing. In situ hybridization employs probes that reliably provide information on their targets in individual cells or chromosomes. Probes of these types are well known in the art and many are commercially available. The cells and chromosomes may be isolated from tissue or in the original tissue context.
Array-based hybridization and amplification-based assays typically employ nucleic acid extracted from the specimen and thus do not measure the copy number status of chromosomal regions of individual cells, unless only a single cell is subjected to the measurement. In such assays, a plurality of probes can be employed, and/or a plurality of target sequences amplified, across each of the chromosomal regions to obtain a sufficiently accurate representation of the relative copy number for the chromosomal region. When using high-throughput DNA sequencing for relative copy number determinations, it may also be desirable, in some embodiments, to sequence a plurality of sequences within each target chromosomal region. In various embodiments, the number of probes employed, and/or target sequences amplified and/or sequenced, to ascertain the relative copy number of a particular chromosomal region is 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000. Additionally, the number of probes employed, and/or target sequences amplified and/or sequenced, can fall within any range bounded by any of these values.
In certain embodiments, it is advantageous to make copy number determinations at one or more control chromosomal regions, which are expected less frequently to have an altered copy number (relative to the average, median, or other representative copy number characteristic of the genome as a whole) in oral SCC. Control chromosomal regions include those that have been established by prior genomic studies of oral SCC to have a low frequency of copy number aberrations. In some embodiments, it may be desirable to make copy number determinations for a plurality of sequences within one or more control chromosomal regions. For example, multiple control region sequences can readily be queried in array-based hybridization and amplification assays, as well as determinations employing high-throughput DNA sequencing. In various embodiments, the number of control region sequences queried is 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, or more, as appropriate, for each control region. Additionally, the number of sequences queried can fall within any range bounded by any of these values.
A relative copy number difference, gain or loss, is detected using any technique that is appropriate for the particular analytical method employed. Suitable techniques are well known and can be selected for a particular analytical method by one of skill in the art. Additional techniques may be developed in the future. In embodiments employing a labeled probe and/or primer, a gain can be detected as an elevated signal relative to the rest of the genome, e.g., relative to the signal from one or more control regions or relative to the average signal for the genome. Conversely, a loss can be detected as a reduced signal relative to the rest of the genome, e.g., relative to the signal from one or more control regions or relative to the average signal for the genome. The manner in which a signal from one or more labeled probe(s) and/or primer(s) is quantified will vary depending on the assay method. For example, for in situ hybridization, signal “level” can be determined by counting spots, whereas in other methods signal intensity is measured. The level to which this measured signal is compared can be predetermined or can be determined within the same assay by querying a control region, as discussed above, and/or by measuring signal level across the genome. Those of skill in the art appreciate that measuring signal level “across the genome” need not, and typically does not, entail querying every chromosomal locus, but rather querying a plurality of chromosomal loci, which can, e.g., be spaced across the genome. In some embodiments, the signal obtained from an oral SCC sample can be compared with that from a reference sample, which is typically obtained from non-cancerous tissue, to identify gains and losses in the oral SCC sample relative to the non-cancerous tissue.
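By way of illustration only, the comparison of a target-region signal with a control-region signal can be sketched as follows. The function name `call_region`, the use of a median log2 ratio, and the cutoff of 0.3 are assumptions made for this example and are not taken from the specification; an appropriate cutoff would depend on the assay and its noise characteristics.

```python
import math
from statistics import median
from typing import Sequence


def call_region(target_signals: Sequence[float],
                control_signals: Sequence[float],
                cutoff: float = 0.3) -> str:
    """Call "gain", "loss", or "none" for one chromosomal region.

    The signals are positive intensities (or spot counts) from probes in the
    target region and in one or more control regions. The cutoff on the log2
    ratio is a hypothetical value chosen for illustration.
    """
    if not target_signals or not control_signals:
        raise ValueError("both signal lists must be non-empty")
    if min(target_signals) <= 0 or min(control_signals) <= 0:
        raise ValueError("signals must be positive")

    # Relative copy number: representative target signal over control signal.
    log2_ratio = math.log2(median(target_signals) / median(control_signals))

    if log2_ratio > cutoff:
        return "gain"
    if log2_ratio < -cutoff:
        return "loss"
    return "none"
```

For example, `call_region([3.0, 3.1, 2.9], [2.0, 2.0, 2.1])` compares a median target signal of 3.0 with a median control signal of 2.0 and reports a gain under the assumed cutoff.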
Relative copy number can be determined by analyzing genomic DNA. In addition, indirect measurements of relative copy number can be obtained by analyzing RNA or nucleic acids derived from RNA, such as cDNA or DNA amplified from RNA. The relationship between relative copy number and expression levels of genes located in regions showing copy number differences is described, for example, in Pollack et al., Proc. Natl. Acad. Sci., USA 99:12963-68 (2002) (incorporated by reference herein in its entirety and specifically for this description), which reports that, on average, a 2-fold change in DNA copy number is associated with a corresponding 1.5-fold change in mRNA levels. See also Tonan et al., Proc. Natl. Acad. Sci., USA 102:9625-30 (2005) (incorporated by reference herein in its entirety and specifically for its description of RNA analysis in copy number determinations) and Carter et al., Nature Genetics 38:1043-48 (2006) (incorporated by reference herein in its entirety and specifically for its description of RNA analysis in copy number determinations). When analyzing mRNA (or DNA derived therefrom) to determine the relative copy number of a chromosomal region, in certain embodiments, the copy numbers (i.e., expression levels) of a plurality of transcripts corresponding to a plurality of loci within the region are typically measured. In various embodiments, the number of different transcripts assessed for a particular region is up to about 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more. In certain embodiments, the copy number(s) (i.e., expression level(s)) of one or more control transcripts corresponding to genes whose expression level(s) is/are expected to be unaltered in oral SCC can be measured. For example, transcripts from one or more gene(s) in control chromosomal regions can be measured, e.g., in various embodiments, transcripts from up to about 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more genes in a control chromosomal region.
If the results indicate no gain of chromosomal regions 3q, 8q, and 20 and no loss of chromosomal region 8p, this finding indicates that the oral SCC is of a subtype that is unlikely to metastasize. In some embodiments, no gain of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and no loss of chromosomal region 8pter-p23.1 indicate an oral SCC that is unlikely to metastasize. The subject having this oral SCC can be treated for the oral SCC without removing the cervical lymph nodes.
If the results indicate a gain at one or more (e.g., two or three) of chromosomal regions 3q, 8q, and 20 and/or a loss of chromosomal region 8p, this finding indicates that the oral SCC is of a subtype that has a substantial likelihood of metastasizing. In some embodiments, a gain of one or more (e.g., two or three) of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and/or a loss of chromosomal region 8pter-p23.1 indicate an oral SCC for which there is a substantial likelihood that it has metastasized or that it will metastasize. In such a subject, the treatment for oral SCC can include removing the cervical lymph nodes.
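The decision rule set out in the two preceding paragraphs can be sketched, for illustration only, as a short function. The function name `classify_3q8pq20`, the region labels used as dictionary keys, and the call strings "gain", "loss", and "none" are assumptions of this example; the per-region calls are presumed to come from whatever copy number assay is employed.

```python
from typing import Mapping

# Regions whose gain, or loss, defines the 3q8pq20 subtype (labels are
# hypothetical keys chosen for this sketch).
GAIN_REGIONS = ("3q24-qter", "8q12-q24.2", "20pter-qter")
LOSS_REGIONS = ("8pter-p23.1",)


def classify_3q8pq20(calls: Mapping[str, str]) -> str:
    """Return "3q8pq20" or "non-3q8pq20" from per-region copy number calls."""
    for region in GAIN_REGIONS + LOSS_REGIONS:
        if region not in calls:
            raise KeyError(f"missing copy number call for {region}")

    has_gain = any(calls[r] == "gain" for r in GAIN_REGIONS)
    has_loss = any(calls[r] == "loss" for r in LOSS_REGIONS)

    # One or more of +3q, +8q, +20, or -8p places the sample in the subtype
    # having a substantial likelihood of metastasis.
    return "3q8pq20" if (has_gain or has_loss) else "non-3q8pq20"
```

In this sketch, a sample with no gain of 3q24-qter, 8q12-q24.2, or 20pter-qter and no loss of 8pter-p23.1 is reported as "non-3q8pq20", i.e., the subtype that is unlikely to metastasize.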
Furthermore, the studies described herein show that the assessment of the fraction of the genome involved in DNA copy number gains (FGG) and the fraction that has any copy number alteration (FGA) are also strongly associated with risk of metastasis. Thus, 3q8pq20 status, FGG and/or FGA are all genomic biomarkers that can be useful in discriminating a subtype of oral SCC with a substantially high risk of metastasis from a subtype of oral SCC with a sufficiently low risk of metastasis to inform a significant aspect of clinical treatment (namely, the decision to remove cervical lymph nodes). Accordingly, in certain embodiments, the invention provides methods of determining the presence of oral squamous cell carcinoma that is substantially likely to metastasize, versus that which is unlikely to metastasize, in an oral sample from a subject based on determining the fraction of genome gained and/or the fraction of genome altered.
In particular embodiments, to measure the amount of the genome altered, each chromosomal region queried (e.g., each probe, such as a clone that is employed to probe a region) is assigned a genomic distance equal to the sum of one half the distance between its center and that of the neighboring chromosomal regions queried (e.g., neighboring clones). The genomic distances of clones that are gained or lost are summed and the resulting value represents the fraction of the genome altered (FGA). To calculate only the fraction of the genome gained or lost, only the genomic distances of clones that are gained or lost, respectively, are considered. RNA expression levels can provide an indirect measure of the fraction of genome altered or gained or lost. See, e.g., Carter et al., Nature Genetics 38:1043-48 (2006) (incorporated by reference herein in its entirety and specifically for its description of RNA analysis in copy number determinations).
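The calculation described above can be sketched as follows, for illustration only. Each clone is represented by the position of its center and a copy number call. Two details are assumptions of this sketch rather than statements of the specification: the summed distances are divided by the total distance assigned to all clones so that the result is expressed as a fraction, and a clone at the end of a chromosome receives only the half-distance to its single neighbor. The names `clone_spans` and `genome_fractions` are hypothetical.

```python
from typing import Dict, List, Tuple

# (center position, call), where call is "gain", "loss", or "none".
Clone = Tuple[float, str]


def clone_spans(centers: List[float]) -> List[float]:
    """Assign each clone half the distance to each neighboring clone center."""
    spans = []
    for i, center in enumerate(centers):
        left = (center - centers[i - 1]) / 2 if i > 0 else 0.0
        right = (centers[i + 1] - center) / 2 if i < len(centers) - 1 else 0.0
        spans.append(left + right)
    return spans


def genome_fractions(genome: Dict[str, List[Clone]]) -> Dict[str, float]:
    """Compute fraction of genome gained (FGG), lost (FGL), and altered (FGA)."""
    total = gained = lost = 0.0
    for clones in genome.values():
        ordered = sorted(clones)  # order clones by center position
        spans = clone_spans([position for position, _ in ordered])
        for (_, call), span in zip(ordered, spans):
            total += span
            if call == "gain":
                gained += span
            elif call == "loss":
                lost += span
    if total == 0:
        raise ValueError("at least two clones on one chromosome are required")
    return {"FGG": gained / total, "FGL": lost / total,
            "FGA": (gained + lost) / total}
```

Neighbors are determined separately for each chromosome, so that the gap between the last clone of one chromosome and the first clone of the next is not counted.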
In various embodiments, a fraction of genome gained (FGG) below a threshold value of about 0.080, for example, below a threshold value of about 0.075, about 0.070, about 0.065, about 0.060, about 0.055, about 0.050, about 0.045, about 0.040, about 0.035, about 0.030, or about 0.025, indicates an oral SCC that is unlikely to metastasize, whereas an FGG above the threshold indicates an oral SCC having a substantial likelihood of metastasizing. In various embodiments, the threshold is about 0.065. In various embodiments, a fraction of genome altered (FGA) below a threshold of about 0.115, for example, below a threshold of about 0.110, about 0.105, about 0.100, about 0.095, about 0.090, about 0.085, about 0.080, about 0.075, about 0.070, about 0.065, about 0.060, about 0.055, about 0.050, about 0.045, about 0.040, about 0.035, about 0.030, or about 0.025, indicates an oral SCC that is unlikely to metastasize, whereas an FGA above the threshold indicates an oral SCC having a substantial likelihood of metastasizing. In various embodiments, the threshold is about 0.095. Additionally, the FGG or FGA threshold values can fall within any range bounded by any of the values listed above for each (i.e., FGG or FGA). By applying a lower threshold value, metastatic cases are more likely to be identified; however, this may lead to neck dissections on many patients who do not need them. Applying a higher threshold value spares patients unneeded neck surgery; however, patients with metastasis may not receive surgery (e.g., neck dissection to remove one or more cervical lymph nodes) and thus have a poor outcome. The applied threshold value depends on the judgment of a trained clinician, e.g., based on balancing the values of the various outcomes.
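For illustration only, the application of such a threshold can be sketched as follows. The constants use the values of about 0.065 for FGG and about 0.095 for FGA given above; the function name `risk_from_fraction` and the treatment of a value exactly equal to the threshold (here grouped with the low-risk result) are assumptions of this example, and the threshold actually applied remains a matter of clinical judgment.

```python
FGG_THRESHOLD = 0.065  # illustrative value taken from the text above
FGA_THRESHOLD = 0.095  # illustrative value taken from the text above


def risk_from_fraction(value: float, threshold: float) -> str:
    """Compare an FGG or FGA value against a clinician-selected threshold."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("a genome fraction must lie between 0 and 1")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("a threshold must lie between 0 and 1")

    # Assumption of this sketch: a value equal to the threshold is reported
    # as low risk, since the text addresses only values below or above it.
    if value > threshold:
        return "substantial likelihood of metastasis"
    return "unlikely to metastasize"
```

Lowering the `threshold` argument moves borderline cases into the higher-risk result, which reflects the trade-off between identifying metastatic cases and avoiding unneeded neck dissection.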
The invention further provides methods of assessing the risk that, if an oral epithelial dysplasia progresses, it will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis. These methods entail determining relative DNA copy numbers in a biological sample from a subject for the same chromosomal regions used for subtyping oral SCCs, namely 3q, 8p, 8q, and 20. A finding of no gain of chromosomal regions 3q, 8q, and 20, and no loss of chromosomal region 8p is indicative of oral epithelial dysplasia that, if it progresses, is unlikely to progress to metastatic oral squamous cell carcinoma. A finding of one or more of these copy number alterations, i.e., a gain of one or more (e.g., two or three) of chromosomal regions 3q, 8q, and 20, and/or a loss of chromosomal region 8p is indicative of oral epithelial dysplasia that, if it progresses, will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis. The considerations for making these determinations (probes, methods, use of controls, etc.) are the same as those described above for subtyping oral SCC, and specific aspects of such determinations are further described in the following sections.
If dysplasia of the 3q8pq20-positive subtype (i.e., gains of regions on chromosome 3q and/or 8q, and/or loss of a region of 8p, and/or gain of chromosome 20) progresses to cancer, it is likely to do so by the acquisition of further copy number alterations. Thus, one can monitor such dysplasias for progression using one or more probes to detect copy number alterations at chromosomal locations other than 3q, 8p, 8q and 20 that are frequently altered in oral SCC (see, e.g., Table 6). On the other hand, one would not expect non-3q8pq20 lesions to progress by the acquisition of further copy number alterations, so that evaluating these lesions for acquisition of copy number alterations would be unlikely to detect progression.
With respect to use of the non-3q8pq20 subtype for identifying patients at low risk of metastasis and use of the 3q8pq20-positive subtype for identifying patients having a substantially higher risk of metastasis, the 3q8pq20 biomarker can also be used together with current clinical assessments, e.g., tumor size, tumor thickness, tumor staging, to assist clinicians in providing a diagnosis and treatment regimen (e.g., whether to proceed with surgical treatment of the neck, i.e. neck dissection).
In embodiments where the DNA in the biological sample has a gain or loss of one or more (e.g., two, three or four) of said chromosomal regions (3q, 8p, 8q, and 20), i.e., is positive for the 3q8pq20 subtype, indicating an oral epithelial dysplasia that, if it progresses, will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis, the methods can further comprise determining the dysplasia grade, and/or the presence of erythroplakia (a.k.a., erythroleukoplakia or leukoplakia). This can be done using any method known in the art, including without limitation visual inspection, palpation, and microscopic analysis. On visual examination, leukoplakia may vary from a barely evident, vague whiteness on a base of uninflamed, normal-appearing tissue to a definitive white, thickened, leathery, fissured, verrucous (wartlike) lesion. On palpation, some lesions may be soft, smooth, or finely granular. Other lesions may be roughened, nodular, or indurated. Malignant transformation to squamous cell carcinoma is seen in more than 15% of cases.
Histologic changes range from hyperkeratosis, dysplasia, and carcinoma in situ to invasive squamous cell carcinoma. The term “dysplasia” indicates abnormal epithelium and disordered growth, whereas the term “atypia” refers to abnormal nuclear features. Increasing degrees of dysplasia are designated as mild, moderate, and severe and are subjectively determined microscopically. Specific microscopic characteristics of dysplasia include (1) drop-shaped epithelial ridges, (2) basal cell crowding, (3) irregular stratification, (4) increased and abnormal mitotic figures, (5) premature keratinization, (6) nuclear pleomorphism and hyperchromatism, and (7) an increased nuclear-cytoplasmic ratio.
It is generally accepted that the more severe the epithelial changes, the more likely a lesion is to evolve into cancer. When the entire thickness of epithelium is involved with these changes in a so-called top-to-bottom pattern, the term carcinoma in situ may be used. Designation of “carcinoma in situ” may also be used when cellular atypia is particularly severe, even though the changes may not be evident from basement membrane to surface. Carcinoma in situ is not regarded as a reversible lesion, although it may take many years for invasion to occur. A majority of squamous cell carcinomas of the upper aerodigestive tract, including the oral cavity, are preceded by epithelial dysplasia. Conceptually, invasive carcinoma begins when a microfocus of epithelial cells invades the lamina propria 1 to 2 mm beyond the basal lamina. At this early stage, the risk of regional metastasis is low. Further information on grading oral epithelial dysplasia can be found, e.g., in Regezi, et al., Oral Pathology: Clinical Pathologic Correlations, 5th edition (Oct. 2, 2007), Saunders.
Current management of dysplasia is based on the grade of dysplasia. Although there are a number of dysplasia grading systems that have been described, the most commonly used system is as follows. Mild dysplasias have architectural changes confined to the basal third of the full thickness of epithelium. Moderate dysplasias are up to two-thirds the full thickness of epithelium. Severe dysplasias are greater than two thirds of the full thickness, but without invasion through the basement membrane. Consideration is then given to the degree of cellular atypia. These features include increased nuclear cytoplasmic ratios, increased or abnormal mitoses, or pleomorphism of nuclei. Currently, the grading of dysplasia is used to predict risk. As many as 36% of severe dysplasias become invasive cancer (Silverman S, Jr., Gorsky M, Lozada F. Oral leukoplakia and malignant transformation. A follow-up study of 257 patients. Cancer. 1984 Feb. 1; 53(3):563-8; Schepman K P, van der Meij E H, Smeele L E, van der Waal I. Malignant transformation of oral leukoplakia: a follow-up study of a hospital-based population of 166 patients with oral leukoplakia from The Netherlands. Oral Oncol. 1998 July; 34(4):270-5; Lee J J, Hong W K, Hittelman W N, Mao L, Lotan R, Shin D M, et al. Predicting cancer development in oral leukoplakia: ten years of translational research. Clin Cancer Res. 2000 May; 6(5):1702-10). Cancer can also derive from hyperplasia or mild dysplasia, however. One group found that patients with mild dysplasia had the same transformation rates as those with severe dysplasia (Holmstrup P, Vedtofte P, Reibel J, Stoltze K. Long-term treatment outcome of oral premalignant lesions. Oral Oncol. 2006 May; 42(5):461-74).
In the context of the present invention, the assessment of the stage or monitoring of progression of an oral epithelial dysplasia positive for the 3q8pq20 subtype is helpful in assessing the need for, and timing of, aggressive interventions, such as excision of the dysplasia because, if such a dysplasia progresses, it will progress to oral squamous cell carcinoma having a substantial likelihood of metastasis. Any of the methods described herein or known in the art for assessing oral epithelial dysplasia can be carried out at the time of initial detection and at one or more time points thereafter separated by periods of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 months, or 1, 2, 3, 4, or 5 or more years, or any time period falling within a range bounded by any of the periods listed above.
Furthermore, in embodiments where the DNA in the biological sample has a gain or loss of one or more (e.g., two, three or four) of said chromosomal regions (3q, 8p, 8q, and 20), i.e., is positive for the 3q8pq20 subtype, indicating the oral dysplasia is likely to progress to metastatic oral squamous cell carcinoma, the methods described herein may further comprise more aggressively treating the oral dysplasia, e.g., including excising the dysplasia (e.g., by using a scalpel or laser excision) and chemoprevention.
The most common method for managing biopsy-proven dysplasia of the oral cavity is local surgical excision. Excision of a dysplastic lesion provides a valuable histologic diagnosis. As mentioned above, 5% of idiopathic leukoplakias already have invasive cancer at the initial biopsy (Silverman S, Jr., Gorsky M, Lozada F. Oral leukoplakia and malignant transformation. A follow-up study of 257 patients. Cancer. 1984 Feb. 1; 53(3):563-8). In addition, incisional biopsies are subject to sampling error, and dysplasia or carcinoma can be easily missed. Some studies have reported that over 10% of lesions diagnosed by incisional biopsy as dysplasia demonstrated invasive carcinoma after excision (Chiesa F, Tradati N, Sala L, Costa L, Podrecca S, Boracchi P, et al. Follow-up of oral leukoplakia after carbon dioxide laser surgery. Arch Otolaryngol Head Neck Surg. 1990 February; 116(2):177-80; Thomson P J, Wylie J. Interventional laser surgery: an effective surgical and diagnostic tool in oral precancer management. Int J Oral Maxillofac Surg. 2002 April; 31(2):145-53). Evaluation of relative DNA copy number alterations at chromosomal regions (3q, 8p, 8q, and 20) to determine the 3q8pq20 subtype provides additional information to guide treatment and allow the provider and patient to make an informed decision regarding excision of a dysplastic lesion. A dysplasia that carries a higher risk of transforming into a metastatic oral cancer based on the method would have a stronger indication for surgical excision.
The finding that copy number alterations at 3q, 8p, 8q, and 20 in oral SCC indicate likelihood of metastasis can also be exploited to identify oral SCC that has already metastasized by analyzing a lymph node sample for relative copy number alterations at these loci. In particular, a gain of one or more (e.g., two or three) of chromosomal regions 3q, 8q, and 20, and/or a loss of chromosomal region 8p is indicative of metastatic oral squamous cell carcinoma. In some embodiments, a gain of one or more (e.g., two or three) of chromosomal regions 3q24-qter, 8q12-q24.2, and 20pter-qter and/or a loss of chromosomal region 8pter-p23.1 is indicative of metastatic oral squamous cell carcinoma. In certain embodiments, one or more additional genetic alterations can be determined, such as fraction of genome gained (FGG), fraction of genome altered (FGA), methylation status, TP53 mutation(s), and the presence of relative copy number alterations at one or more loci other than 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter.
Since the fraction of genome gained (FGG) and/or the fraction of genome altered (FGA) are also indicators of the likelihood of oral SCC metastasis, either or both of these parameters can be determined in a lymph node sample to identify the presence of metastatic oral squamous cell carcinoma in that sample. The considerations for determining 3q8pq20 status (probes, methods, use of controls, etc.) are the same as those described above for subtyping oral SCC, and specific aspects of such determinations are further described in the following sections. For this embodiment, an FGG and/or FGA value that is greater than zero (0) is an indication of cancer in the lymph node.
Terms used in the claims and specification are defined as set forth below unless otherwise specified.
The term “oral SCC” refers to a malignant neoplasm of oral tissue, such as, e.g., the tongue, gingiva, floor of mouth, retromolar trigone, buccal mucosa, and lip.
The terms “tumor” or “cancer” in an animal refer to the presence of cells possessing characteristics such as atypical growth or morphology, including uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, and certain characteristic morphological features. Often, cancer cells will be in the form of a tumor, but such cells may exist alone within an animal. The term tumor includes both benign and malignant neoplasms. The term “neoplastic” refers to both benign and malignant atypical growth.
The term “oral sample” is intended to mean a sample obtained from the oral cavity or surrounding tissue of a subject suspected of having, or having, oral SCC and/or dysplasia.
The terms “nucleic acid” or “polynucleotide,” as used herein, refer to a deoxyribonucleotide or ribonucleotide in either single- or double-stranded form. The term encompasses nucleic acids, i.e., oligonucleotides, containing known analogues of natural nucleotides which have similar or improved binding properties, for the purposes desired. The term also encompasses nucleic-acid-like structures with synthetic backbones. DNA backbone analogues provided by the invention include phosphodiester, phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3′-thioacetal, methylene(methylimino), 3′-N-carbamate, morpholino carbamate, and peptide nucleic acids (PNAs); see Oligonucleotides and Analogues, a Practical Approach, edited by F. Eckstein, IRL Press at Oxford University Press (1991); Antisense Strategies, Annals of the New York Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications (1993, CRC Press). PNAs contain non-ionic backbones, such as N-(2-aminoethyl) glycine units. Phosphorothioate linkages are described in WO 97/03211; WO 96/39154; Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197. Other synthetic backbones encompassed by the term include methyl-phosphonate linkages or alternating methylphosphonate and phosphodiester linkages (Strauss-Soukup (1997) Biochemistry 36: 8692-8698), and benzylphosphonate linkages (Samstag (1996) Antisense Nucleic Acid Drug Dev 6: 153-156).
The term “relative copy number” is used herein to refer to the nucleic acid copy number for a chromosomal region, relative to the copy number for another chromosomal region. In some cases, either one or both of the copy numbers may represent the average, median, mode, etc. of one or more regions up to and including the whole genome. Relative copy number can be determined in any of a number of ways familiar to those of skill in the art. For example, relative copy numbers can be determined by comparing a measured copy number value for a target chromosomal region to one or more measured copy number values for one or more other regions of the genome (e.g., one or more selected control regions) and/or to a copy number value for the rest of the genome, such as an average, median, or other representative copy number characteristic of the genome as a whole.
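By way of illustration only, one such comparison can be sketched in Python as follows. The function name, the choice of the median of the control regions as the representative value, and the example measurements are assumptions made for this sketch and are not prescribed by the methods described herein.

```python
from statistics import median


def relative_copy_number(target_value, control_values):
    """Ratio of a target region's measured copy number value to the
    median of the values measured for the selected control regions."""
    if not control_values:
        raise ValueError("at least one control value is required")
    reference = median(control_values)
    if reference <= 0:
        raise ValueError("reference copy number value must be positive")
    return target_value / reference


# Hypothetical measurements in arbitrary units (not data from the disclosure).
ratio = relative_copy_number(3.1, [2.0, 2.1, 1.9])
print(f"relative copy number: {ratio:.2f}")
```

A ratio near 1 would indicate no difference from the control regions, while values well above or below 1 would suggest a relative gain or loss.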
The terms “copy number difference” and “altered copy number” refer to a difference in a copy number value for a chromosomal region, e.g., a difference between a copy number value for a particular chromosomal region and a copy number value that is representative of the rest of the genome. In some cases, either one or both of the copy numbers may represent the average, median, mode, etc. of one or more regions up to and including the whole genome.
The terms “making a copy number determination” and “querying the copy number” refer to measuring any indication of nucleic acid copy number and do not require determining absolute copy number for any chromosomal region.
The term “substantial likelihood of metastasis” refers to the probability that an oral squamous cell carcinoma (SCC) has metastasized or will metastasize. In the context of the present invention, an oral SCC having no copy number alterations at any of the following chromosomal regions: 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter is an oral SCC subtype at low risk for metastasis. As used herein, the term “substantial likelihood of metastasis” refers to a risk of metastasis, which is associated with an oral SCC that is not of this low-risk subtype.
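The definition above amounts to a simple decision rule, which may be sketched as follows. The function name, the string labels, and the representation of the input as a collection of region names are assumptions of this sketch; how an alteration at each region is detected is left to the assay employed.

```python
# The four chromosomal regions named in the definition above.
SUBTYPE_REGIONS = ("3q24-qter", "8pter-p23.1", "8q12-q24.2", "20pter-qter")


def metastasis_risk_subtype(altered_regions):
    """Classify an oral SCC from the regions found to carry a copy
    number alteration.

    Returns "low risk" when none of the four regions is altered, and
    "substantial likelihood of metastasis" otherwise.
    """
    hits = set(altered_regions) & set(SUBTYPE_REGIONS)
    if hits:
        return "substantial likelihood of metastasis"
    return "low risk"


# Hypothetical examples.
print(metastasis_risk_subtype([]))
print(metastasis_risk_subtype(["8q12-q24.2"]))
```

Alterations at regions other than the four listed do not change the result in this sketch, consistent with the definition given above.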
The terms “hybridizing specifically to,” “specific hybridization,” and “selectively hybridize to,” as used herein, refer to the binding, duplexing, or hybridizing of a nucleic acid molecule preferentially to a particular nucleotide sequence under stringent conditions. The term “stringent conditions” refers to conditions under which a probe will hybridize preferentially to its target sequence, and to a lesser extent to, or not at all to, other sequences. A “stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridization, or FISH) are sequence-dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in, e.g., Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes part I, Ch. 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, N.Y. (“Tijssen”). Generally, highly stringent hybridization and wash conditions for filter hybridizations are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH, whereas for FISH the appropriate temperature difference may be 20 to 25° C. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. Dependency of hybridization stringency on buffer composition, temperature and probe length are well known to those of skill in the art (see, e.g., Sambrook and Russell (2001) Molecular Cloning: A Laboratory Manual (3rd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY, and detailed discussion, below).
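As a worked illustration of the temperature offsets stated above, the following sketch derives a hybridization and wash temperature from a known Tm. The function name and the assay labels are assumptions of this sketch, and the Tm itself is taken as an input rather than calculated.

```python
def stringent_temperature_range(tm_celsius, assay="filter"):
    """Return a (low, high) temperature range in degrees C.

    "filter": about 5 degrees C below the Tm.
    "fish":   20 to 25 degrees C below the Tm.
    """
    # Offsets below Tm as (largest, smallest), taken from the text above.
    offsets = {"filter": (5.0, 5.0), "fish": (25.0, 20.0)}
    if assay not in offsets:
        raise ValueError(f"unknown assay type: {assay!r}")
    largest, smallest = offsets[assay]
    return (tm_celsius - largest, tm_celsius - smallest)


# Hypothetical probe with a Tm of 70 degrees C.
print(stringent_temperature_range(70.0, "filter"))
print(stringent_temperature_range(70.0, "fish"))
```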
A “probe” is a nucleic acid capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, generally through complementary base pairing, usually through hydrogen bond formation, thus forming a duplex structure. The probe can be labeled with a detectable label to permit facile detection of the probe, particularly once the probe has hybridized to its complementary target. Alternatively, however, the probe may be unlabeled, but may be detectable by specific binding with a ligand that is labeled, either directly or indirectly.
The term “primer” refers to an oligonucleotide that is capable of hybridizing (also termed “annealing”) with a nucleic acid and serving as an initiation site for nucleotide (RNA or DNA) polymerization under appropriate conditions (i.e., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer, but primers are typically at least 7 nucleotides long and, more typically range from 10 to 30 nucleotides, or even more typically from 15 to 30 nucleotides, in length. Other primers can be somewhat longer, e.g., 30 to 50 nucleotides long. In this context, “primer length” refers to the portion of an oligonucleotide or nucleic acid that hybridizes to a complementary “target” sequence and primes nucleotide synthesis. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the target. A primer need not reflect the exact sequence of the target but must be sufficiently complementary to hybridize with a target. A primer is said to anneal to another nucleic acid if the primer, or a portion thereof, hybridizes to a nucleotide sequence within the nucleic acid.
As used herein, with reference to a method performed by an individual, the term “amplification” encompasses any means by which at least a part of at least one target nucleic acid is reproduced, typically in a template-dependent manner, including without limitation, a broad range of techniques for amplifying nucleic acid sequences, either linearly or exponentially. Exemplary means for performing an amplifying step include polymerase chain reaction (PCR), ligase chain reaction (LCR), ligase detection reaction (LDR), multiplex ligation-dependent probe amplification (MLPA), ligation followed by Q-replicase amplification, primer extension, strand displacement amplification (SDA), hyperbranched strand displacement amplification, multiple displacement amplification (MDA), nucleic acid sequence-based amplification (NASBA), two-step multiplexed amplifications, rolling circle amplification (RCA), and the like, including multiplex versions and combinations thereof, for example but not limited to, OLA/PCR, PCR/OLA, LDR/PCR, PCR/PCR/LDR, PCR/LDR, LCR/PCR, PCR/LCR (also known as combined chain reaction, or CCR), digital amplification, and the like. Descriptions of such techniques can be found in, among other sources, Ausubel et al.; PCR Primer: A Laboratory Manual, Dieffenbach, Ed., Cold Spring Harbor Press (1995); The Electronic Protocol Book, Chang Bioscience (2002); Msuih et al., J. Clin. Micro. 34:501-07 (1996); The Nucleic Acid Protocols Handbook, R. Rapley, ed., Humana Press, Totowa, N.J. (2002); Abramson et al., Curr Opin Biotechnol. 1993 February; 4(1):41-7, U.S. Pat. No. 6,027,998; U.S. Pat. No. 6,605,451, Barany et al., PCT Publication No. WO 97/31256; Wenz et al., PCT Publication No. WO 01/92579; Day et al., Genomics, 29(1): 152-162 (1995), Ehrlich et al., Science 252:1643-50 (1991); Innis et al., PCR Protocols: A Guide to Methods and Applications, Academic Press (1990); Favis et al., Nature Biotechnology 18:561-64 (2000); and Rabenau et al., Infection 28:97-102 (2000); Belgrader, Barany, and Lubin, Development of a Multiplex Ligation Detection Reaction DNA Typing Assay, Sixth International Symposium on Human Identification, 1995 (available on the world wide web at: promega.com/geneticidproc/ussymp6proc/blegrad.html); LCR Kit Instruction Manual, Cat. #200520, Rev. #050002, Stratagene, 2002; Barany, Proc. Natl. Acad. Sci. USA 88:188-93 (1991); Bi and Sambrook, Nucl. Acids Res. 25:2924-2951 (1997); Zirvi et al., Nucl. Acid Res. 27:e40i-viii (1999); Dean et al., Proc Natl Acad Sci USA 99:5261-66 (2002); Barany and Gelfand, Gene 109:1-11 (1991); Walker et al., Nucl. Acid Res. 20:1691-96 (1992); Polstra et al., BMC Inf. Dis. 2:18- (2002); Lage et al., Genome Res. 2003 February; 13(2):294-307, and Landegren et al., Science 241:1077-80 (1988), Demidov, V., Expert Rev Mol Diagn. 2002 November; 2(6):542-8., Cook et al., J Microbiol Methods. 2003 May; 53(2):165-74, Schweitzer et al., Curr Opin Biotechnol. 2001 February; 12(1):21-7, U.S. Pat. No. 5,830,711, U.S. Pat. No. 6,027,889, U.S. Pat. No. 5,686,243, PCT Publication No. WO0056927A3, and PCT Publication No. WO9803673A1.
In some embodiments, amplification comprises at least one cycle of the sequential procedures of: annealing at least one primer with complementary or substantially complementary sequences in at least one target nucleic acid; synthesizing at least one strand of nucleotides in a template-dependent manner using a polymerase; and denaturing the newly-formed nucleic acid duplex to separate the strands. The cycle may or may not be repeated. Amplification can comprise thermocycling or can be performed isothermally.
As those of skill in the art readily appreciate, the term “amplification” also refers to a chromosomal abnormality characterized by the gain of nucleic acid(s), and it will be clear to those of skill, from the context, whether this meaning is intended.
The term “label,” as used herein, refers to any atom or molecule that can be used to provide a detectable and/or quantifiable signal. In particular, the label can be attached, directly or indirectly, to a nucleic acid or protein. Suitable labels that can be attached to probes include, but are not limited to, radioisotopes, fluorophores, chromophores, mass labels, electron dense particles, magnetic particles, spin labels, molecules that emit chemiluminescence, electrochemically active molecules, enzymes, cofactors, and enzyme substrates.
The term “label containing moiety” or “detection moiety” generally refers to a molecular group or groups associated with a probe, either directly or indirectly, that allows for detection of that probe upon hybridization to its target.
The term “target region” or “nucleic acid target” refers to a nucleotide sequence that resides at a specific chromosomal locus.
The term “control chromosomal region” refers to a chromosomal region that is not likely to have an altered copy number in oral SCC.
Many types of oral samples from a patient having, or suspected of having, oral SCC can be employed in the methods described herein. Illustrative samples include saliva, an oral washing sample, an oral swab or brush sample, or an oral tissue sample, e.g., an incisional biopsy of the tumor from a site selected from the group consisting of: tongue, gingiva, floor of mouth, retromolar trigone, buccal mucosa, lip, or other oral site. In some embodiments, the sample is an incisional biopsy sample. The sample may be from the primary tumor, completely within a tumor or lesion (e.g., pre-cancerous or cancerous), or from the margin of a tumor or lesion. In various embodiments, a lymph node sample, e.g., a cervical lymph node sample, may be evaluated.
Prior to detection, samples may optionally be pre-selected based on morphological characteristics, specific staining, and the like. Pre-selection identifies suspicious cells, thereby allowing the relative copy number determination to be focused on those cells. Pre-selection increases the likelihood that the result will be correct. Pre-selection of a suspicious region on a tissue section may be performed on a serial section stained by conventional means, such as H&E or PAP staining, and the suspect region marked by a pathologist or otherwise trained technician. The same region can then be located on the serial section subjected to in situ hybridization, and the nuclei within that region analyzed. Within the marked region, analysis may be limited to nuclei exhibiting abnormal characteristics as described above. Alternatively, the suspect region can be dissected from the tissue and analyzed by any applicable method including array-based hybridization assays, amplification-based assays, and high-throughput DNA sequencing. Single-cell analysis can be carried out, for example, using amplification-based assays.
Similarly, in samples with dispersed cells such as saliva or brushings, cells with apparent cytologic abnormalities may be selected for analysis. During pre-selection involving dispersed cells, the cells can be placed on a microscope slide and visually scanned for cytologic abnormalities commonly associated with dysplastic and neoplastic cells. Such abnormalities include abnormalities in nuclear size, nuclear shape, and nuclear staining, as assessed by counterstaining nuclei with nucleic acid stains or dyes such as propidium iodide or 4,6-diamidino-2-phenylindole dihydrochloride (DAPI). Typically, neoplastic cells harbor nuclei that are enlarged, irregular in shape, and/or show a mottled staining pattern. Propidium iodide, typically used at a concentration of about 0.4 μg/ml to about 5 μg/ml, is a red-fluorescing DNA-specific dye that can be observed at an emission peak wavelength of 614 nm. DAPI, typically used at a concentration of about 125 ng/ml to about 1000 ng/ml, is a blue fluorescing DNA-specific stain that can be observed at an emission peak wavelength of 452 nm.
In certain embodiments, only those cells pre-selected for detection are subjected to analysis for chromosomal losses and/or gains. In some embodiments, pre-selected cells on the order of at least 20, at least 30, at least 40, at least 50, or at least 100, in number, are chosen for assessing chromosomal losses and/or gains. In other embodiments, cells to be analyzed may be chosen independent of cytologic or histologic features. For example, in in situ hybridization, all non-overlapping cells in a given area or areas on a microscope slide may be assessed for chromosomal losses and/or gains.
The sample can be processed or treated in any manner suitable for the analytical method to be employed. For example, samples to be analyzed by in situ hybridization can be treated with a fixative, such as formaldehyde, embedded in paraffin, and sectioned for use in the methods of the invention. Alternatively, fresh or frozen tissue can be pressed against glass slides to form monolayers of cells known as touch preparations, which contain intact nuclei and do not suffer from the truncation artifact of sectioning. These cells may be fixed, e.g., in alcoholic solutions such as 100% ethanol or 3:1 methanol:acetic acid. Nuclei can also be extracted from thick sections of paraffin-embedded specimens to reduce truncation artifacts and eliminate extraneous embedded material. Samples can also consist of cells obtained from saliva or brushings of oral lesions, which are then deposited on slides by well-known methods such as dropping, centrifugation or smearing. Typically, samples, once obtained, are harvested and processed prior to hybridization using standard methods known in the art. For in situ hybridization, such processing may include protease treatment and additional fixation in an aldehyde solution such as formaldehyde.
Sample nucleic acids can be extracted, using established methods, to the extent necessary to facilitate the analysis, e.g., high-throughput DNA sequencing. In some cases, the nucleic acid may be amplified prior to analysis. Sample nucleic acids are, in some embodiments, such as array CGH, labeled using any suitable labeling method. In some embodiments, genomic DNA is analyzed to determine relative copy number. In other embodiments, RNA levels, e.g., mRNA levels, can be analyzed to determine relative copy number (i.e., expression analysis). In certain embodiments, RNA, or more specifically, mRNA, is converted to DNA, for example, by the use of reverse transcriptase to produce DNA or by amplification. If RNA is converted to DNA prior to the analysis, the method employed is preferably one that maintains the relative copy numbers of the transcripts. Such techniques are well known and suitable methods for particular applications can be selected by those of skill in the art.
Some embodiments rely on the use of probes to detect relative copy number at particular loci.
In situ hybridization typically employs probes that can query the target chromosomal region of interest, i.e., can selectively bind to that region and provide a detectable signal. A probe to a particular chromosomal region can include multiple polynucleotide fragments, e.g., ranging in size from about 50 to about 1,000 nucleotides in length.
In situ hybridization probes that can be used in the method described herein include probes that selectively hybridize to chromosomal regions (e.g., 3q, 8p, 8q, and 20) or subregions of these chromosomal regions, i.e., 3q24-qter, 8pter-p23.1, 8q12-q24.2, and 20pter-qter (i.e., the entire chromosome 20). (The subregion designations as used herein include the designated band and typically about 10 megabases of genomic sequence to either side.) Probes useful in the in situ hybridization methods described herein include locus-specific probes and centromeric probes. A locus-specific probe selectively binds to a specific locus at a chromosomal region, e.g., 3q24-qter, 8pter-p23.1, 8q12-q24.2. A centromeric probe typically binds to repetitive sequences located at the centromere. Centromeric probes have been identified that selectively bind to the centromeric region of a particular chromosome and thus can be used to identify the presence of that region in a sample.
In situ hybridization probes that target a chromosomal region or subregion can readily be prepared by those of skill in the art or can be obtained commercially, e.g., from Abbott Molecular, Molecular Probes (Invitrogen, Life Technologies), or Cytocell (Oxfordshire, UK). Such probes are prepared using standard techniques, for example, from peptide nucleic acids, cloned human DNA such as plasmids, bacterial artificial chromosomes (BACs) (available from BACPAC, Oakland Calif.), and P1 artificial chromosomes (PACs) that contain inserts of human DNA sequences. Suitable probes may also be prepared, e.g., via amplification or synthetically.
Probes for assays other than in situ hybridization, for example quantitative PCR, are designed and employed to selectively hybridize to the target nucleic acids of interest. Probes can be perfectly complementary to the target nucleic acid sequence or can be less than perfectly complementary. In certain embodiments, probes anneal to the target sequence under stringent hybridization conditions.
Probes may also be employed as isolated nucleic acids immobilized on a solid surface (e.g., nitrocellulose, glass, silicon, beads), as in array Comparative Genomic Hybridization (aCGH). In some embodiments, the probes may be members of an array of nucleic acids as described, for instance, in WO 96/17958, which is hereby incorporated by reference in its entirety and specifically for its description of array CGH. Techniques capable of producing high density arrays are well-known (see, e.g., Fodor et al. Science 767-773 (1991) and U.S. Pat. No. 5,143,854, both of which are hereby incorporated by reference for this description). Customized arrays containing particular sequences are commercially available from such companies as Agilent and Nimblegen.
Some embodiments employ primers to detect relative copy number at particular loci, e.g., amplification-based assays and high-throughput DNA sequencing. Primers suitable for nucleic acid amplification are sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The primers should be sufficiently complementary and sufficiently long to selectively anneal to their respective target sites and form stable duplexes. It will be understood that certain bases (e.g., the 3′ base of a primer) are generally desirably perfectly complementary to corresponding bases of the target nucleic acid sequence. In certain embodiments, primers anneal to the target sequence under stringent hybridization conditions.
One skilled in the art knows how to select appropriate primer pairs to amplify the target nucleic acid of interest. For example, PCR primers can be designed by using any commercially available software or open source software, such as Primer3 (see, e.g., Rozen and Skaletsky (2000) Meth. Mol. Biol., 132: 365-386; on the internet at broad.mit.edu/node/1060, and the like) or by accessing the Roche UPL website.
Primers may be prepared by any suitable method, including, for example, cloning and restriction of appropriate sequences or direct chemical synthesis or can be obtained from a commercial source.
Conditions for specifically hybridizing the probes and/or primers to their nucleic acid targets generally include the combinations of conditions that are employable in a given hybridization procedure to produce specific hybrids, which may easily be determined by one of skill in the art. Such conditions typically involve controlled temperature, liquid phase, and contact between a probe and a target. Hybridization conditions vary depending upon many factors including probe/primer concentration, target length, target and probe/primer G-C content, solvent composition, temperature, and duration of incubation. At least one denaturation step may precede contact of the probes/primers with the targets. Alternatively, both the probe/primer and nucleic acid target may be subjected to denaturing conditions together while in contact with one another, or with subsequent contact of the probe/primer with the biological sample. Hybridization may be achieved with subsequent incubation of the probe/primer/sample in, for example, a liquid phase that is compatible with subsequent steps of the assay. For example if no subsequent enzymatic amplification is required the liquid phase may comprise about a 50:50 volume ratio mixture of 2-4×SSC and formamide, at a temperature in the range of about 25 to about 55° C. Higher hybridization temperatures are typically employed if formamide is not included in the liquid. Temperatures are also adjusted based on the length of the complementary sequences that are participating in the hybridization. Hybridization times range from about several seconds for PCR primers to about 96 hours. In order to increase specificity, use of a blocking agent such as unlabeled blocking nucleic acid as described in U.S. Pat. No. 5,756,696 (the contents of which are herein incorporated by reference in their entirety, and specifically for the description of the use of blocking nucleic acid), may be employed in conjunction with the methods of the present invention. 
Other conditions may be readily employed for specifically hybridizing the probes/primers to their nucleic acid targets present in the sample, as would be readily apparent to one of skill in the art.
Upon completion of a suitable incubation period, probes bound non-specifically to sample DNA may be removed by one or a series of washes. Temperature and the concentrations of salt, formamide, etc. are suitably chosen for a desired stringency. The level of stringency required depends on the complexity of a specific probe sequence in relation to the genomic sequence, and may be determined by systematically hybridizing probes to samples of known genetic composition. In general, high stringency washes without formamide may be carried out for conventional nucleic acids at a temperature in the range of about 65 to about 80° C. with about 0.2× to about 4×SSC and about 0.1% to about 1% of a non-ionic detergent such as Nonidet P-40 (NP40). If lower stringency washes are required, the washes may be carried out at a lower temperature with an increased concentration of salt.
Hybridization
The hybridization of probes can be detected using any means known in the art. Label-containing moieties can be associated directly or indirectly with probes. Different label-containing moieties can be selected for each individual probe within a particular combination so that each hybridized probe is visually distinct from the others upon detection. Where FISH or NanoString® methodologies are employed, the probes can be conveniently labeled with distinct fluorescent label-containing moieties. In such embodiments, fluorophores, organic molecules that fluoresce upon irradiation at a particular wavelength, are typically directly attached to the probes. A large number of fluorophores are commercially available in reactive forms suitable for DNA labeling.
Attachment of fluorophores to nucleic acid probes is well known in the art and may be accomplished by any available means. Fluorophores can be covalently attached to a particular nucleotide, for example, and the labeled nucleotide incorporated into the probe using standard techniques such as nick translation, random priming, PCR labeling, and the like. Alternatively, the fluorophore can be covalently attached via a linker to the deoxycytidine nucleotides of the probe that have been transaminated. Methods for labeling probes are described in U.S. Pat. No. 5,491,224 and Molecular Cytogenetics: Protocols and Applications (2002), Y.-S. Fan, Ed., Chapter 2, “Labeling Fluorescence In situ Hybridization Probes for Genomic Targets,” L. Morrison et al., p. 21-40, Humana Press, both of which are herein incorporated by reference for their descriptions of labeling probes.
Exemplary fluorophores that can be used for labeling probes include TEXAS RED (Molecular Probes, Inc., Eugene, Oreg.), CASCADE blue acetylazide (Molecular Probes, Inc., Eugene, Oreg.), SPECTRUMORANGE™ (Abbott Molecular, Des Plaines, Ill.) and SPECTRUMGOLD™ (Abbott Molecular).
One of skill in the art will recognize that other agents or dyes can be used in lieu of fluorophores as label-containing moieties. Luminescent agents include, for example, radioluminescent, chemiluminescent, bioluminescent, and phosphorescent label-containing moieties. Silver or gold, as well as isotopic mass tags, can also be employed as labeling agents. Detection moieties that are visualized by indirect means can be used. For example, probes can be labeled with biotin or digoxigenin using routine methods known in the art, and then further processed for detection. Visualization of a biotin-containing probe can be achieved via subsequent binding of avidin conjugated to a detectable marker. The detectable marker may be a fluorophore, in which case visualization and discrimination of probes may be achieved as described above for FISH.
Probes hybridized to target regions may alternatively be visualized by enzymatic reactions of label moieties with suitable substrates for the production of insoluble color products. Each probe may be discriminated from other probes within the set by choice of a distinct label moiety. A biotin-containing probe within a set may be detected via subsequent incubation with avidin conjugated to alkaline phosphatase (AP) or horseradish peroxidase (HRP) and a suitable substrate. 5-bromo-4-chloro-3-indolylphosphate and nitro blue tetrazolium (NBT) serve as substrates for alkaline phosphatase, while diaminobenzidine serves as a substrate for HRP.
In embodiments where fluorophore-labeled probes or probe compositions are used, the detection method can involve fluorescence microscopy, flow cytometry, or other means for determining probe hybridization. Any suitable microscopic imaging method may be used in conjunction with the methods of the present invention for observing multiple fluorophores. In the case where fluorescence microscopy is employed, hybridized samples may be viewed under light suitable for excitation of each fluorophore and with the use of an appropriate filter or filters. Automated digital imaging systems such as the MetaSystems, BioView or Applied Imaging systems may alternatively be used. Alternatively, the assay format may employ the methodologies described in Direct Multiplexed Measurement of Gene Expression with Color-Coded Probe Pairs (Geiss, et al., Nat Biotechnol. (2008) 26(3):317-25), which describes the nCounter™ Analysis System (NanoString Technologies). This system captures and counts individual hybridized nucleic acids by a molecular bar-coding technology, and is commercialized by NanoString (on the internet at nanostring.com). See also, WO 2007/076128; and WO 2007/076129.
In Situ Hybridization
The hybridization signals for the set of probes to the target regions are detected and recorded for cells chosen for assessment of chromosomal losses and/or gains. Hybridization is detected by the presence or absence of the particular signals generated by each of the probes. Hybridization may also be performed to a reference sample with known gains and losses to assist with the analysis, for example a sample of normal cells that do not have any gains or losses. Once the copy number of target regions within each cell is determined, as assessed by the number of hybridization signals for each probe, relative chromosomal gains and/or losses may be quantified. The quantification of losses/gains can include determinations that evaluate the ratio of copy number of one locus to another on the same or a different chromosome.
Several methods can be used to determine whether a sample contains one or more of the copy number aberrations identified by the present invention. When a control sample of normal cells is employed, the relative gain or loss for each probe is determined by comparing the number of distinct probe signals in each cell to the number expected in a normal cell, i.e., where the relative copy number should be two. Non-neoplastic cells in the sample, such as keratinocytes, fibroblasts, and lymphocytes, can be used as reference normal cells. More than the normal number of probe signals is considered a gain, and fewer than the normal number is considered a loss. Alternatively, a minimum number of signals per probe per cell can be required to consider the cell abnormal (e.g., 5 or more signals). Likewise for loss, a maximum number of signals per probe can be required to consider the cell abnormal (e.g., 0 signals, or one or fewer signals). Still alternatively, a sample may have all loci elevated in copy number compared to normal cells (e.g. a tetraploid tumor) and in such cases it is of interest which loci may be more highly or less highly elevated.
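The per-cell scoring alternatives described above may be sketched as follows. The function and parameter names are assumptions of this sketch. By default, any count above the normal number is scored as a gain and any count below it as a loss; the optional thresholds implement the stricter alternatives (e.g., 5 or more signals for a gain, or 0 signals for a loss).

```python
def classify_signal_count(count, normal=2, gain_min=None, loss_max=None):
    """Score one probe in one cell as "gain", "loss", or "normal".

    count    -- number of distinct hybridization signals for the probe
    normal   -- number of signals expected in a normal cell
    gain_min -- optional minimum count required to score a gain
    loss_max -- optional maximum count allowed to score a loss
    """
    gain_threshold = normal + 1 if gain_min is None else gain_min
    loss_threshold = normal - 1 if loss_max is None else loss_max
    if count >= gain_threshold:
        return "gain"
    if count <= loss_threshold:
        return "loss"
    return "normal"


# Hypothetical counts scored with the default and the stricter criteria.
print(classify_signal_count(3))              # default criterion
print(classify_signal_count(3, gain_min=5))  # stricter gain criterion
```

The stricter thresholds trade sensitivity for specificity, since counts of 3 or 4 signals are no longer scored as gains.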
The percentages of cells with at least one gain and/or loss are to be recorded for each locus. A cell is considered abnormal if at least one of the genetic aberrations identified by a probe combination of the present invention is found in that cell. A sample may be considered positive for a gain or loss if the percentage of cells with the respective gain or loss exceeds the cutoff value for any probes used in an assay. Alternatively, two or more loci with apparent aberrant copy number can be required in order to consider the cell abnormal at the desired region, with the effect of increasing specificity. Still alternatively, the total number of signals from all selected cells in the sample at each measured locus may be compared to the other measured loci in order to determine if at least one of the aberrations identified by a probe combination of the present invention is present in the sample.
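A sample-level determination of this kind may be sketched as follows. The function name, the data layout, and the cutoff of 30% used in the example are assumptions of this sketch; no particular cutoff value is implied.

```python
def sample_calls(counts_by_probe, cutoff_percent, normal=2):
    """For each probe, report whether the percentage of cells showing a
    gain or a loss exceeds the cutoff.

    counts_by_probe -- dict mapping a probe name to a list of per-cell
                       signal counts
    cutoff_percent  -- percentage of cells that must be exceeded
    """
    calls = {}
    for probe, counts in counts_by_probe.items():
        if not counts:
            raise ValueError(f"no cells scored for probe {probe!r}")
        total = len(counts)
        gain_pct = 100.0 * sum(c > normal for c in counts) / total
        loss_pct = 100.0 * sum(c < normal for c in counts) / total
        calls[probe] = {
            "gain": gain_pct > cutoff_percent,
            "loss": loss_pct > cutoff_percent,
        }
    return calls


# Hypothetical per-cell signal counts for two probes.
counts = {
    "8q": [2, 3, 4, 2, 3],                  # 60% of cells show a gain
    "8p": [2, 1, 2, 2, 2, 2, 2, 2, 2, 2],   # 10% of cells show a loss
}
print(sample_calls(counts, cutoff_percent=30.0))
```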
aCGH
In array CGH, the probes are not labeled, but rather are immobilized at distinct locations on a substrate, as described in WO 96/17958. In this context, the probes are often referred to as the “target nucleic acids.” The sample nucleic acids are typically labeled to allow detection of hybridization complexes. The sample nucleic acids used in the hybridization may be detectably labeled prior to the hybridization reaction. Alternatively, a detectable label may be selected which binds to the hybridization product. In dual- or multi-color aCGH, the target nucleic acid array is hybridized to two or more collections of differently labeled nucleic acids, either simultaneously or serially. For example, sample nucleic acids (e.g., from oral SCC biopsy) and reference nucleic acids (e.g., from normal oral tissue) are each labeled with a separate and distinguishable label. Differences in intensity of each signal at each target nucleic acid spot can be detected as an indication of a copy number difference. Although any suitable detectable label can be employed for aCGH, fluorescent labels are typically the most convenient.
Array CGH can be carried out in single-color or dual- or multi-color mode. In single-color mode, only the sample nucleic acids are labeled and hybridized to the nucleic acid array. Copy number differences can be detected by detecting signal intensities for all of the probes on the array, normalizing those intensities by comparing them to intensities from control samples known to have normal DNA copy number at essentially all loci, and then comparing the normalized intensities for the sample nucleic acid to determine if there are loci that are at increased or decreased copy number relative to the average for the genome. To facilitate this determination, the array can include target elements for one or more loci (“control loci”) that are not expected to show copy number difference(s) in oral SCC. Control loci can be selected based on the data in
In dual- or multi-color mode, signal corresponding to each labeled collection of nucleic acids (e.g., sample nucleic acids and normal, reference nucleic acids) is detected at each target nucleic acid spot on the array. The signals at each spot can be compared, e.g., by calculating a ratio of the sample to the normal reference signal at each locus, and normalizing the signals so that the average, median, or modal ratio for the entire genome is 1.0. Then, if the normalized ratio of sample nucleic acid signal to reference nucleic acid signal at a target spot significantly exceeds 1, this indicates a gain in the sample nucleic acids at the locus corresponding to the target nucleic acid spot on the array. Conversely, if the normalized ratio of sample nucleic acid signal to reference nucleic acid signal is significantly less than 1, this indicates a loss in the sample nucleic acids at the corresponding locus.
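The ratio normalization just described can be illustrated as follows. The sketch uses the median as the centering statistic, and the gain and loss thresholds (1.2 and 0.8) are arbitrary placeholders standing in for whatever criterion of "significantly" different from 1 is adopted; the function name is hypothetical.

```python
from statistics import median

def call_spots(sample, reference, gain_threshold=1.2, loss_threshold=0.8):
    """Normalize sample/reference ratios to a genome-wide median of 1.0 and call each spot.

    The thresholds are illustrative assumptions, not values from the specification.
    """
    ratios = [s / r for s, r in zip(sample, reference)]
    center = median(ratios)
    normalized = [x / center for x in ratios]
    calls = []
    for x in normalized:
        if x > gain_threshold:
            calls.append("gain")
        elif x < loss_threshold:
            calls.append("loss")
        else:
            calls.append("normal")
    return normalized, calls

# Illustrative intensities for five spots.
normalized, calls = call_spots([200, 210, 190, 400, 100], [100, 100, 100, 100, 100])
```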
Array-based relative copy number determinations can be obtained using a commercial service, such as, e.g., the Affymetrix-authorized SeqWright.
Amplification-Based Detection
In still another embodiment, amplification-based assays can be used to measure the relative copy numbers at loci within chromosomal regions. In such amplification-based assays, the target nucleic acids act as template(s) in amplification reaction(s) (e.g., Polymerase Chain Reaction (PCR)). In a quantitative amplification, the amount of amplification product is proportional to the amount of template in the original sample. Detailed protocols for quantitative PCR are provided in Innis et al. (1990) PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., N.Y. A number of commercial quantitative PCR systems are available, for example the TaqMan system from Applied Biosystems.
Other suitable amplification methods include, but are not limited to, ligase chain reaction (LCR) (see Wu and Wallace (1989) Genomics 4: 560; Landegren et al. (1988) Science 241: 1077; and Barringer et al. (1990) Gene 89: 117), multiplex ligation-dependent probe amplification (MLPA), transcription amplification (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173), self-sustained sequence replication (Guatelli et al. (1990) Proc. Nat. Acad. Sci. USA 87: 1874), dot PCR, and linker adapter PCR, etc.
Amplification is typically carried out using primers that specifically amplify one or more loci within each chromosome (e.g., chromosome 20), chromosomal region (e.g., 3q, 8p, and 8q), or chromosomal subregion (e.g., 3q24-qter, 8pter-p23.1, and 8q12-q24.2) to be queried. Detection can be carried out by any standard means, including a target-specific probe, a universal probe that binds, e.g., to a sequence introduced into all amplicons via one or both primers, or a double-stranded DNA-binding dye (such as, e.g., SYBR Green). In illustrative embodiments, padlock probes or molecular inversion probes are employed for detection.
Padlock probes (PLPs) are long (e.g., about 100 bases) linear oligonucleotides. The sequences at the 3′ and 5′ ends of the probe are complementary to adjacent sequences in the target nucleic acid. In the central, noncomplementary region of the PLP there is a “tag” sequence that can be used to identify the specific PLP. The tag sequence is flanked by universal priming sites, which allow PCR amplification of the tag. Upon hybridization to the target, the two ends of the PLP oligonucleotide are brought into close proximity and can be joined by enzymatic ligation. The resulting product is a circular probe molecule catenated to the target DNA strand. Any unligated probes (i.e., probes that did not hybridize to a target) are removed by the action of an exonuclease. Hybridization and ligation of a PLP requires that both end segments recognize the target sequence. In this manner, PLPs provide extremely specific target recognition.
The tag regions of circularized PLPs can then be amplified and resulting amplicons detected. For example, TaqMan® real-time PCR can be carried out to detect and quantify the amplicon. The presence and amount of amplicon can be correlated with the presence and quantity of target sequence in the sample. For descriptions of PLPs see, e.g., Landegren et al., 2003, Padlock and proximity probes for in situ and array-based analyses: tools for the post-genomic era, Comparative and Functional Genomics 4:525-30; Nilsson et al., 2006, Analyzing genes using closing and replicating circles, Trends Biotechnol. 24:83-8; Nilsson et al., 1994, Padlock probes: circularizing oligonucleotides for localized DNA detection, Science 265:2085-8.
Molecular inversion probes (MIPs) are often employed in single nucleotide polymorphism (SNP) analysis. Like padlock probes, MIPs are single-stranded DNA molecules containing two regions complementary to regions in the target nucleic acid that flank a SNP in question. Each probe also contains universal primer sequences separated by an endodeoxyribonuclease recognition site and a 20-nt tag sequence. During the assay the probes undergo a unimolecular rearrangement: they are (1) circularized by filling gaps with nucleotides corresponding to the SNPs in four separate allele-specific polymerization (A, C, G, and T) and ligation reactions; and (2) linearized in an enzymatic reaction. As a result they become “inverted.” This step is followed by amplification. The use of MIPs is described further in Absalan F, Ronaghi M., “Molecular inversion probe assay.” Methods Mol Biol. 2007; 396:315-30; and Hardenbol P et al., “Multiplexed genotyping with sequence-tagged molecular inversion probes.” Nat Biotechnol. 2003 June; 21(6):673-8. Epub 2003 May 5.
High-Throughput DNA Sequencing
In particular embodiments, amplification methods are employed to produce amplicons suitable for high-throughput (i.e., automated) DNA sequencing. Generally, amplification methods that provide substantially uniform amplification of target nucleotide sequences are employed in preparing DNA sequencing libraries having good coverage. In the context of automated DNA sequencing, the term “coverage” refers to the number of times the sequence is measured upon sequencing. The counts obtained are typically normalized relative to a reference sample or samples to determine relative copy number. Thus, upon performing automated sequencing of a plurality of target amplicons, the normalized number of times the sequence is measured reflects the number of target amplicons including that sequence, which, in turn, reflects the number of copies of the target sequence in the sample DNA.
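As an illustration of how sequence counts might be normalized against a reference to give relative copy number, consider the sketch below. Scaling each sample by its total count before taking the sample-to-reference ratio is an assumption made for this example; the specification does not prescribe a particular normalization, and the locus names and counts are invented.

```python
def relative_copy_number(sample_counts, reference_counts):
    """Relative copy number per locus from read counts.

    Each count is first divided by the total count of its own sample (an assumed
    library-size normalization), then the sample fraction is divided by the
    reference fraction.
    """
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    return {
        locus: (sample_counts[locus] / sample_total)
        / (reference_counts[locus] / reference_total)
        for locus in sample_counts
    }

# Illustrative counts: locus C is measured twice as often in the sample as A or B.
rcn = relative_copy_number(
    {"A": 100, "B": 100, "C": 200},
    {"A": 100, "B": 100, "C": 100},
)
```

Because the values are relative, only ratios between loci are meaningful here: locus C comes out at twice the value of locus A.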
Amplification for sequencing may involve emulsion PCR, in which individual DNA molecules along with primer-coated beads are isolated in aqueous droplets within an oil phase. Polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule, followed by immobilization for later sequencing. Emulsion PCR is used in the methods by Margulies et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as “Polony sequencing”) and SOLiD sequencing (developed by Agencourt, now Applied Biosystems). Another method for in vitro clonal amplification for sequencing is bridge PCR, where fragments are amplified upon primers attached to a solid surface, as used in the Illumina Genome Analyzer. Some sequencing methods do not require amplification, for example the single-molecule method developed by the Quake laboratory (later commercialized by Helicos). This method uses bright fluorophores and laser excitation to detect pyrosequencing events from individual DNA molecules fixed to a surface. Pacific Biosciences has also developed a single molecule sequencing approach that does not require amplification.
After in vitro clonal amplification (if necessary), DNA molecules that are physically bound to a surface are sequenced. Sequencing by synthesis, like dye-termination electrophoretic sequencing, uses a DNA polymerase to determine the base sequence. Reversible terminator methods (used by Illumina and Helicos) use reversible versions of dye-terminators, adding one nucleotide at a time, and detect fluorescence at each position in real time, by repeated removal of the blocking group to allow polymerization of another nucleotide. Pyrosequencing (used by 454) also uses DNA polymerization, adding one nucleotide species at a time and detecting and quantifying the number of nucleotides added to a given location through the light emitted by the release of attached pyrophosphates.
Pacific Biosciences Single Molecule Real Time (SMRT™) sequencing relies on the processivity of DNA polymerase to sequence single molecules and uses phospholinked nucleotides, each type labeled with a different colored fluorophore. As the nucleotides are incorporated into a complementary DNA strand, each is held by the DNA polymerase within a detection volume for a greater length of time than it takes a nucleotide to diffuse in and out of that detection volume. The DNA polymerase then cleaves the bond that previously held the fluorophore in place and the dye diffuses out of the detection volume so that fluorescence signal returns to background. The process repeats as polymerization proceeds.
Sequencing by ligation uses a DNA ligase to determine the target sequence. Used in the Polony method and in the SOLiD technology, this method employs a pool of all possible oligonucleotides of a fixed length, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position.
In various embodiments, affinity capture or other enrichment procedures can be used to enrich sequences from particular parts of the genome for subsequent sequencing. Such enrichment methods are known in the art.
The invention includes combinations of probes and/or primers, as described herein, that can be used to subtype oral SCC or oral epithelial dysplasia or to detect metastatic oral SCC in a lymph node, as well as kits for use in diagnostic, research, and prognostic applications. Kits include probe/primer combinations and can also include reagents such as buffers and the like. The kits may include instructional materials containing directions (i.e., protocols) for the practice of the methods of this invention. While the instructional materials typically include written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. The kit may include addresses to internet sites that provide such instructional materials.
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims.
In addition, all other publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
Clinically evident precancerous oral lesions preceding development of oral squamous cell carcinomas (SCC) include oral epithelial dysplasia of varying grades (mild, moderate, severe) (5). Transformation to cancer occurs in 16% of mild and 55% of moderate/severe dysplasia and is considered to occur by stepwise acquisition of genetic and/or epigenetic alterations (6). The data in this study show that +3q24-qter, −8pter-p23.1, +8q12-q24.2 and +20 occur at ≧20% frequency in oral dysplasia cases with no known association with oral cancer. Moreover, 75-80% of all dysplasia and SCC cases harbor one or more of these copy number aberrations, with additional recurrent aberrant regions occurring in SCC. On the other hand, 20-25% of dysplasia and oral SCC cases lack the copy number aberrations +3q, −8p, +8q and +20, and have few or no other copy number alterations. Thus, aberrations involving 3q, 8p, 8q and chromosome 20 appear to be early events that identify a major subgroup of oral cancer (3q8pq20 subtype) that develops with chromosomal instability, and distinguishes it from a smaller group of chromosomally stable SCC (non-3q8pq20). Importantly, the two subtypes differ in clinical behavior, with the non-3q8pq20 tumors being associated with a low risk for metastasis. Presence of one or more of the aberrations, +3q, −8p, +8q and +20, is therefore a biomarker for oral SCC metastasis. In addition, while increased numbers of genomic alterations can be harbingers of progression to cancer, lesions lacking copy number changes cannot be considered benign as they are potential precursors to the 20-25% of oral SCC that lack recurrent copy number alterations.
It is generally accepted that oral SCC develops via accumulation of genetic and epigenetic changes in a multi-step process, with aberrations being frequently recognized in premalignant lesions or in histologically normal tissue (6). On the one hand, several independent reports support loss of heterozygosity (LOH) at 9p21 and 3p as early events in development of dysplasia, with LOH at additional loci associated with transformation to cancer (7). On the other hand, studies purporting to show that aneuploidy alone was the best predictor of progression to cancer were subsequently discovered to have been founded on fabricated data (8). Therefore, to clarify the role of genomic aberrations in oral cancer progression and metastasis, array comparative genomic hybridization (CGH) was carried out to determine the genome-wide spectrum of copy number gains and losses in 39 oral dysplasia samples, 29 with no known association with cancer and 10 that either subsequently progressed to cancer or appeared at the site of a previous cancer.
Patients and Tissue Samples.
We obtained formalin fixed paraffin embedded dysplasia and SCC tissue specimens from oral cavity sites (tongue, gingiva, floor of mouth, retromolar trigone, buccal mucosa and lip) and associated clinical data through the UCSF Oral Cancer Tissue Bank and Cancer Registry. Patient consent was obtained for use of all specimens. For cohort#2, we considered oral cavity SCC cases treated at the University of California San Francisco Medical Center between 1998 and 2005 to be eligible for inclusion if patients were older than 21 years and they did not receive radiation or chemotherapy prior to tumor resection. We considered cases to be node positive if the histopathologic nodal status was positive at the time of surgical treatment or metastasis was identified during the five-year follow-up period, whereas we considered patients to be node negative if pathologic nodal status was negative at the time of surgical resection and no nodal involvement occurred during a five-year follow-up period. From the 2500 cases in the bank, we were able to identify and accession tissue blocks for 64 cases for which the required clinical information was available and there was sufficient tumor material (i.e. tumors ≧1.5 cm in diameter) for analysis. Prior to extraction of nucleic acids from dysplasia or SCC specimens, we stained the first and last sections with hematoxylin and eosin. We examined these sections to confirm the diagnosis and grading of dysplasia, which was done by one pathologist (RCKJ), and to estimate the normal cell content of the regions of dysplasia and SCC selected for dissection, which varied from 60-90% epithelial cells. Patient samples and characteristics are provided in Tables 1 and 2.
TP53 Sequencing.
We amplified exons 5-8 of TP53 from genomic DNA and carried out cycle sequencing, as described previously (Snijders et al. 2005).
Array CGH.
We dissected regions of dysplasia or tumor from 15 consecutive 10 μm formalin fixed paraffin embedded tissue sections from routine surgical excisions. For the analysis of cohort#2, we also dissected regions of normal tissue, e.g. muscle from the same patient blocks. We extracted DNA and carried out copy number measurements on arrays of 2464 BAC clones printed in triplicate as described previously (Snijders et al. 2005). The array datasets are available at NCBI GEO (submission in progress).
Array Data Pre-Processing.
We studied four datasets. We obtained two datasets from previous publications: SCC cohort#1 from our own published work (Snijders et al. 2005) and an independent dataset from the Netherlands (Smeets et al. 2009). Here, we describe the analysis of the two new datasets. The oral dysplasia dataset comprises 39 samples hybridized to three different print versions of the UCSF BAC array (Snijders et al. 2001) (HumArray2.0, 3.0, and 3.2), which differ slightly in clone content. The oral SCC dataset (cohort#2) comprises 63 tumor samples, with accompanying paired normal samples from the same patient for 61 of the cases. All of the tumor samples were hybridized to the HumArray3.2 platform. We used UCSF SPOT (Jain et al. 2002) for array image analysis, and after quality filtering on spots and targets, we applied a “SpotCorrection” algorithm for removing systematic geometric and GC content effects. The algorithm employs an iterative scheme to estimate smoothly varying spatial artifacts in log2 ratios across the array while retaining ‘true’ (genomically coherent) signals. We normalized GC content using loess and then performed replicate spot averaging and clone filtering as previously described (Snijders et al. 2003). We estimated the experimental variability of each CGH profile (sd) by taking the median of the absolute deviations (MAD) of the measurements on clones with the same copy number in that profile, and if replicate hybridizations were available for a case, we retained the one with the lower MAD.
For a subset of the oral dysplasia samples hybridized on the HumArray2.0 platform, we observed a “print batch effect” (PBE), which manifested as systematic enhanced noise across multiple samples. To correct this effect, we clustered the data from these samples, which revealed two different PBE populations. For each of these, we calculated a PBE template as the median log2 ratio per probe (BAC clone) across samples in each population. After appropriate scaling (equal to the dot product amplitude of the tumor profile on the template), we subtracted the PBE template from the tumor profile.
For tumor profiles with paired normal hybridization profiles, we applied noise reduction using the normal sample as template. The success of this strategy implies the existence of a shared sample specific effect on the log2 ratios across hybridizations. The magnitude of this effect can be estimated using the derivative log ratio spread (DLRS) (Chen et al. 2008). The scaling factor between a tumor profile and its normal template was the ratio of their respective DLRS values. For the two tumor profiles without a paired normal, we employed the per-probe median of the normal profiles as a normal template, with DLRS scaling as above.
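The template correction for the print batch effect can be sketched as below. The sketch interprets the "dot product amplitude" as the least-squares projection coefficient of the profile onto the template (dot product divided by the squared norm of the template); that interpretation, and the function names, are assumptions made for illustration.

```python
from statistics import median

def pbe_template(profiles):
    """Per-probe median log2 ratio across the samples of one PBE population.

    profiles: list of equal-length lists, one log2-ratio profile per sample.
    """
    return [median(values) for values in zip(*profiles)]

def subtract_template(profile, template):
    """Subtract the template after scaling it by the projection of the profile onto it.

    The scale factor (dot product over squared template norm) is an assumed reading
    of the scaling described in the text.
    """
    dot = sum(p * t for p, t in zip(profile, template))
    norm_sq = sum(t * t for t in template)
    scale = dot / norm_sq if norm_sq else 0.0
    return [p - scale * t for p, t in zip(profile, template)]

# Illustrative check: a profile that is exactly twice the template is reduced to zero.
template = [0.1, -0.1, 0.2]
corrected = subtract_template([0.2, -0.2, 0.4], template)
```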
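A minimal sketch of DLRS-scaled template subtraction follows. It uses a commonly quoted definition of DLRS (the interquartile range of differences between adjacent log2 ratios, divided by 1.349 times the square root of 2); whether the cited reference uses exactly this form, the direction of the ratio (tumor over normal), and the use of subtraction are all assumptions of this example.

```python
import math
from statistics import quantiles

def dlrs(log2_ratios):
    """Derivative log ratio spread of a profile ordered by genome position.

    Assumed definition: IQR of adjacent differences / (1.349 * sqrt(2)).
    """
    diffs = [b - a for a, b in zip(log2_ratios, log2_ratios[1:])]
    q1, _, q3 = quantiles(diffs, n=4)
    return (q3 - q1) / (1.349 * math.sqrt(2))

def denoise(tumor, normal):
    """Subtract the normal template from the tumor profile, scaled by the DLRS ratio."""
    scale = dlrs(tumor) / dlrs(normal)  # assumed direction: tumor DLRS over normal DLRS
    return [t - scale * n for t, n in zip(tumor, normal)]

# Illustrative profile (invented values).
profile = [0.0, 0.1, -0.1, 0.05, 0.2, -0.05, 0.0, 0.1]
spread = dlrs(profile)
```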
Statistical Methods.
All p-values less than 0.05 were considered significant, unless there was a multiple comparisons adjustment, in which case a q-value less than 0.05 was considered significant. Calculations were performed using the R language (Ihaka and Gentleman 1996).
Copy Number Analysis.
We mapped the dysplasia and SCC data to the May 2004 freeze of the human genome sequence (hg17) and separately processed each dataset using circular binary segmentation (CBS) (Olshen et al. 2004) as implemented in the DNAcopy package that is part of Bioconductor (Gentleman et al. 2004). We used the scaled median absolute deviation (MAD) of the difference between the observed and segmented values to estimate the sample-specific experimental variation. For each sample, we declared a segment to be gained or lost if the average log2 ratio was at least two times the sample MAD away from the median segmented value. We defined high level amplifications, as we have described previously (Fridlyand et al. 2006b), by considering the width of the segment to which a clone belonged and the minimum difference between the segment value of the clone and the segment means of the neighboring segments. We declared a clone amplified if it belonged to a segment spanning less than 20 Mb and the minimum difference was greater than exp(−x^3), where x is the difference in segment means.
We calculated the numbers and types of genomic alterations as described previously (Fridlyand et al. 2006b). Briefly, we defined the total number of copy number transitions (break points) as the total number of segments minus the number of chromosomes. We defined whole arm changes (centromeric copy number transitions) as occurring when the segment end was assigned at the most proximal clone on the p-arm. We assigned whole chromosome changes to chromosomes without identified breakpoints when the chromosomal segment mapped to the gain or loss level. Finally, we scored an autosomal chromosome arm as amplified if it contained at least one amplified clone.
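The gain/loss rule based on the sample MAD can be illustrated as follows, taking per-clone observed and segmented values as input. The segmentation itself is not reproduced here, and the MAD scale constant 1.4826 (the usual consistency factor for normal data) is an assumption, since the text says only that the MAD was scaled.

```python
from statistics import median

def scaled_mad(values, scale=1.4826):
    """Scaled median absolute deviation; the constant is an assumed choice."""
    center = median(values)
    return scale * median([abs(v - center) for v in values])

def call_clones(observed, segmented, n_mads=2.0):
    """Call each clone gained, lost or normal from its segment value.

    A segment value at least n_mads * MAD away from the median segmented value is
    called a gain (above) or a loss (below). Assumes the noise estimate is non-zero.
    """
    noise = scaled_mad([o - s for o, s in zip(observed, segmented)])
    baseline = median(segmented)
    calls = []
    for s in segmented:
        if s - baseline >= n_mads * noise:
            calls.append("gain")
        elif baseline - s >= n_mads * noise:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls

# Illustrative data: six clones at baseline, three in a gained and three in a lost segment.
segmented = [0.0] * 6 + [0.5] * 3 + [-0.5] * 3
jitter = [0.05, -0.05, 0.02, -0.02, 0.04, -0.04, 0.03, -0.03, 0.01, -0.01, 0.05, -0.05]
observed = [s + j for s, j in zip(segmented, jitter)]
calls = call_clones(observed, segmented)
```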
To measure the amount of the genome altered, we assigned each clone a genomic distance equal to the sum of one half the distance between its center and that of its neighbouring clones. We summed the genomic distances of clones that are gained or lost and the resulting value represents the fraction of the genome altered (FGA). To calculate only the fraction of the genome gained or lost, we considered only the genomic distances of clones that are gained or lost, respectively.
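The fraction of the genome altered can be computed along the following lines. The sketch handles a single chromosome, gives the end clones only the half-distance to their one neighbour, and divides by the summed distance of all clones to obtain a fraction; these details are assumptions where the text is silent, and the clone positions are invented.

```python
def clone_weights(centers):
    """Genomic distance assigned to each clone: half the distance to each neighbour.

    centers: clone centre positions on one chromosome, in ascending order.
    End clones have a single neighbour (an assumption of this sketch).
    """
    weights = []
    last = len(centers) - 1
    for i, c in enumerate(centers):
        left = 0.5 * (c - centers[i - 1]) if i > 0 else 0.0
        right = 0.5 * (centers[i + 1] - c) if i < last else 0.0
        weights.append(left + right)
    return weights

def fraction_altered(centers, calls, states=("gain", "loss")):
    """Fraction of the covered genome whose clones carry one of the given states."""
    weights = clone_weights(centers)
    altered = sum(w for w, call in zip(weights, calls) if call in states)
    return altered / sum(weights)

# Illustrative chromosome with five evenly spaced clones (positions in Mb).
centers = [0, 10, 20, 30, 40]
calls = ["normal", "gain", "gain", "loss", "normal"]
fga = fraction_altered(centers, calls)
fgg = fraction_altered(centers, calls, states=("gain",))
```

Restricting `states` to gains or to losses gives the fraction of the genome gained or lost, respectively.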
Hierarchical Clustering of Tumor Profiles.
We grouped our samples and generated heatmaps by unsupervised clustering of samples on trichotomous gain/loss/normal data for the autosomes. We used Euclidean distance as the distance metric and Ward's linkage as the agglomeration method.
Determination of Recurrent Regions of Aberration.
We defined recurrent common regions of aberration as runs of contiguous clones for which the frequency of gain (or loss) in a cohort was greater than or equal to a specified threshold. Within each recurrent region, we also defined recurrent focal regions as any local maxima in the frequency. In a new sample, we considered a previously specified region to be “gained” if more clones were gained than lost, “lost” if more clones were lost than gained, and “normal” if there were no gains or losses. Counts of aberrant regions were compared using the Wilcoxon rank sum test.
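The two definitions above, recurrent regions in a cohort and region status in a new sample, can be sketched as follows. The function names are hypothetical, and the label returned when gained and lost clones are equal in number is an assumption, since the text does not address that case.

```python
def recurrent_regions(frequencies, threshold):
    """Return (start, end) index pairs of contiguous clones with frequency >= threshold.

    frequencies: per-clone frequency of gain (or of loss) in genome order.
    """
    regions = []
    start = None
    for i, freq in enumerate(frequencies):
        if freq >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(frequencies) - 1))
    return regions

def region_status(clone_calls):
    """Status of a predefined region in a new sample from its clone-level calls."""
    gained = clone_calls.count("gain")
    lost = clone_calls.count("loss")
    if gained > lost:
        return "gained"
    if lost > gained:
        return "lost"
    if gained == 0:
        return "normal"
    return "undetermined"  # equal, non-zero gains and losses: not specified in the text

regions = recurrent_regions([0.10, 0.25, 0.30, 0.10, 0.20, 0.05], threshold=0.20)
```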
To identify samples as 3q8pq20, we defined recurrent common regions using a frequency of >20% in the dysplasia cohort with no known association with cancer. We declared samples to be 3q8pq20 if one or more of the common recurrent gains on 3q, 8q or 20 (encompassing a focal region on 20p including JAG1) or loss of 8p was present. Proportions of 3q8pq20 subjects were compared between cohorts using Fisher's exact test.
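Given region-level statuses for a sample, the subtype assignment reduces to a simple test, sketched here. The dictionary keys are shorthand labels for the recurrent regions and are hypothetical; the actual region boundaries are those defined from the dysplasia cohort.

```python
# Defining aberrations of the 3q8pq20 subtype: gain on 3q, 8q or 20, or loss of 8p.
DEFINING_ABERRATIONS = {"3q": "gained", "8q": "gained", "20": "gained", "8p": "lost"}

def is_3q8pq20(region_statuses):
    """True if at least one defining aberration is present.

    region_statuses: dict mapping region label -> "gained", "lost" or "normal".
    """
    return any(
        region_statuses.get(region) == status
        for region, status in DEFINING_ABERRATIONS.items()
    )

subtype_a = is_3q8pq20({"3q": "normal", "8p": "lost", "8q": "normal", "20": "normal"})
subtype_b = is_3q8pq20({"3q": "normal", "8p": "gained", "8q": "lost", "20": "normal"})
```

In the second example neither aberration counts, because the defining events are a loss of 8p and a gain of 8q, not the reverse.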
Evaluation of Significant Differences in Recurrent Aberrations in Dysplasia and SCC.
We compared dysplasias and SCC cohort#1 for differences in aberrations of chromosome arms or recurrent regions of aberration. For the region-wise comparison, we used a frequency cutoff of 20% in SCC cohort#2. Differences were evaluated using Fisher's exact test (Mehta 1986) utilizing the dichotomized indicator gained (or lost)/not gained (or not lost), and the p-values were adjusted for multiple testing by controlling the false discovery rate (FDR) (Benjamini 1995).
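For readers unfamiliar with false discovery rate control, the Benjamini-Hochberg adjustment of a set of p-values can be written in a few lines. This is a generic sketch of the standard step-up procedure, not the code used in the study, and the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q < 0.05 for q in qvalues]
```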
Evaluation of Differences Between 3q8pq20 and Non-3q8pq20 Tumors in SCC Cohorts #1 and #2.
Similar to the above analysis for regional differences, we identified differences in aberration frequencies in individual clones between 3q8pq20 and non-3q8pq20 cases in SCC cohort#1 and #2 utilizing Fisher's exact test. Differences in instability characteristics in SCC cohort#2 were evaluated using the Wilcoxon rank sum test.
Copy Number and Methylation Analysis.
Copy number and methylation data for a head and neck cancer data set comprised of 15 oral cavity and 4 oropharyngeal tumors (Poage et al. 2010) were accessioned from NCBI GEO (GSE20939 and GSE20742). Segmentation of the copy number data (Olshen et al. 2004) revealed low amplitude copy number changes, suggestive of normal cell contamination, requiring assignment of 3q8pq20 status to the oral cavity cases by visual inspection of the copy number profiles. We further distinguished whether 3q8pq20 cases had high or lower levels of copy number alterations.
Methylation data consisted of beta values on 1413 probes for 26 samples (15 tumors and 11 controls). The following nonlinear transformation was applied to the beta values,
s=sqrt(beta)−sqrt(1−beta).
This transformation increases the Gaussian character of the data and has the effect of reducing the number of false positives. The transformed data were then quantile normalized across samples. We used the top 10% most variable probes (142 probes, Table 9) for hierarchical clustering, which was performed using Euclidean distance and complete linkage. Probes were tested for differential methylation between tumor types using the limma package, for the following comparisons: highly unstable 3q8pq20 vs. the rest; all tumors vs. the normal cases; and 3q8pq20 tumors vs. non-3q8pq20 tumors plus normal cases. The probes for each comparison were filtered on absolute mean difference in methylation level (>0.05) and adjusted p-value (<0.05, FDR) (Benjamini 1995). This analysis yielded 49, 18 and 15 probes for the above three comparisons, respectively (Table 10). To generate the list of probes differentially methylated only in the highly unstable 3q8pq20 tumors, we removed probes from the highly unstable 3q8pq20 vs. the rest list if they were included in any of the other comparisons leaving 37 probes (Table 10).
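The transformation of the beta values and the selection of the most variable probes can be sketched as follows. Rounding the 10% cut up to a whole number of probes is an assumption that happens to reproduce the 142 probes reported for 1413 probes; quantile normalization and clustering are omitted, and the probe names and values are invented.

```python
import math
from statistics import pvariance

def transform_beta(beta):
    """s = sqrt(beta) - sqrt(1 - beta), mapping beta in [0, 1] to s in [-1, 1]."""
    return math.sqrt(beta) - math.sqrt(1.0 - beta)

def top_variable_probes(values_by_probe, fraction=0.10):
    """Names of the most variable probes, ranked by variance across samples.

    The number kept is ceil(fraction * number of probes), an assumed rounding rule.
    """
    n_keep = math.ceil(fraction * len(values_by_probe))
    ranked = sorted(
        values_by_probe,
        key=lambda name: pvariance(values_by_probe[name]),
        reverse=True,
    )
    return ranked[:n_keep]

# Illustrative transformed data for three hypothetical probes across three samples.
data = {
    "probe_a": [transform_beta(b) for b in (0.10, 0.90, 0.50)],
    "probe_b": [transform_beta(b) for b in (0.50, 0.50, 0.50)],
    "probe_c": [transform_beta(b) for b in (0.40, 0.60, 0.50)],
}
selected = top_variable_probes(data, fraction=0.34)
```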
We used EGAN (Paquette and Tokuyasu 2010) to investigate enrichments in the probes differentially methylated in the highly unstable 3q8pq20 tumors. For the analysis, we generated a background gene list from the GPL9183 annotations file for the Illumina array (NCBI GEO, GSE20939 and GSE20742), and we used the probe with the minimum p-value if a gene was represented by multiple probes (32 genes).
Associations with Clinical Characteristics.
We compared patient and tumor characteristics with 3q8pq20 status, cervical node status and genome instability measures using Fisher's exact test. We estimated survival curves by nodal status using the Kaplan-Meier method, and we tested for differential survival using the log-rank test.
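The Kaplan-Meier estimate mentioned above is the running product of (1 − deaths / number at risk) over the observed event times. A compact sketch, using invented follow-up data and omitting the log-rank test, is:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Illustrative follow-up (years) for five patients, two of them censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```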
We assembled a cohort of 39 oral dysplasia cases comprised of lesional biopsies from 29 cases with no known association with cancer and 10 from patients who subsequently developed cancer at the site of the dysplasia or the dysplasia appeared at the site of a previous cancer (Table 1 and Table 2). We compared these profiles to those of oral SCCs from two independent cohorts, cohort#1 (89 cases), which we had previously profiled (Snijders, et al., Oncogene (2005); 24: 4232-42) and cohort#2 with 63 cases with five-year clinical follow-up (Table 1 and Table 3).
Considering the dysplasia cases with no known association with cancer, we found four regions of low level aberration (e.g. single copy gain and loss) that were each present in >20% of cases (
We noted that gains of 3q, 8q, 20 and loss of 8p were frequent aberrations in both oral SCC cohorts (
To confirm that the frequencies of the two subtypes were not simply a characteristic of oral cancers from Northern California, we accessioned an independent oral SCC array CGH dataset from the Netherlands (Smeets et al. 2009) comprised of 29 cases. We did not find a significant difference in the proportion of 3q8pq20 and non-3q8pq20 subtypes (75% and 25%, respectively; p=0.76) among the 28 cases with copy number data of sufficient quality. Moreover, since these 28 cases had tested negative for human papillomavirus (HPV), these observations allow us to rule out HPV infection, which is a common etiologic agent in oropharyngeal cancers, but not oral cavity cancers (Herrero et al. 2003), as an underlying determinant of subtype. Thus, 3q8pq20 and non-3q8pq20 subtypes and their relative proportions appear to be a universal feature of oral SCC cases from western countries.
Although dysplasia and oral SCC share recurrent aberrations involving 3q, 8p, 8q and chromosome 20, it is clear from
Copy Number Aberrations are More Frequent in the 3q8pq20 Subtype
Hierarchical clustering of the cases in the two oral SCC cohorts revealed that recurrent low level gains and losses were not uniformly distributed (
The 3q8pq20 Tumors with High Levels of Chromosomal Instability are Differentially Methylated
The lack of chromosome level instability in non-3q8pq20 tumors suggests that development of these tumors could be associated with other, copy number neutral, mechanisms, such as microsatellite instability or epigenetic alterations. Microsatellite instability is not common in oral cancer (Shaw et al. 2008), whereas genome-wide alterations in methylation patterns are observed (Poage et al. 2010). Therefore, to investigate whether 3q8pq20 and non-3q8pq20 oral SCC subtypes differed in methylation patterns, we accessioned a published dataset for a head and neck cancer patient cohort comprised of 15 oral cavity and 4 oropharyngeal tumors (Poage et al. 2010) for which both copy number and methylation measurements were available (NCBI GEO accession GSE20939). We assigned 3q8pq20 status to the oral cavity cases (Table 8). Hierarchical clustering using the top 10% most variable methylation probes (142 probes, Table 9) revealed that differential methylation was associated with the cases with the greater number of copy number alterations (high 3q8pq20), as noted previously (Poage et al. 2010). The highly unstable 3q8pq20 cases clustered separately from the low genomic instability 3q8pq20, non-3q8pq20 and normal samples (
In addition to the low level gains and losses discussed above, we observed that dysplasia genomes harbored amplifications, defined as focal regions of higher level increased copy number. Previously, we reported that oral SCC characteristically amplify narrow regions of the genome (<3 Mb) and identified 18 such recurrent amplicons (Snijders et al. 2005). In the 29 dysplasia cases with no known association with cancer, we found two of these amplicons at 11q13 (CCND1, PAK1) and 20p12.2 (JAG1) to be present, as well as amplification at 2q11.2 in two dysplasia cases and two non-recurrent amplicons at 20q13.33 and 21q21.3 (
a Frequency reported in oral SCC cohort#1 (Snijders et al. 2005).
b Although the 2q11.2 amplicon had not been observed previously in SCC cohort#1 as a recurrent amplicon (Snijders et al. 2005), we had reported it in an oral SCC cell line (Hermsen et al. 2005) and it has recently been reported by others in dysplasia (Garnis et al. 2009).
c The region is gained in ≧15% of SCC cases.
Considered together the distribution of copy number aberrations in dysplasia and SCC suggest that there are two distinct routes to oral cancer, one associated with greater genome instability and acquisition of +3q, −8p, +8q and/or +20 in pre-malignant stages and the other lacking chromosomal level instability detectable by CGH. Potential differences in developmental pathways leading to oral cancer are likely to impact clinical behavior. Indeed, we observed a highly significant association of 3q8pq20 status with pathologic cervical (neck) lymph node status (odds ratio 11.5 (CI 1.5, 521.8); Fisher's exact test p=0.006), i.e. neck metastasis (N+) was present in 46% (22/48) of 3q8pq20 tumors and in only 7% (1 of 15) of non-3q8pq20 tumors (Table 13 and Table 14).
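The association reported above can be checked from the stated counts (22 of 48 3q8pq20 tumors and 1 of 15 non-3q8pq20 tumors were N+). The sketch below is a generic two-sided Fisher's exact test written with the standard library; it is not the R code used in the study, and the small tolerance used when comparing table probabilities is an implementation choice.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Rows: 3q8pq20, non-3q8pq20; columns: N+, N0 (counts derived from 22/48 and 1/15).
p_value = fisher_exact_two_sided(22, 26, 1, 14)
```

Rounded to three decimal places, the result agrees with the p=0.006 stated in the text.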
The presence of metastases to the cervical lymph nodes is the major determinant of survival for oral SCC patients (O'Brien et al. 1986; Whitehurst et al. 1977). The differential risk for metastasis in the 3q8pq20 and non-3q8pq20 oral SCC subtypes indicates that chromosomal aberrations +3q, −8p, +8q and +20 provide a potential biomarker to identify patients with no or low risk of metastasis. To confirm this observation, we investigated the association of nodal status and 3q8pq20 status in the independent cohort of oral SCC patients from the Netherlands (Smeets et al. 2009) for which copy number and pathologic node status were available (VUMC, Table 13). In this cohort, we also found the non-3q8pq20 subtype to be at low risk for metastasis (Fisher's exact test p=0.036). We note in particular that the sensitivity and negative predictive value for metastasis (i.e. ability to predict N0 cases at the time of biopsy) were 96% and 93%, respectively, in SCC cohort#2 and both were 100% in the Dutch cohort (Table 13). We also observed a modest association with age in cohort#2: non-3q8pq20 tumors were more frequent in patients older than 65 years (p=0.018, Table 14). This association was not seen in the Dutch cohort.
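The sensitivity and negative predictive value quoted for cohort#2 follow directly from the counts reported above (22 of 48 3q8pq20 tumors and 1 of 15 non-3q8pq20 tumors were N+). The short sketch below shows the arithmetic, treating 3q8pq20 status as the "positive" test result; the variable and function names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of N+ cases that the marker flags (3q8pq20 positive)."""
    return true_pos / (true_pos + false_neg)


def negative_predictive_value(true_neg, false_neg):
    """Fraction of marker-negative (non-3q8pq20) cases that are truly N0."""
    return true_neg / (true_neg + false_neg)


# Cohort#2 counts derived from the text
true_pos = 22    # 3q8pq20 and N+
false_pos = 26   # 3q8pq20 and N0 (48 - 22)
false_neg = 1    # non-3q8pq20 and N+
true_neg = 14    # non-3q8pq20 and N0 (15 - 1)

sens = sensitivity(true_pos, false_neg)
npv = negative_predictive_value(true_neg, false_neg)
print(f"sensitivity = {sens:.1%}, NPV = {npv:.1%}")
```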
Since the 3q8pq20 and non-3q8pq20 subtypes also differ in genomic instability, we considered association of genome instability measures with clinical characteristics in cohort#2. On the one hand, although genome instability is commonly reported to be correlated with measures of poor prognosis, we found no association of any genome instability measures with recurrence free survival, disease free survival or overall survival in cohort#2 (log rank test, data not shown). On the other hand, we observed significant association of nodal status with increased numbers of whole chromosome copy number changes (p=0.046), fraction of the genome gained (FGG, p=0.004) and fraction of the genome altered (FGA, p=0.024), suggesting that these measures may also serve as biomarkers of nodal status (Table 15). We did not, however, find a clear cutpoint for prediction of nodal status by either measure (
In addition, we observed previously described associations with positive nodal status (O'Brien et al. 1986; Whitehurst et al. 1977), including increased tumor size (p=0.018), tumor thickness (p=0.010) and reduced survival (Table 16 and
By comparison of recurrent copy number alterations in oral pre-cancers and cancers, we have obtained evidence that there are at least two pathways of oral cancer development. One subtype acquires one or more of the aberrations +3q, −8p, +8q and/or +20 in dysplastic lesions, whereas recurrent copy number aberrations are absent from the other subtype. The 3q8pq20 subtype further subdivides according to levels of genome instability and alterations in methylation profiles. Notably, the two subtypes differ in clinical behavior, the non-3q8pq20 SCCs being associated with a very low risk for cervical node metastasis. Other lines of evidence supporting diverse routes to oral cancer (Hunter et al. 2005; Hunter et al. 2006; Jin et al. 2006; Noutomi et al. 2006) have highlighted differences in genome instability, gene expression profiles and possibly cell of origin as distinguishing features.
Our observations raise questions as to mechanism, namely the identity of the genes in these regions (3q, 8p, 8q and 20) and the functional consequences of their gain or loss that provide a growth advantage when at altered copy number early on in the pre-cancers (dysplasia). Identifying the genes from the copy number data alone is challenging, as the involved regions are large. Losses involving 8p and gains involving 3q, 8q and 20q occur frequently in cancers. Some insight into the genes that may be playing a role in de-regulating growth in pre-cancerous lesions may be obtained by considering candidate oncogenes and tumor suppressors that have been suggested for these regions based on finding that they are amplified or deleted in tumors. It is important to bear in mind, however, that candidate oncogenes mapping to regions of low level gains in pre-cancers may function differently than they do when at highly elevated copy number in tumors. Moreover, the ensemble of genes within these large regions (i.e. the balance of oncogenic and tumor suppressor functions) may together promote the pre-neoplastic changes. Nevertheless, taking this approach, JAG1 appears to be a likely candidate on chromosome 20p, as we found it to be amplified in dysplasia (Table 12) as well as cancer (Snijders et al. 2005). We also observed amplification at 20q11 in SCC cohort#1, suggesting BCL2L1, DNMT3B, E2F1, NCOA6, TGIF2 and ITCH as candidate oncogenes that could be contributing to the early de-regulation of growth. Similarly, candidate oncogenes on 8q identified in oral SCC include YWHAZ (Lin et al. 2009), MYC, PVT1 and associated miRNAs. Analysis of recurrent regions of amplification on 3q in our oral SCC cohorts found four regions, suggesting TM4SF1, WWTR1, RNF13, GPR87 (region 1), EVI1, TERC, PRKCI, SKIL, EIF5A2, PLD1, GHSR, ECT2 (region 2), PIK3CA, SOX2, DCUN1D1 (region 3), TP63 and CLDN1 (region 4) as candidate oncogenes (
Treatment for oral cancer is almost always surgical. Identification of patients with node-positive (N+) necks is the most important question to be accurately answered prior to surgical resection of the tumor, as well as for post-surgical treatment and follow-up (Cheng and Schmidt 2008). Typically, patients are assessed prior to surgery for lymph node metastases by palpation of the lymph nodes in the neck and by imaging (CT, MRI, PET scan). For patients with clinically node negative necks, treatment options include a “wait and see” approach or elective neck dissection (i.e. performing a neck dissection when there is no clinical or radiographic evidence of neck metastasis) if the chance of metastasis is >20% based on current risk assessment capability (Cheng et al. 2008). The 20% cutoff was established by mathematical modeling of the decisions and outcomes of management of the N0 neck to determine the threshold at which the benefits outweigh the costs of prophylactically treating the neck (Weiss et al. 1994). Currently, tumor thickness is considered the best predictor of metastasis. Since it is difficult to assess this parameter from the incisional biopsy prior to surgery (Cheng et al. 2008), the American Joint Committee on Cancer (AJCC) TNM staging protocol, which is based on surface diameter of the tumor (Byers et al. 1998), is often used to assess likelihood of metastasis. It is common in clinical practice not to recommend neck dissection if tumors are <2 cm in size (stage T1) and <3 mm in thickness. Occult metastatic rates for oral SCC, however, are high and range from 20-45% for T1 tongue SCCs (Cheng et al. 2008). Thus, the failure to find evidence of metastasis on clinical exam provides little confidence that the patient does not require removal of the cervical lymph nodes. For this reason, in many medical centers, patients are routinely offered elective neck dissection.
All patients in cohort#2 received neck dissections, as this treatment was a criterion for inclusion in the study. With the exception of three tumors, all were ≥3 mm in thickness. Tumor size of the 14 node negative non-3q8pq20 cases in this cohort ranged from 1.0-6.4 cm and thickness (recorded for seven cases) ranged from 0.2-1.3 cm (Table 3). None of the node negative non-3q8pq20 tumors would have met the criteria of stage T1 and thickness <3 mm for not recommending a neck dissection. In addition, two of the 14 node negative non-3q8pq20 cases were diagnosed as clinically node positive, but subsequently found to be node negative by pathology. Assessment of 3q8pq20 status prior to surgery would have added prognostic value and could have spared these 14 patients from unnecessary surgery. Moreover, our initial finding that non-3q8pq20 tumors have less than a 7% chance of metastasis places their risk well below the current 20% threshold, further supporting the potential utility of assessing 3q8pq20 status at the time of diagnostic biopsy to substantially improve clinical decisions regarding elective neck dissection.
We also found that FGG and FGA are correlated with risk for metastasis, although we did not find a clear cutpoint for either measure. Using cutpoints of 0.065 and 0.095 for FGG and FGA, respectively, we correctly identified more of the N0 cases than we did based on 3q8pq20 status; however, more N+ cases were mistakenly called N0, a cost that in the clinic may outweigh the benefit of detecting more N0 patients, given the extremely poor survival of patients who undergo surgical salvage for neck metastasis. Larger studies will be required to determine the utility of FGG, FGA and non-3q8pq20 subtype as biomarkers for cervical node status. For application in the clinic, however, it is likely that evaluation of 3q8pq20 (four loci) will have an advantage, since it would be more amenable to measurement using less complex biomarker assays (e.g. PCR) than would be assessment of genome-wide copy number alterations to determine FGG or FGA. Eliminating unnecessary neck dissections would reduce surgical risks, patient morbidity, lengthy surgeries (typically 10 hours) and hospitalization time.
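To make the FGG and FGA measures concrete, the following is a minimal sketch of how they might be computed from segmented copy number data and compared with the cutpoints given above. It is not the procedure used in the study. The segment representation, the log2 ratio thresholds for calling gain and loss (±0.2), the length weighting, and the rule that a value below the cutpoint is called N0 are all assumptions made for illustration; only the cutpoints 0.065 and 0.095 come from the text.

```python
from dataclasses import dataclass

GAIN_THRESHOLD = 0.2    # assumed log2 ratio threshold for calling a gain
LOSS_THRESHOLD = -0.2   # assumed log2 ratio threshold for calling a loss
FGG_CUTPOINT = 0.065    # cutpoint from the text
FGA_CUTPOINT = 0.095    # cutpoint from the text


@dataclass
class Segment:
    length_mb: float    # segment length in megabases
    log2_ratio: float   # mean tumor/reference log2 ratio for the segment


def fraction_gained_and_altered(segments):
    """Return (FGG, FGA) as length-weighted fractions of the profiled genome."""
    total = sum(s.length_mb for s in segments)
    gained = sum(s.length_mb for s in segments if s.log2_ratio >= GAIN_THRESHOLD)
    lost = sum(s.length_mb for s in segments if s.log2_ratio <= LOSS_THRESHOLD)
    return gained / total, (gained + lost) / total


def predicted_n0(value, cutpoint):
    """Assumed rule: a value below the cutpoint is called node negative (N0)."""
    return value < cutpoint


# Hypothetical profile: mostly unaltered, one gained and one lost segment
example = [Segment(900.0, 0.0), Segment(60.0, 0.4), Segment(40.0, -0.5)]
fgg, fga = fraction_gained_and_altered(example)
print(fgg, predicted_n0(fgg, FGG_CUTPOINT))
print(fga, predicted_n0(fga, FGA_CUTPOINT))
```

In this made-up profile the two measures disagree about the N0 call, which illustrates why the choice of measure and cutpoint matters.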
There are a growing number of tumor types for which subtypes have been identified that lack copy number instability (Barretina et al. 2010; Fridlyand et al. 2006a; Smeets et al. 2009; Taylor et al. 2010). Better prognosis is often associated with these subtypes. In oral cancer, the non-3q8pq20 subtype is clearly a member of this group as there is low genomic instability and a low risk of metastasis. The driving force for these tumors remains obscure. The non-3q8pq20 oral tumors do not appear to have distinguishing methylation profiles or microsatellite instability, leaving open the possibility that there are underlying copy neutral chromosomal rearrangements or extensive mutations in oncogenes and tumor suppressors in this subtype. On the other hand, these tumors may be promoted by extrinsic factors that modify growth of epithelial cells, including inflammation and aberrant behavior of neighboring cells (Arwert et al. 2010). Infection with microorganisms is another candidate; bacteria have been reported in association with certain cancers (Fassi Fehri et al. 2011; Hooper et al. 2006), and also to modify growth signaling pathways in epithelial cells (Fassi Fehri et al. 2011; Hooper et al. 2009).
In summary, copy number analysis of oral cancers and pre-cancers has revealed two subtypes, 3q8pq20 and non-3q8pq20, distinguished by acquisition of specific copy number alterations in the early pre-cancerous lesions. The two subtypes are likely to develop by different pathways that result in tumors differing in their clinical behavior, namely risk for metastasis. In addition, we note that although much attention has focused on regions of genomic imbalance as biomarkers of progression because they are present at greater frequency in oral SCCs compared to pre-cancers (Bremmer et al. 2008), such markers, at best, can only report on the likelihood of progression of the 3q8pq20 subtype. They cannot provide information on progression of chromosomally stable non-3q8pq20 lesions.
Brush biopsy sample analyses have employed DNA isolated from buccal swabs for PCR based assays (Garcia-Closas, et al., Cancer Epidemiol Biomarkers Prev, (2001) 10(6):687-96; and Mao, et al., Proc Natl Acad Sci USA, (1994) 91(21):9871-5) or cytological analyses using FISH on nuclei from cells smeared directly on glass slides and from fixed cell suspensions (
We have established that array CGH can be carried out with DNA isolated from oral brush biopsy samples. Our array CGH hybridizations typically use 0.5 μg of genomic DNA, although we have carried out this analysis with as little as 0.003 μg of DNA, and whole genome amplification methods now make it possible to analyze as few as a handful of cells. Data in the literature indicate that 6 to 416 μg of DNA can be obtained by brush biopsy (London, et al., Cancer Epidemiol Biomarkers Prev (2001) 10:1227-30). Our experience using any of the brushes/swabs is consistent with this report. For example, two oral surgeons independently brush biopsied a 1×1 cm area of buccal mucosa with the foam brushes, yielding 1-1.3 μg of DNA following standard nucleic acid isolation procedures. Cytology of the brushing indicated that 100% of the harvested cells were epithelial.
Most recently, the lesions of two oral cancer patients who were undergoing curative surgery for their cancers were swabbed using the Isohelix swab. The Isohelix DSK DNA isolation/stabilization buffer and proteinase K were added to the tube with the swab according to the manufacturer's instructions, and the tubes were shipped to UCSF. Using our standard laboratory protocol, we recovered 7.3 μg and 4.5 μg of DNA, respectively, from the two samples, and the DNA was suitable for array CGH.
Positions of STS markers are determined using both full sequences and primer information. Full sequences are aligned using blat, while isPCR (Jim Kent) and ePCR are used to find locations using primer information. Both sets of placements are combined to give final positions. In nearly all cases, full sequence and primer-based locations are in agreement, but in cases of disagreement, full sequence positions are used. Sequence and primer information for the markers was obtained from the primary sites for each of the maps and from UniSTS.
AGGTCCTCATAGTGGAGACG
ATGCCTAGGAGGTAAACTCC
GGTGAAAAAGATAGGCTCAA
CCACTGTTAAATGCTATTAGCC
CCCAAAGTCATGAAATGAGA
ACAACATACCTGTTAGGAGGTG
ATTTATTGTTTGCTTTGTGCCA
CAACAATATCCTTATTTTAGGTGCC
Positions of Flanking BACs that have been Sequenced
This is the only location found for RP11-72E23 (sts AFM210VE7)
Chromosome: chr3
BAC end sequences are placed on the assembled sequence using Jim Kent's blat program
Chromosome: chr8
Chromosome: chr8
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All sequence references, publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
This application claims the benefit of U.S. Provisional Application No. 61/400,813, filed on Aug. 2, 2010, the entire disclosure of which is hereby incorporated herein by reference for all purposes.
This invention was made with government support under grant nos. R01CA90421, R01CA113833, R01CA118323, R01CA131286, and R33CA94407 awarded by the National Institutes of Health. The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US11/01377 | 8/2/2011 | WO | 00 | 5/13/2013 |
Number | Date | Country | |
---|---|---|---|
61400813 | Aug 2010 | US |