The present invention relates to isolated fused genes implicated in tumours, in particular breast tumours. The invention also provides a kit for the detection of the fused genes for the diagnosis and/or prognosis of a tumour in a subject.
Chromosomal aberrations, including deletions, duplications, inversions, insertions and translocations, are a characteristic feature of many cancer types. A primary focus of cancer genome analysis is to identify genes that are perturbed and play a role in cancer development. Many deregulated and fusion genes have been identified by cloning breakpoint junctions of chromosome translocations in hematological malignancies and soft tissue sarcomas. Chromosome translocations can cause deregulation of genes at the breakpoints, which results in neoplastic transformation. There are two major molecular consequences associated with chromosome translocations. First, the promoter and/or enhancer element of a gene is placed near an oncogene, resulting in overexpression of the oncogene. Second, a fusion gene is formed by breakage and joining within introns of two genes, resulting in expression of a fusion protein.
Among the different types of chromosome aberrations, recurrent translocations are prevalent and well characterized in hematological malignancies. In many solid tumor cancers, despite the presence of many structural aberrations, mostly unbalanced translocations, tumor-specific recurrent translocations are difficult to characterize because of several technical limitations of the available technologies. A recurrent fusion gene in prostate cancer, recently cloned using bioinformatics analysis of gene expression microarray data (Tomlins et al., 2005), set a new paradigm for understanding the molecular complexity of solid tumors.
The most common problem in solid tumor cancer genome analysis is the failure to characterize unbalanced copy number changes and complex rearrangements. Gene expression microarray and low-resolution copy number analysis methods do not provide information on genomic rearrangements. Conventional cytogenetic karyotyping analysis of hematological malignancies and solid tumors had identified 52,172 abnormal karyotypes (http://cgap.nci.nih.gov/Chromosomes/Mitelman) as of May 16, 2007. Complete molecular characterization of various chromosome rearrangements has resulted in the identification of more than 358 fusion genes (Mitelman et al., 2007). The specificity of chromosome translocations has led to sub-classification of tumors solely on the basis of chromosome aberrations. To date, about 500 such tumor-specific translocations have been identified. Although solid tumor cancers account for a higher proportion of cancer deaths (80%) than hematological malignancies (10%), more cytogenetic information is available for hematological malignancies. The cytogenetic changes in hematological malignancies are very few, even in advanced-stage cancers, and the types of chromosome change are specific to particular histological types. Chromosome aberrations in solid tumors are highly complex even at an early stage or at diagnosis, making correct identification of all abnormal chromosomes impossible. Among the various changes, it is not possible to distinguish tumor-associated primary abnormalities from progression-associated changes. Additional complexity arises from clonal heterogeneity, which is present in less than 5% of hematological cancers but is very common in solid tumors.
Among the many types of solid tumors, breast cancer is one for which chromosome abnormalities are not well studied. According to estimates from the American Cancer Society, about 212,920 women were expected to be diagnosed with breast cancer and 40,970 were predicted to die of it in the year 2006 (ACS, 2006). Current understanding of the genetic basis of breast cancer is limited to mutated and amplified genes in a proportion of breast cancer patients. The breast cancer genome is characterized by a highly unbalanced aneuploid karyotype with complex structural rearrangements and numerical aberrations. It is evident from the literature that identification of recurrent aberrations is nearly impossible with currently available cytogenetic and molecular methods.
Although cloning of fusion genes by molecular characterization of chromosome translocations identified by G-band karyotyping has been a successful approach in hematological malignancies and soft tissue sarcomas, identification of recurrent chromosome translocations by G-band karyotyping is often difficult in solid tumors because of highly complex genomic rearrangements, poor chromosome morphology and clonal heterogeneity. As is evident from the MCF7 data, more than 60% of copy number boundaries are located within known genes, which can be directly selected for further validation.
To date, no recurrent translocations producing fusion genes have been identified in breast cancer. The current invention provides a new approach to identify fusion genes based on the analysis of unbalanced copy number changes.
The present invention addresses the problems above, and in particular provides a new and/or improved use of the CGH method for the identification of copy number transition (CNT) regions comprising fused genes. The invention also provides the use of the novel fused genes identified in the invention as biomarkers in the diagnosis of solid tumours.
According to one aspect of the current invention, there is provided an isolated fused gene comprising at least one first gene and/or fragment thereof fused to at least one second gene and/or fragment thereof. The at least one first and/or the second gene may, independently, be selected from the group of genes consisting of: RCC2, CENPF, ARFGEF2, SULF2, MTAP, ATXN7, BCAS3, RPS6 KB1, TMEM49, EAP30, a gene having the nucleotide sequence SEQ ID NO:1, and a gene having the nucleotide sequence SEQ ID NO:2, or a fragment thereof. The fusion of the genes may be by genomic translocation, insertion, inversion, amplification and/or deletion. The fused gene may be selected from the group of fused genes: RCC2/CENPF, ARFGEF2/SULF2, MTAP/a gene comprising the nucleotide sequence SEQ ID NO:2, ATXN7/a gene comprising the nucleotide sequence SEQ ID NO:1, BCAS3/ATXN7, RPS6 KB1/TMEM49, and RPS6 KB1/EAP30, or fragment(s) thereof. A non-exclusive list of fused genes according to the invention is summarised in Table 1. In particular, one fused gene according to the invention is the ARFGEF2/SULF2 fusion gene comprising the nucleic acid sequence of SEQ ID NO: 16 and/or a fragment thereof. Another fused gene according to the invention is the RPS6 KB1/TMEM49 fusion gene comprising the nucleic acid sequence of SEQ ID NO: 17 and/or a fragment thereof. Another fused gene according to the invention is ATXN7/a gene having the nucleotide sequence SEQ ID NO:1. This fused gene comprises the nucleic acid sequence of SEQ ID NO: 18 and/or a fragment thereof. Another fused gene according to the invention is the ATXN7/BCAS3 fusion gene comprising the nucleic acid sequence of SEQ ID NO: 19 and/or a fragment thereof. The fused gene may also be MTAP/a gene having the nucleotide sequence SEQ ID NO:2, the gene fusion comprising the nucleic acid sequence of SEQ ID NO: 20 and/or a fragment thereof. Any of the fused gene(s) may be comprised in a vector.
According to another aspect of the current invention, there is provided an isolated nucleic acid comprising the nucleotide sequence SEQ ID NO:1 and/or SEQ ID NO:2, or a fragment thereof. The isolated nucleic acid may be comprised in a vector.
According to yet another aspect of the invention there is also provided a diagnostic and/or prognostic kit for the diagnosis and/or prognosis of tumour in a subject comprising detecting at least one fused gene, wherein the presence of the fused gene is indicative of presence and/or the stage of tumour.
The diagnostic and/or prognostic kit may comprise at least one nucleic acid molecule capable of hybridizing to and/or complementary to the fused gene and/or a fragment thereof, wherein hybridization is indicative of presence and/or the stage of tumour.
The invention further provides a diagnostic and/or prognostic kit for the diagnosis and/or prognosis of tumour in a subject, wherein the kit comprises one or more fragment representative of a genome capable of hybridizing to differentially labelled genomic DNA isolated from tumour tissue from at least one subject and from a control tissue, wherein an increase or decrease of the hybridization intensity and/or signal(s) of the label in the tumour tissue, compared to that in control tissue detects copy number transition (CNT) regions in the tumour tissue, indicative of presence and/or stage of tumour.
The CNT regions detected by the diagnostic and/or prognostic kit may comprise fused gene(s).
The fused gene detected in the diagnosis and/or prognosis of tumour in a subject may be selected from the group of fused genes: RCC2/CENPF, ARFGEF2/SULF2, MTAP/a gene comprising the nucleotide sequence SEQ ID NO:2, ATXN7/a gene comprising the nucleotide sequence SEQ ID NO:1, BCAS3/ATXN7, RPS6 KB1/TMEM49, and RPS6 KB1/EAP30, or fragment(s) thereof.
The fused genes may further be detected by fluorescence in situ hybridization (FISH) and/or the rapid amplification of cDNA ends polymerase chain reaction (RACE-PCR) technique. The tumour may be a stage III tumour. In particular, the tumour may be a solid tumour. More particularly, the tumour may be a breast tumour.
According to a further aspect of the invention, there is provided a method of diagnosis and/or prognosis of presence and/or stage of tumour in a subject comprising detecting at least one fused gene, wherein the presence of the fused gene is indicative of presence and/or the stage of tumour.
The method may comprise providing at least one nucleic acid molecule capable of hybridizing to and/or complementary to the fused gene and/or a fragment thereof, wherein hybridization is indicative of presence and/or the stage of tumour.
According to yet another aspect there is also provided a method of diagnosis and/or prognosis of presence and/or stage of tumour in a subject, wherein the method comprises providing one or more fragments representative of a genome capable of hybridizing to differentially labelled genomic DNA isolated from tumour tissue from at least one subject and from a control tissue, wherein an increase or decrease of the hybridization intensity and/or signal(s) of the label in the tumour tissue, compared to that in control tissue detects copy number transition (CNT) regions in the tumour tissue, indicative of presence and/or stage of tumour.
The CNT regions may comprise any fused gene(s) according to the invention. The fused gene detected in the diagnosis and/or prognosis of tumour in a subject may be selected from the group of fused genes: RCC2/CENPF, ARFGEF2/SULF2, MTAP/a gene comprising the nucleotide sequence SEQ ID NO:2, ATXN7/a gene comprising the nucleotide sequence SEQ ID NO:1, BCAS3/ATXN7, RPS6 KB1/TMEM49, and RPS6 KB1/EAP30, or fragment(s) thereof.
The fused genes may further be detected by the FISH and/or RACE technique. The tumour may be a stage III tumour. In particular, the tumour may be a solid tumour. More particularly, the tumour may be a breast tumour.
There is further provided a kit for detecting the presence of fused genes, wherein the kit comprises one or more fragments representative of a genome capable of hybridizing to differentially labelled control and test genomic DNA, wherein an increase or decrease of the hybridization intensity and/or signal(s) of the test genome, compared to that in the control genome, detects copy number transition (CNT) regions in the test genome, wherein the CNT regions comprise fused genes.
According to yet another aspect the invention provides a method of detecting the presence of fused genes, wherein the method comprises providing one or more fragments representative of a genome capable of hybridizing to differentially labelled control and test genomic DNA wherein an increase or decrease of the hybridization intensity and/or signal(s) of test genome, compared to that in control genome detects copy number transition (CNT) regions in the test genome, wherein the CNT regions comprise fused genes.
The fused gene detected in the diagnosis and/or prognosis of tumour in a subject may be selected from the group of fused genes: RCC2/CENPF, ARFGEF2/SULF2, MTAP/a gene comprising the nucleotide sequence SEQ ID NO:2, ATXN7/a gene comprising the nucleotide sequence SEQ ID NO:1, BCAS3/ATXN7, RPS6 KB1/TMEM49, and RPS6 KB1/EAP30, or fragment(s) thereof.
GGCTGAATCCTCAGGGCCGTGGGGGGCTGCATGGCTGATGACCATGAGGACTGGCC
TGTGCGGGTACATCTTCTTGGACGTGCGGAAGAAGCTCACGCTGTCATTGGTGATGA
GGTCTGTGAGGTAATCCTTGGAGTAGTCGGAGCCGTGCTTCTCTTTCACCCCGTTCCG
ACACAGCGTGTAGTTATAAAAGCGGGAGTTTTTAAGGAGTCCGACCCACTCCTTCCAG
CCGGGTGGCACGTAGGAGCCGTTGTATTCATTAAGATACTTCCCGAAGAAAGCTGTCC
GGTAGCCAGTGCTATTGAGGTACACGGCAAAGGTGCGGCTCTCGTGCTGTGCCTGCC
GGGAGGGCGAGGAGCAGTTCTCATTGTTGGTGTAGGTGTTGTGGTTGTGGACGTACT
TGCCGGTGAGGATGGAGGAGCGTGAGGGGCAGCACATGGGTGTGGTCACGAAGGCG
TTGATGAAGTGCGTCCCGCCCTGCTCCATGATGCGCCGGGTCTTGTTCATCACCTGCA
TGGAACCGAGCGCCACCTGGCAGGCCCTGCGCAGCTGGGAGTGCTGGGGCCGCTTC
GGACCAGCCAGAGGACGCGGGCTCTGAGGATGAGCTGGAGGAGGGGGGTCAGTTAA
ATGAAAGCATGGACCATGGGGGAGTTGGACCATATGAACTTGGCATGGAACATTGTGA
GAAATTTGAAATCTCAGAAACTAGTGTGAACAGAGGGCCAGAAAAAATCAGACCAGAA
TGTTTTGAGCTACTTCGGGCTGGGAAAATATTTGCCATGAAGGTGCTTAAAAAGGGAG
GGCTCTGTTCGGGAAGGTTAAGGTACCAAAAATGCAACATCCTGAAATAAGGAGGTGT
GAGGAAAGGCCAGAGGGCGTGGAAGGGGATGAGGATGAAGAGGACACTTGTCTGGA
TTGCATACTGCACACAGGATCCATCGCCCCTGAAGCAGCAGGCTGTGCATTTAGTGTG
TTTCCATGAGCTGGTACCGATTTGCTATTTGGGGAGATGCAGGTAGATGAGAGCAGGA
CTGGGGATGTAGAGACGGTGGCTGCTGCCAGATAGCTGACTCCACATTGTGATGTCG
GCACAGAGTTTGTCCGGTGAGGAATACGTGTGGAGATGGGTGAGGTGGTACTGGGCA
CTGGTGGGATTTTCCAAACTGTGGAGCAGGCAAGATTTTAGCCGCTCGAATTGGGCCA
GGGCTGTACACATGTCATAGTGACCACAGCTTGTGGCTCCTTGAGGGAGGAGATTCA
GCCCGGCGATATTGTCATTATTGATCAGTTCATTGACAGCTATGTCTCACAGTCCAGAC
Bibliographic references mentioned in the present specification are for convenience listed in the form of a list of references and added at the end of the examples. The whole content of such bibliographic references is herein incorporated by reference.
In the present invention, the inventors have identified molecular biomarkers for cancer, in particular breast cancer, using an entirely new approach based on high-resolution oligonucleotide-based array comparative genomic hybridization (a-CGH) (Agilent Technologies). CGH is a technique in which differentially labeled tumor (or test) and reference DNA are hybridized to normal human metaphase chromosomes, followed by analysis of the differences in fluorescence intensities of test and reference DNA along the entire length of the chromosomes to identify regions of gains, deletions and amplifications. High-density oligonucleotide-based a-CGH does not require direct chromosome analysis or the construction of a genomic or cDNA library. Based on this approach, the inventors have isolated and characterized seven novel fusion genes involving 11 genes (Table 1).
The a-CGH technique identified many copy number transition (CNT) regions within known genes and in intergenic regions, at genomic intervals of 2.7 kb to 23 kb and 4 kb to 75 kb respectively. Integrated molecular analysis by cytogenetic and molecular biology methods, including spectral karyotyping (SKY), FISH, RACE-PCR and a cloning approach, was used to validate 48 of 83 CNT loci affecting known genes in MCF7. This study is the first of its kind to isolate fusion genes based only on the analysis of unbalanced copy number changes resolved at an unprecedented resolution.
Among the different commercially available oligonucleotide-based CGH arrays, the 244K array (Agilent Technologies) was selected for this study because of its unique array design, which provides an average resolution of about 6.4 kb in gene regions and 16.5 kb in intergenic regions. Given the gene-centric nature of the 244K array, all the CNT regions within 2.7 kb to 23 kb in known genes and 4 kb to 75 kb in intergenic regions could be identified (Table 2).
The present invention therefore provides the use of the CGH technique for the identification of CNT regions comprising fused genes. All the fusion genes identified in this study were the product of genomic perturbations in genes at copy number transition (CNT) regions, that is, at the boundaries of amplifications and deletions, detected in the size range from 30 kb to 1 Mb, a resolution not achievable by chromosome-based and other CGH methods. Detailed analysis of CNT regions using the 244K array allowed precise identification of rearrangements within known genes. Further characterization of CNT regions by the FISH and RACE-PCR approach identified the novel fusion transcripts listed in Table 1.
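The idea of locating CNT regions at the boundaries of amplifications and deletions can be illustrated with a minimal sketch. It assumes that the array log2(test/reference) ratios have already been segmented into intervals of roughly constant copy number; the `Segment` type, the `find_cnt_regions` function and the 0.3 threshold are hypothetical choices made for illustration and are not taken from the specification or from any Agilent software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A run of adjacent probes with a similar log2(test/reference) ratio."""
    chrom: str
    start: int        # position of the first probe in the segment (bp)
    end: int          # position of the last probe in the segment (bp)
    mean_log2: float  # mean log2 ratio across the segment


def find_cnt_regions(segments: List[Segment],
                     min_shift: float = 0.3) -> List[Tuple[str, int, int, float]]:
    """Return (chrom, left_probe, right_probe, shift) for each copy number transition.

    A transition is reported between two neighbouring segments on the same
    chromosome when their mean log2 ratios differ by at least `min_shift`
    (an illustrative threshold, not a validated one).
    """
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    transitions = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue  # never call a transition across chromosomes
        shift = right.mean_log2 - left.mean_log2
        if abs(shift) >= min_shift:
            # The breakpoint lies between the last probe of the left
            # segment and the first probe of the right segment.
            transitions.append((left.chrom, left.end, right.start, shift))
    return transitions


# Made-up example data, not measurements from the study.
example = [
    Segment("chr17", 0, 1000, 0.0),
    Segment("chr17", 1010, 2000, 1.2),
    Segment("chr17", 2010, 3000, 1.25),
    Segment("chr20", 0, 500, -0.8),
]
cnt_regions = find_cnt_regions(example)
```

In this sketch the width of each reported interval is set by the probe spacing, which is why a denser array narrows the region that must be examined by FISH or RACE-PCR.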
Accordingly, the present invention provides an isolated fused gene comprising at least one first gene and/or fragment thereof fused to at least one second gene and/or fragment thereof, wherein at least the first and/or the second gene, independently, is selected from the group of genes consisting of: RCC2, CENPF, ARFGEF2, SULF2, MTAP, ATXN7, BCAS3, RPS6 KB1, TMEM49, EAP30, a gene having the nucleotide sequence SEQ ID NO:1, and a gene having the nucleotide sequence SEQ ID NO:2, or a fragment thereof. The first and the second gene, independently, may be selected from the group consisting of: RCC2, CENPF, ARFGEF2, SULF2, MTAP, ATXN7, BCAS3, RPS6 KB1, TMEM49, EAP30, a gene having the nucleotide sequence SEQ ID NO:1, and a gene having the nucleotide sequence SEQ ID NO:2, or a fragment thereof. According to a particular aspect of the invention, the first gene and the second gene may have inverted positions within the fused gene. According to a particular aspect, the first gene may be selected from the group consisting of: RCC2, ARFGEF2, MTAP, ATXN7, BCAS3, and RPS6 KB1, or a fragment thereof. According to a particular aspect, the second gene may be selected from the group consisting of: CENPF, SULF2, a gene having the nucleotide sequence SEQ ID NO:1, a gene having the nucleotide sequence of SEQ ID NO:2, ATXN7, TMEM49, and EAP30, or a fragment thereof. According to one embodiment, the first and/or the second gene is ATXN7. According to another embodiment, the first and/or the second gene is ARFGEF2. According to another embodiment, the first and/or the second gene is SULF2. According to another embodiment, the first and/or second gene is RPS6 KB1. According to another embodiment, the first and/or second gene is a gene comprising the nucleotide sequence SEQ ID NO:1 or SEQ ID NO:2 or a fragment thereof. The fusion of the genes may be by genomic translocation, insertion, inversion, amplification and/or deletion.
A “fusion gene” as used herein refers to a hybrid gene formed from two previously separate genes, thus resulting in gene rearrangement. Alternatively, the separate genes may undergo rearrangement independently before they fuse to each other. “Fused gene” may be construed accordingly to refer to any such rearrangement event. Fused genes can occur as the result of mutations such as translocation, deletion, inversion, amplification and/or insertion.
“Translocation” of genes results in a chromosome abnormality caused by rearrangement of parts between nonhomologous chromosomes. It can be detected by cytogenetic analysis or karyotyping of affected cells. “Deletions” in chromosomes may be of the entire gene or only a portion of the gene. Genetic “insertion” is the addition of one or more nucleotide base pairs into a genetic sequence. This can often happen in microsatellite regions because of DNA polymerase slippage. An “inversion” is a rearrangement of genes in a chromosome in which a segment of a gene is reversed end to end. An “amplification” results when a DNA segment is amplified, resulting in a gain in copy number.
The fused gene may be selected from the group of fused genes RCC2/CENPF, ARFGEF2/SULF2, MTAP/a gene comprising the nucleotide sequence SEQ ID NO:2, ATXN7/a gene comprising the nucleotide sequence SEQ ID NO:1, BCAS3/ATXN7, RPS6 KB1/TMEM49, and RPS6 KB1/EAP30, or fragment(s) thereof. In particular, the fused gene may be the ARFGEF2/SULF2 fusion gene comprising the nucleic acid sequence of SEQ ID NO: 16 and/or a fragment thereof. More particularly, the fused gene may be the RPS6 KB1/TMEM49 fusion gene comprising the nucleic acid sequence of SEQ ID NO: 17 and/or a fragment thereof. The fused gene may further be ATXN7/a gene having the nucleotide sequence SEQ ID NO:1, the gene fusion comprising the nucleic acid sequence of SEQ ID NO: 18 and/or a fragment thereof. The fused gene may be the ATXN7/BCAS3 fusion gene comprising the nucleic acid sequence of SEQ ID NO: 19 and/or a fragment thereof. The fused gene may also be MTAP/a gene having the nucleotide sequence SEQ ID NO:2, the gene fusion comprising the nucleic acid sequence of SEQ ID NO: 20 and/or a fragment thereof.
The fused genes are written together in the form gene“x”/gene“y”, and are referred to in this form throughout this application.
The fused genes may be in any suitable vector, phage, plasmid, or a fragment comprising the fused gene. There is no limit in the size of the nucleic acid construct and the fused gene.
There is also provided an isolated nucleic acid molecule comprising the nucleotide sequence SEQ ID NO:1 and/or SEQ ID NO:2, or a fragment thereof. The isolated nucleic acid may be comprised in a vector. The vector may be any suitable vector, phage, plasmid, or nucleic acid fragment comprising the nucleic acid molecule of SEQ ID NO: 1 and/or SEQ ID NO: 2. There is no limit in the size of the nucleic acid construct and the nucleic acid molecule.
According to another aspect the invention provides a diagnostic and/or prognostic kit for the diagnosis and/or prognosis of tumour in a subject comprising detecting at least one fused gene, wherein the presence of the fused gene is indicative of presence and/or the stage of tumour.
There is also provided a diagnostic and/or prognostic kit, wherein the kit comprises at least one nucleic acid molecule capable of hybridizing to and/or complementary to the fused gene and/or a fragment thereof, wherein hybridization is indicative of presence and/or the stage of tumour.
The present invention further provides a diagnostic and/or prognostic kit for the diagnosis and/or prognosis of tumour in a subject, wherein the kit comprises one or more fragment representative of a genome capable of hybridizing to differentially labelled genomic DNA isolated from tumour tissue from at least one subject and from a control tissue, wherein an increase or decrease of the hybridization intensity and/or signal(s) of the label in the tumour tissue, compared to that in control tissue detects copy number transition (CNT) regions in the tumour tissue, indicative of presence and/or stage of tumour.
The CNT regions may comprise fused gene(s).
“Diagnose” or “diagnosis” as used herein refers to determining the nature or the identity of a condition (disease). A diagnosis may be accompanied by a determination as to the severity of the disease. “Prognostic” or “prognosis” as used herein refers to predicting the outcome of a disease, such as giving a chance of survival based on observations and results of clinical tests. “Predisposition” as used herein refers to the likelihood of being diagnosed with, or the susceptibility to, a particular disease.
“Copy number transition (CNT) regions” refer to the boundaries of genomic perturbations due to deletions, insertions, inversions and amplifications, as described in the earlier section, that result in variation in the copy number of the genes present therein. The current invention is the first study in which fusion genes were isolated based on the analysis of these copy number changes. The invention used the CGH technique to identify CNT regions within known genes. “CGH” or “comparative genomic hybridization” as used herein refers to a method that analyses copy number changes (gains/losses) in DNA content. The method is well known to those skilled in the art. Chromosome-based CGH is capable of detecting loss, gain and amplification of copy number at the level of chromosomes, which limits its resolution. The use of array CGH overcomes many of these limitations, with improvements in resolution and dynamic range, in addition to direct mapping of aberrations to the genome sequence and improved throughput. The DNA may be isolated from a tumor tissue and from a control tissue by standard methods known in the art. The labeling of the DNA is also well known in the art. The fused genes comprised in the CNT regions may be detected by the FISH and/or RACE technique. The fused gene may be any one of the fused genes described in the earlier sections.
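As an illustration of identifying CNT regions within known genes, the following minimal sketch intersects CNT intervals with a gene annotation. The function names, the interval representation and the coordinates are all hypothetical and are used only to show the overlap logic; real gene coordinates would come from a genome annotation that is not part of this specification.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end), with closed coordinates in base pairs
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_at_cnt(cnt: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Names of annotated genes that overlap a CNT interval, sorted by name."""
    return sorted(name for name, span in genes.items() if overlaps(cnt, span))


def fraction_within_genes(cnts: List[Interval], genes: Dict[str, Interval]) -> float:
    """Fraction of CNT intervals that overlap at least one annotated gene."""
    if not cnts:
        return 0.0
    hits = sum(1 for cnt in cnts if genes_at_cnt(cnt, genes))
    return hits / len(cnts)


# Placeholder annotation and CNT calls (made-up names and coordinates).
annotation = {
    "GENE_A": ("chr1", 1000, 5000),
    "GENE_B": ("chr1", 8000, 12000),
}
cnt_calls = [
    ("chr1", 4500, 4700),   # inside GENE_A
    ("chr1", 6000, 6500),   # intergenic
    ("chr2", 1000, 2000),   # different chromosome
]
genic = [genes_at_cnt(c, annotation) for c in cnt_calls]
```

CNT intervals that fall inside a gene are the natural candidates for follow-up by FISH and RACE, since a breakpoint within a gene is a prerequisite for that gene to take part in a fusion.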
The term “nucleic acid” is well known in the art and is used to generally refer to a molecule (one or more strands) of DNA, RNA or a derivative or analog thereof comprising nucleobases. A nucleobase includes, for example, a purine or pyrimidine base found in DNA (e.g., an adenine “A”, a guanine “G”, a thymine “T” or a cytosine “C”) or RNA (e.g., an A, a G, a uracil “U” or a C). The term nucleic acid encompasses the terms “oligonucleotide” and “polynucleotide”, each as a subgenus of the term “nucleic acid”. The term “complementary” in the context of nucleic acids refers to a strand of nucleic acid non-covalently attached to another strand, wherein the complementarity of the two strands is defined by the complementarity of the bases. For example, the base A on one strand pairs with the base T or U on the other, and the base G on one strand pairs with the base C on the other. An oligonucleotide or analog is of “substantial complementarity” when there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide or analog to non-target sequences under conditions in which specific binding is desired.
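The base-pairing rule described above (A with T or U, G with C) can be sketched in a few lines. The function names are hypothetical, the sketch handles only the unambiguous bases A, C, G, T and U, and it tests perfect complementarity only; it does not model "substantial complementarity" with tolerated mismatches.

```python
# Map each base to its Watson-Crick partner; U is treated like T so that
# RNA strands can be compared against DNA strands.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True when the two strands, both written 5' to 3', pair at every position."""
    normalised_b = strand_b.upper().replace("U", "T")
    return reverse_complement(strand_a) == normalised_b


# The first ten bases of the sequence listing are used as an example target.
target = "GGCTGAATCC"
probe = reverse_complement(target)
```

A probe designed this way would anneal to the target strand in antiparallel orientation, which is why the sequence is reversed as well as complemented.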
A nucleic acid molecule is “hybridisable” to another nucleic acid molecule (in the present case, the fused gene or a fragment thereof) when a single-stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength (Sambrook and Russell, 2001). The conditions of temperature and ionic strength determine the “stringency” of the hybridisation. Hybridisation requires the two nucleic acids to contain complementary sequences. Depending on the stringency of the hybridisation, mismatches between bases are possible. The appropriate stringency for hybridising nucleic acids depends on the length of the nucleic acids and the degree of complementarity, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of Tm for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher Tm) of nucleic acid hybridisation decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating Tm have been derived (Sambrook and Russell, 2001). For hybridisation with shorter nucleic acids, i.e. oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (Sambrook and Russell, 2001).
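To make the dependence of Tm on length, base composition and ionic strength concrete, the sketch below implements two widely quoted empirical approximations. These are assumptions introduced for illustration and are not necessarily the equations intended by the cited reference: the Wallace rule for short oligonucleotides, and a salt-adjusted formula of the form 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − k/N for long DNA:DNA hybrids, where the length constant k differs between sources and is therefore exposed as a parameter. Function names are hypothetical.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 to 1.0)."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(oligo: str) -> float:
    """Rule-of-thumb Tm in degrees C for short oligonucleotides: 2(A+T) + 4(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def long_hybrid_tm(seq: str, na_molar: float = 1.0,
                   length_constant: float = 600.0) -> float:
    """Approximate Tm in degrees C for a long DNA:DNA hybrid.

    `na_molar` is the monovalent cation concentration in mol/L.
    `length_constant` is the numerator of the length correction term; its
    value varies between published versions of the formula, so 600 is an
    assumption rather than an authoritative choice.
    """
    n = len(seq)
    percent_gc = 100.0 * gc_fraction(seq)
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc - length_constant / n)


# A made-up 200-nt sequence with 50% GC content, at 1 M Na+.
example_seq = "GC" * 50 + "AT" * 50
example_tm = long_hybrid_tm(example_seq)
```

Both approximations ignore mismatches and nearest-neighbour effects, so they indicate trends (higher GC content, longer hybrids and higher salt raise Tm) rather than exact hybridisation temperatures.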
The DNA may be isolated from a tumour tissue. The tumour may be a stage III tumour, and may be a solid tumour. In particular, the tumour may be a breast tumour. The tumour tissue may be from a subject suffering from the tumour.
A “subject” may be a patient suffering from the tumour, in particular solid tumour, for example, breast tumour. A person skilled in the art will know how to select subjects based on their amenability to a particular treatment, or their susceptibility to a particular disease.
The “control” for example, may not be suffering from tumour. The control may exhibit control level label intensity and/or signal from the labelled DNA. The “control value” may also be an average value in expression obtained from a selected population.
The stage of a tumour is a descriptor (usually numbers I to IV) of how much the cancer has spread. The stage often takes into account the size of a tumor, how deep it has penetrated, whether it has invaded adjacent organs, if and how many lymph nodes it has metastasized to, and whether it has spread to distant organs. Staging of cancer is important because the stage at diagnosis is the most powerful predictor of survival, and treatments are often changed based on the stage. Correct staging is critical because treatment is directly related to disease stage. Thus, incorrect staging would lead to improper treatment, and material diminution of patient survivability. Correct staging, however, can be difficult to achieve. Staging systems are specific for each type of cancer (e.g. breast cancer).
Overall Stage Grouping is also referred to as Roman Numeral Staging. This system uses the numerals I, II, III, and IV (plus 0) to describe the progression of cancer. Stage 0 cancers are carcinoma in situ. Stage I cancers are localized to one part of the body. Stage II cancers are locally advanced, as are Stage III cancers. Whether a cancer is designated as Stage II or Stage III can depend on the specific type of cancer; for example, in Hodgkin's disease, Stage II indicates affected lymph nodes on only one side of the diaphragm, whereas Stage III indicates affected lymph nodes above and below the diaphragm. The specific criteria for Stages II and III therefore differ according to diagnosis. Stage IV cancers have often metastasized or spread to other organs or throughout the body.
According to yet another aspect, the invention provides a method of diagnosis and/or prognosis of presence and/or stage of tumour in a subject comprising detecting at least one fused gene, wherein the presence of the fused gene is indicative of presence and/or the stage of tumour.
The method may comprise providing at least one nucleic acid molecule capable of hybridizing to and/or complementary to the fused gene and/or a fragment thereof, wherein hybridization is indicative of presence and/or the stage of tumour.
According to a further aspect there is provided a method of diagnosis and/or prognosis of presence and/or stage of tumour in a subject, wherein the method comprises providing one or more fragments representative of a genome capable of hybridizing to differentially labelled genomic DNA isolated from tumour tissue from at least one subject and from a control tissue, wherein an increase or decrease of the hybridization intensity and/or signal(s) of the label in the tumour tissue, compared to that in control tissue detects copy number transition (CNT) regions in the tumour tissue, indicative of presence and/or stage of tumour. In particular, the CNT regions comprise fused gene(s). The fused genes may be detected by FISH and/or RACE technique.
The method of diagnosis and/or prognosis may be for stage III tumour, in particular solid tumours. In particular, the tumour may be breast tumour.
There is further provided a kit for detecting the presence of fused genes, wherein the kit comprises one or more fragments representative of a genome capable of hybridizing to differentially labelled control and test genomic DNA, wherein an increase or decrease of the hybridization intensity and/or signal(s) of the test genome, compared to that in the control genome, detects copy number transition (CNT) regions in the test genome, wherein the CNT regions comprise fused genes.
According to yet another aspect, the invention provides a method of detecting the presence of fused genes, wherein the method comprises providing one or more fragments representative of a genome capable of hybridizing to differentially labelled control and test genomic DNA wherein an increase or decrease of the hybridization intensity and/or signal(s) of test genome, compared to that in control genome detects copy number transition (CNT) regions in the test genome, wherein the CNT regions comprise fused genes.
The “test genomic DNA” as used herein refers to the labelled genomic DNA to be compared with a control DNA. The test genomic DNA is understood to have the same meaning as DNA isolated from a tumour tissue of a subject.
Having now generally described the invention, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the present invention.
Standard molecular biology techniques known in the art and not specifically described were generally followed as described in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York (2001).
Array Comparative Genomic Hybridization (a-CGH)
Oligonucleotide-based array comparative genomic hybridization is an emerging technology designed for high precision mapping of unbalanced copy number changes (Barrett et al., 2004). Because of their poor resolution, metaphase chromosome based CGH, cDNA array CGH and BAC clone array CGH can only place copy number change boundaries within a large genomic distance of more than 100 kb to several megabases. The high density SNP array from Affymetrix can be used for copy number analysis, but its probes are mostly selected from intergenic regions and further validation studies are required to map breakpoints within genes. In this study, the recently introduced 244K version of the oligo CGH array from Agilent Technologies, USA, was used; it contains 244,000 probes providing a genome wide average resolution of ~6.4 kb to 16.5 kb and even higher resolution within genes (<3-10 kb). The array features mainly comprise probes from well known and cancer related genes, with a minimal number of probes derived from intergenic regions. Given the unique design and reproducibility of this method, high precision mapping of genomic rearrangements and copy number changes is obtained with remarkable specificity. Although this method was developed by and is available through commercial sources, the array can be custom designed by selecting probes at even higher density for a genomic region of interest, achieving a resolution of less than 1 kb for that region.
Oligonucleotide comparative genomic hybridization is a high-resolution method to detect unbalanced copy number changes at the whole genome level. Competitive hybridization of differentially labelled tumor and reference DNA to oligonucleotides printed in an array format (Agilent Technologies, USA), followed by analysis of the fluorescent intensity for each probe, detects the copy number changes in the tumor sample relative to the normal reference genome (
Based on the best resolution detected in the 244K array, the MCF7 cell line, known to contain many unbalanced structural and numerical aberrations, was analyzed (
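The relationship between per-probe fluorescence and relative copy number can be illustrated with a minimal sketch. It assumes an idealized case (a pure tumour sample, a diploid reference and no dye bias or noise); the function names are hypothetical and are not part of any array vendor's software.

```python
import math


def log2_ratio(test_intensity: float, reference_intensity: float) -> float:
    """Log2 ratio of test (tumour) to reference signal for a single probe."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2 ratio for a given tumour copy number (assumption: diploid reference)."""
    if test_copies == 0:
        # Homozygous deletion: no tumour signal is expected at all.
        return float("-inf")
    return math.log2(test_copies / reference_copies)


if __name__ == "__main__":
    for copies in (1, 2, 3, 4):
        print(copies, round(expected_log2_ratio(copies), 3))
```

Under these assumptions a single-copy loss gives a log2 ratio of -1 and a gain to four copies gives +1; real data are noisier and compressed by normal cell admixture.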
Strategy to Isolate Fusion Gene from a CNT Region (
Select CNT region within a gene
Confirm genomic rearrangement by fluorescence in situ hybridization
Identify genomic interval of CNT region
Design primer from the region present in at least one copy
Avoid regions that are involved in homozygous deletion
Design primers from exons close to the CNT region
Decide on 5′ or 3′ RACE depending on the orientation of the gene
Clone PCR product and sequence
Confirm RACE PCR results by RT PCR using a primer from the known and the new gene
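The first steps of the strategy above (selecting a CNT region within a gene and identifying its genomic interval) can be sketched as follows. This is an illustrative simplification, not the inventors' analysis pipeline: it compares adjacent probes directly, whereas practical analyses usually segment noisy data first. The `Probe` type, the `find_transitions` function, the 0.5 threshold and the example coordinates are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Probe(NamedTuple):
    position: int      # genomic coordinate of the probe (bp)
    log2_ratio: float  # tumour/reference log2 ratio for the probe


def find_transitions(
    probes: List[Probe],
    gene_start: int,
    gene_end: int,
    min_shift: float = 0.5,  # assumed threshold, chosen only for illustration
) -> List[Tuple[int, int]]:
    """Return intervals between adjacent in-gene probes where the log2 ratio
    changes by at least min_shift, i.e. candidate CNT intervals."""
    inside = sorted(
        (p for p in probes if gene_start <= p.position <= gene_end),
        key=lambda p: p.position,
    )
    hits = []
    for left, right in zip(inside, inside[1:]):
        if abs(right.log2_ratio - left.log2_ratio) >= min_shift:
            hits.append((left.position, right.position))
    return hits


if __name__ == "__main__":
    # Made-up probes around a made-up gene spanning 1000-9000 bp.
    demo = [
        Probe(500, 0.0), Probe(2000, 0.0), Probe(4000, 0.05),
        Probe(6000, 1.0), Probe(8000, 1.1), Probe(9500, 1.0),
    ]
    print(find_transitions(demo, 1000, 9000))
```

The interval returned is bounded by the last probe at one copy number level and the first probe at the next, which is why probe density within genes sets the achievable breakpoint resolution.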
Using the strategy described above, the present inventors validated 48 genes containing CNT regions in MCF7 cell line and isolated seven novel fusion genes described in the following sections.
Gene 1: RCC2/CENPF (SEQ ID NO: 15) rearranged at 1(q41)
A CNT region in the CENPF gene, with a genomic interval of 10,827 bp between 5′211190840 and 3′211201667 containing exons 9, 10 and 11, was identified. The 5′ end of the gene is present in at least one copy and the 3′ region is amplified to at least three copies. FISH analysis using BAC clones (RP11-281J12, 3′end and RP11-37015, 5′end) confirmed rearrangement of CENPF with at least three locations rather than tandem duplication on the same chromosome. Spectral karyotyping analysis revealed one copy of normal chromosome 1 and a second copy rearranged with chromosome X, in addition to small segments of chromosome 1 inserted in at least five different locations (
The PCR product was cloned into a plasmid vector using a TA cloning kit (Invitrogen, USA), and sequence analysis showed the breakpoint in exon 9 and a 46 bp upstream sequence matching the 5′ end of the RCC2 gene. Surprisingly, in a BLAST search of GenBank the 46 bp RCC2 sequence matched only the mRNA sequence, not the genomic sequence of RCC2. FISH validation for confirmation of fusion of RCC2 with CENPF was negative. Further analysis of the sequence starting from the breakpoint in exon 9 of CENPF and the rest of the 3′ end sequence confirmed a perfect open reading frame (ORF) starting from the breakpoint immediately upstream of the ATG sequence in exon 9. Although the 3′RACE PCR was negative in both RNAs, we performed RT PCR using primers from exons 7 and 11 of CENPF and confirmed the absence of the normal transcript, which indicated the expression of only a truncated form of CENPF. Further validation by RT PCR using RNA from cell lines and primary breast cancer tumors showed amplification in cell lines T47D (72 hours after E2 treatment), and MDA-MB-436 under normal condition (
These results provide evidence for the isolation of a rearranged gene from a CNT region without any direct evidence from conventional karyotyping. Further, the results show that the expression of CENPF is regulated by E2 and that CENPF is expressed in a truncated form in the majority of breast cancer tumors. These results also indicate the role of CENPF in centromere kinetochore assembly during cell division. Importantly, the invention suggests that a high level of expression of truncated CENPF is seen in grade 3 primary breast cancer tumors and that the aberrant CENPF protein may be a causative factor for abnormal segregation of chromosomes during mitosis leading to aneuploidy.
Isolation of Fusion Genes from the Commonly Amplified Regions in Breast Cancer: Characterization of Amplifications in Breast Cancer
The randomness of most of the chromosome rearrangements between different breast cancer tumors might not yield a specific recurrent chromosome aberration; however, it has been shown that the 17q23 and 20q13 regions are recurrently amplified in 20-39% of primary breast cancers with distinct clinical outcome. An in-depth characterization of these two amplicons revealed many CNT regions affecting genes known to be over expressed in breast cancer, but none of them were identified as fusion genes except BCAS4 and BCAS3 (Barlund et al., 2002). Using the present inventors' new approach, three novel fusion genes were isolated from CNT regions in the amplicons. In MCF7, there were many amplified regions throughout the genome, from 3 copies to more than 40 copies, particularly at 17q23 and 20q13. The 17q23 amplification is reported in 20% of primary breast tumors, and many genes including RPS6KB1, MUL, APPBP2, and TRAP240 are known to be over expressed. Similarly, the genes AIB1, ZNF217, BTAK, and NABC1 in the 20q13 amplification are reported to be over expressed in 12-39% of primary breast tumors (Kallioniemi et al., 1994, Muleris, et al., 1994). High-level amplification of 20q13 may be an indicator of poor clinical outcome in node-negative breast cancer. The 17q23 amplicon revealed genes that may have oncogenic potential and may contribute to the more aggressive clinical course in breast cancer patients. All the genes in this amplicon showed variable levels of expression, and further variations in expression were found between different probes for the PRKCBP1 gene, indicating additional rearrangements within amplicons without an obvious CNT. Contrary to the conventional interpretation, these results indicate that amplicons are a rich source of rearrangements and that the chance of identifying novel fusion genes is much higher in amplified regions. Detailed analysis of the genes within the amplicons is described in the following sections.
The present inventors further attempted to understand the genomic organization of the amplified regions in MCF7, for which FISH analysis was performed using a BAC clone for the BRIP1 gene (RP11-482H10) within the amplified region at 17q23. FISH results indicated that the amplified sequences are inserted at many locations within the genome (
Gene 2: ARFGEF2/SULF2 (SEQ ID NO: 16) inv(20q13.13)
Isolation of a Fusion Gene Produced by Inversion within an Amplicon
Among the 83 CNT regions identified within genes, genes from the commonly amplified region in breast cancer were selected. Amplification at 20q13, reported in 20-39% of primary breast cancers, is known to be associated with aggressive clinical behaviour. A non-contiguous amplification of a 10 Mb region at 20q13 identified nine CNT regions affecting EYA2, ARFGEF2, SLC9A8, BCAS4, ZNF217 and DOK5 genes and three in intergenic regions (
Further to the confirmation of the ARFGEF2/SULF2 fusion gene in MCF7, the present inventors extended the analysis to estimate its incidence in primary breast cancer tumors and breast cancer cell lines. RT PCR analysis using a forward primer from ARFGEF2 exon 1 (5′ TAGCCGACAAGGTGAAG 3′) (SEQ ID NO: 6) and a reverse primer from exon 6 of the SULF2 gene (5′ GTGTAGCGCATGATCCAGTG 3′) (SEQ ID NO: 7) showed the presence of the fusion gene in 17/35 (49%) of primary tumors (
Gene 3: RPS6KB1/TMEM49 (SEQ ID NO: 17) ins(17)(q23.2)
Isolation of Promiscuous Fusion Gene Produced by Insertion and Inversion within an Amplicon.
With the successful cloning of a fusion gene from the 20q13 amplicon, the present inventors extended the analysis to the non-contiguous amplification of about 3.3 Mb at 17q23 containing seven CNT regions affecting the TEX14, FAM33A, DHX40, TMEM49 and INTS2 genes and the BCAS3 gene, the latter with two CNT regions (
The kinase domain of the RPS6KB1 gene is partially preserved in the fusion gene and no coding sequence from TMEM49 is involved in the fusion transcript. Due to the close proximity of mir-21, this translocation may be targeted to the over expression of mir-21. Activation of mir-21 by a protein kinase is a new avenue for future research, as it is known that the majority of microRNA genes are located at chromosomal breakpoints frequently rearranged in cancer. It is also important to note that the microRNA mir-21 is located 245 bp telomeric to the last untranslated exon of the TMEM49 gene and 51,745 bp upstream of the first exon of RPS6KB1. Mir-21 is reported to be over expressed in breast cancer and glioblastoma.
Since the fusion gene contains only the last untranslated exon of TMEM49, this study indicates that, in addition to the formation of the RPS6KB1/TMEM49 fusion gene, this translocation is targeted to the over expression of mir-21.
Gene 4: RPS6KB1/EAP30 inv(17)(q23.2-q21.32)
As discussed in the previous section, the distribution of amplified sequences of RPS6KB1 to many locations in the MCF7 genome suggested a possibility of promiscuous rearrangement within the amplified sequences. 3′RACE PCR from the first exon of RPS6KB1 revealed the presence of the normal RPS6KB1 transcript in all the cell lines and primary breast tumours. In BT474 cell line a second band of about 900 bp showed (
Isolation of Two Fusion Genes from Two CNT Regions within a Gene
Among the 83 genes identified to contain CNT regions, BCAS3 and ATXN7 genes showed two CNT regions formed by high level amplification of small regions at the 3′ and 5′ ends and a segment in between amplified at a low level (
Genes 5 and 6: ATXN7/Novel gene of SEQ ID NO:1 (SEQ ID NO: 18) t(1; 3)(p21.1; 14.1) and BCAS3/ATXN7 (SEQ ID NO: 19) t(3; 17)(q23.2; p21.1)
The ATXN7 gene is located on chromosome 3 at the genomic interval from 63,825,273 bp to 63,961,367 bp. In MCF7, an amplification of 3.35 Mb extending from 5′ 61579369 to 64937725 3′ includes ATXN7. Within the gene, a small region of 53,771 bp extending from 5′ 63901813 to 63955584 3′ is not amplified at the same level as the rest of the 5′ and 3′ ends of the ATXN7 gene, resulting in the formation of two distinct CNT regions and leaving exons 1-4 at the 5′ end and exons 11 and 12 at the 3′ end. FISH analysis using BAC clone RP11-1143K18 showed insertion of ATXN7 sequences at multiple locations in the genome (
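The coordinates quoted for ATXN7 can be cross-checked with simple interval arithmetic. The sketch below only restates numbers given in the text; the variable and function names are hypothetical, and interval length is taken as end minus start, the convention that reproduces the sizes stated in this description.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) in bp on chromosome 3

ATXN7_GENE: Interval = (63_825_273, 63_961_367)
AMPLIFIED_REGION: Interval = (61_579_369, 64_937_725)
LOW_LEVEL_REGION: Interval = (63_901_813, 63_955_584)


def span(interval: Interval) -> int:
    """Length of an interval, computed as end minus start."""
    start, end = interval
    return end - start


def contains(outer: Interval, inner: Interval) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    print("low-level region (bp):", span(LOW_LEVEL_REGION))
    print("amplified region (Mb):", span(AMPLIFIED_REGION) / 1_000_000)
    print("gene inside amplification:", contains(AMPLIFIED_REGION, ATXN7_GENE))
    print("low-level region inside gene:", contains(ATXN7_GENE, LOW_LEVEL_REGION))
```

The low-level segment lies wholly inside the gene, which in turn lies inside the amplification, so both of its edges fall within ATXN7 and give the two CNT regions described.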
Novel Fusion Gene Isolated from a CNT Region in the Commonly Deleted Region in Multiple Cancer Types
Gene 7: MTAP/Novel gene of SEQ ID NO: 2 (SEQ ID NO: 20) del(9)(p21)
Large genomic deletions are common in a variety of cancer types. Deletions at 9p21 have been reported in a variety of cancer types including gliomas, mesothelioma, childhood ALL, lung cancer and leukemia, confirmed by FISH and other molecular methods. The extent of the deleted region is quite variable in different samples; however, a recurrent deletion boundary spanning intron 4 was reported (Batova et al., 1996). Although the genes located within the deletion are considered to be lost depending on the extent of the deletion, it is intriguing to note that the boundaries of a deletion might fall within known genes, forming a distinct CNT region. The present inventors observed a CNT within the MTAP gene in a region of 254 kb deletion that includes part of the MTAP gene starting in intron 4 and the CDKN2A and CDKN2B genes, leaving the first 4 exons of the MTAP gene intact in at least one copy. We applied our nested RACE PCR strategy using primers from exon 4 (5′ATCATGCCTTCAAAGGTCAACTA3′) (SEQ ID NO: 14), performed 3′RACE and found a 728 bp PCR product of a fusion gene containing the first four exons of the MTAP gene and an EST sequence from the region immediately flanking the deletion at the 5′ end of the deletion, suggesting the formation of an in-frame fusion following the deletion event. Gene expression data for all the probes included for genes within the deleted region including MTAP gene showed no expression due to the fact that all the isolation of a novel fusion gene (SEQ ID NO: 2) from a region commonly deleted in a variety of cancer types.
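As a rough aid when comparing the primers quoted in this description (SEQ ID NOs: 6, 7 and 14), basic properties such as length, GC content and an approximate melting temperature can be computed. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a coarse approximation intended for short oligonucleotides; it is offered as an illustration and is not stated in the source as the method the inventors used. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Primer sequences as quoted in the text, written 5' to 3'.
PRIMERS = {
    "SEQ ID NO: 6": "TAGCCGACAAGGTGAAG",         # ARFGEF2 exon 1, forward
    "SEQ ID NO: 7": "GTGTAGCGCATGATCCAGTG",      # SULF2 exon 6, reverse
    "SEQ ID NO: 14": "ATCATGCCTTCAAAGGTCAACTA",  # MTAP exon 4
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in deg C by the Wallace rule."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

Nearest-neighbour models give more realistic estimates for primers longer than about 20 nt, so these values are best read as relative rather than absolute.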
Analysis of array CGH data from the MCF7 cell line showed more than 100 regions of copy number gains and losses, ranging in size from 30 kb to 30 Mb. These include regions with low level copy number gains, losses and high level amplifications (3 to >40 copies). In addition to the identification of regions of gains and losses, careful analysis at the copy number transition boundaries revealed 124 breakpoints. Of these, 33% occurred in intergenic regions and 67% were identified within known and cancer related genes at either the 3′ or 5′ end, providing a direct clue to map the breakpoint in a gene within a small genomic distance. Further, this underscores the concentration of breakpoints within genes rather than random breaks within intergenic regions. This indicates that most, if not all, of the rearrangements are targeted to affect the function of genes, either by dysregulation or by formation of fusion genes. Therefore, this study is a conceptual jump in understanding the unbalanced copy number changes in the solid tumor genome, providing a methodological approach to discover novel fusion genes.
This invention allows novel fusion genes to be identified by analyzing unbalanced copy number changes in various cancer types using array CGH technology. Existing technologies for genome characterization suffer from their own limitations; for example, BAC, cDNA and low density tiling arrays do not provide sufficient resolution to identify a copy number transition within a short genomic interval. Other methods, including end sequence profiling (ESP) and representational oligonucleotide microarray analysis (ROMA), detect rearrangements at a large genomic interval (>100 kb). The array designs used in this study identified start and stop positions of breakpoint intervals at a resolution ranging from as fine as 2.7 kb to a maximum of 23 kb (Table 1).
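The breakpoint percentages above can be converted back into counts as a consistency check. The short sketch below assumes the percentages were rounded to whole numbers; the function name is hypothetical.

```python
from typing import Tuple


def split_counts(total: int, first_percent: float) -> Tuple[int, int]:
    """Split a total into two counts given the (rounded) percentage of the first group."""
    first = round(total * first_percent / 100)
    return first, total - first


if __name__ == "__main__":
    intergenic, within_genes = split_counts(124, 33)
    print("intergenic breakpoints:", intergenic)
    print("breakpoints within genes:", within_genes)
    print("within-gene share (%):", round(100 * within_genes / 124, 1))
```

Under that assumption the within-gene count comes to 83, which agrees with the 83 CNT regions within genes referred to elsewhere in this description.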
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/SG2007/000361 | 10/22/2007 | WO | 00 | 4/21/2010 |