Cancer genomes often contain fusion genes, created by structural chromosomal rearrangements such as translocations, deletions, and inversions. Fusion genes are typically found in haematological cancers. So far, fusion genes have only rarely been found in association with solid tumours, in contrast to the detection of numerous genomic copy number imbalances. However, recent reports have shown that fusion transcripts may also prove to be a common contributor to the development of solid tumours (Mitelman et al., 2007, Teixeira, 2006, Tomlins et al., 2005). The main problem has been the technological limitations for detection of fusion genes in solid tumours.
Identification of certain fusion genes is currently performed for differential diagnosis or therapeutic decision-making in haematological cancers and some rare solid tumour types. At present, routine diagnostics laboratories use laborious and inefficient analyses for detection of fusion genes in clinical samples. The tests are typically cytogenetic chromosome analyses (karyotyping, usually by Giemsa banding) and/or RT-PCR of a selection of the most common fusion genes covering the most common break points for the individual novel transcript. To obtain metaphase chromosomes for karyotyping, a considerable amount of fresh tissue material is required, which also needs to contain living and dividing cells. This methodology is also time consuming and labour intensive, and yet only has a success rate of about 70 percent. Furthermore, highly experienced and competent personnel are needed to examine the chromosomes visually, and the results are subjective and of low resolution. RT-PCR is a focused method, enabling analysis of one or a few candidate fusion genes at a time, at pre-defined fusion break points within them. The major limitation of this method is that it is not genome-wide, and thus a negative finding is not conclusive.
There have been a few reports trying to identify predetermined fusion genes by oligo microarrays targeting specific junction sequences. These relied on a preceding step with amplification of the probes by RT-PCR, specifically targeting a small selection of predefined fusion genes and individual junction sequences therein. Similarly, junction oligos between exons in the same gene have been used for detection of alternative splicing.
Nasedkina et al., 2002, used multiplex RT-PCR followed by microarrays for identification of PCR products containing specific fusion transcripts. Their microarray contained probes for detection of up to two fusion variants of each of four well-known fusion genes. PCR amplification was performed as a nested two-round multiplex reaction with specific primers. Thus, their method and microarrays were designed for identification of only a few predetermined gene fusions.
Nasedkina et al., 2003 expanded on the above findings to include probes targeting one additional fusion gene, and 247 cases of childhood leukaemia were screened. Again, the authors only aimed at identification of predetermined fusion genes, more specifically fusion genes of clinical relevance for childhood leukaemia.
Shi et al., 2003 used multiplex RT-PCR for amplification of seven fusion genes and subsequently used oligo microarrays to identify the PCR product, i.e. oligos targeting one or two sites per fusion gene. As with Nasedkina et al., 2002, Nasedkina et al., 2003, their analysis was limited to a rather small number of predetermined fusion genes that are known to have an association with leukaemia. The authors claim that their method is quantitative, as opposed to the method of Nasedkina et al., 2002, Nasedkina et al., 2003. Further, Shi et al., 2003 mention on page 1069 that “Although multiplex RT-PCR with 10-20 primer pairs was ideal, our preliminary data indicated that multiplex RT-PCR with primer pairs in excess of 20 was achievable with substantial assay optimization effort. However, the probability that formation of non-specific PCR products and primer-dimers would increase with increasing numbers of primers limited the maximum number of primer pairs”. Thus, they acknowledge an unmet demand for higher throughput of the analysis and suggest that more than one multiplex RT-PCR can be devised to encompass more than 40 fusion transcripts. Further, the authors on page 1072 mention that “Because some of the translocation fusion splice junction sites may be a few kilobases distant from the 3′ poly(A) tail on the mRNA, use of microarray assay alone is not possible at this stage because the reverse transcriptase is unable to generate cDNA long enough to reach the fusion splice-junction site”. In other words, sequence specific RT-PCR is necessary for the assay to function, which in turn limits the throughput of the method for the reasons mentioned above.
Use of oligo microarrays in the analysis of pre-mRNA splicing patterns has previously been described in, for example, Bingham et al., 2006, Johnson et al., 2003.
US 2006/0084105 describes a microarray comprising sets of probes for detection of gene products that are produced by pre-mRNA splicing of a selected gene. The array comprises 372 splice junctions within 64 genes.
US 2006/012952 and WO 03/014295 also relate to the use of microarrays for detection of pre-mRNA splice variants.
In a first aspect, the invention provides a microarray comprising a chimeric probe for an exon-to-exon junction of a fusion gene.
A second aspect of the invention is a method for detection of fusion genes and a third aspect of the invention is a kit comprising the microarray of the invention.
A first aspect of the invention is a microarray comprising a chimeric probe for an exon-to-exon junction of a fusion gene.
The microarray of the present invention may in particular further comprise at least two intragenic probes for a fusion gene partner of the fusion gene.
An advantage of including intragenic probes is that the likelihood of false-positive results is reduced. The intragenic probes provide exon level data on the gene expression, thus enabling comparisons of expression levels up- and downstream of suspected breakpoints of potential fusion gene partners. At the point where the expression level of the exons shift as illustrated in
Another advantage of using the intragenic probes is that they may be used to indicate previously unidentified fusion genes.
The intragenic probes may in particular correspond to intra-exon sequences, exon-to-exon junctions, exon-intron junctions and intron-exon junctions of a fusion gene partner of the fusion gene. Such intragenic probes may be used to determine the expression level of fusion genes and/or fusion gene partners. In a preferred embodiment, intragenic probes are used in varying amounts or lengths in separate spots to facilitate quantification and comparison.
In a particular embodiment, the at least two intragenic probes are capable of targeting each side of the fusion break point, i.e. the intragenic point where one fusion gene partner is fused to another fusion gene partner.
The microarray of the present invention may in particular comprise at least 2 intragenic probes, such as at least 3 intragenic probes, or at least 4 intragenic probes, or at least 5 intragenic probes, or at least 6 intragenic probes, or at least 7 intragenic probes, or at least 8 intragenic probes, or at least 9 intragenic probes, or at least 10 intragenic probes, or at least 20 intragenic probes, or at least 30 intragenic probes, or at least 40 intragenic probes, or at least 50 intragenic probes, or at least 75 intragenic probes, or at least 100 intragenic probes, or at least 500 intragenic probes, or at least 1000 intragenic probes.
In particular the microarray of the present invention comprises at least two intragenic probes for each of the fusion gene partners of a fusion gene. If the microarray of the present invention is able to detect more than one fusion gene said microarray may comprise a different number of intragenic probes for each of the fusion genes. For example said microarray may comprise at least two intragenic probes for both fusion gene partners of one fusion gene and at least two intragenic probes for only one fusion gene partner of another fusion gene.
In particular the microarray of the present invention comprises a chimeric probe and at least two intragenic probes which target the same fusion gene. In particular the microarray of the present invention may comprise at least two intragenic probes for each of the included fusion genes. More particularly the microarray of the present invention may comprise at least two intragenic probes for each of the included fusion gene partners. In this context the term “included” refers to the fusion gene or fusion gene partner that said microarray is intended to be capable of detecting by comprising chimeric probes for.
In one embodiment of the present invention the microarray of the present invention comprises intragenic probes for each of the included fusion gene partners. In particular the microarray of the present invention may include three intragenic probes per exon, and said intragenic probes may in particular be targeting exon-to-exon junctions.
Preferably, the microarray comprises intragenic probes corresponding to all exons, exon-to-exon junctions, exon-intron junctions and intron-exon junctions of the individual fusion gene partners of the microarray.
Even more preferably, the microarray comprises 2, 3, 4, or 5 intragenic probes corresponding to each exon of the individual fusion gene partners of the microarray.
An intragenic probe as used herein is a nucleic acid or a nucleic acid analogue, capable of sequence-specific base pairing. The intragenic probe may consist of or comprise natural nucleotides or non-natural nucleotides such as LNA monomers (locked nucleic acid monomers), INA monomers (intercalating nucleic acid monomers), or PNA monomers (peptide nucleic acid monomers).
Preferably, the microarray of the invention comprises intragenic probes targeting fusion gene partners of more than one fusion gene. For example the microarray of the present invention may comprise intragenic probes for at least 2 fusion genes, such as at least 5 fusion genes or at least 10 fusion genes, or at least 20 fusion genes, or at least 30 fusion genes, or at least 50 fusion genes, or at least 75 fusion genes, or at least 100 fusion genes, or at least 250 fusion genes or at least 500 fusion genes, or at least 1000 fusion genes. Thus, in a preferred embodiment, the microarray of the invention comprises intragenic probes for a number of the fusion genes listed in Table 1, selected from the group consisting of at least 5 fusion genes, at least 10 fusion genes, at least 20 fusion genes, at least 30 fusion genes, at least 40 fusion genes, at least 50 fusion genes, at least 75 fusion genes, at least 100 fusion genes, at least 150 fusion genes, at least 200 fusion genes, at least 250 fusion genes, at least 275 fusion genes and at least 316 fusion genes.
The intragenic probes may be either antisense probes oriented to hybridise to mRNA or double-stranded cDNA, or sense probes being oriented to hybridise to cDNA of the fusion genes. Thus, the term “corresponds” as used in this context refers to either the same sequence or the complementary sequence.
The microarray may comprise both antisense and sense intragenic probes, i.e. it may be useful for hybridisation with both cDNA and mRNA or both strands of a PCR product.
The intragenic probes may be probes capable of hybridising to an exon sequence or they may be capable of hybridising to an intragenic junction sequence, e.g. exon-to-exon junctions, exon-intron junctions or intron-exon junctions. If the intragenic probe is for an intragenic junction sequence it may preferably be isothermic, i.e. the intragenic junction sequence probe for each side of the junction may be adjusted in length to have a melting temperature (Tm value) that differs by at most 20 degrees Celsius when hybridised to a complementary DNA sequence under the conditions employed for hybridisation of the microarray. In other embodiments, the Tm values differ by at most 40 degrees Celsius, 35 degrees Celsius, 30 degrees Celsius, 25 degrees Celsius, 15 degrees Celsius, and 10 degrees Celsius, respectively. Isothermic probes are favourable to enable good hybridisation conditions across the complete set of probes (oligonucleotides) on the microarray.
Moreover, the first part and the second part of such intragenic junction sequence probes are preferably adjusted in length to have Tm values that differ by at most 10 degrees Celsius under the conditions employed for hybridisation of the microarray. In other embodiments, the Tm values differ by at most 16 degrees Celsius, 14 degrees Celsius, 12 degrees Celsius, 8 degrees Celsius, 6 degrees Celsius and 4 degrees Celsius.
Adjustment of the Tm value of a probe or part of a probe may be achieved as described below in relation to the chimeric exon-to-exon probes.
The Tm value of the intragenic probes may preferably be selected from the group consisting of more than 45 degrees Celsius, more than 50 degrees Celsius, more than 55 degrees Celsius, more than 60 degrees Celsius, more than 65 degrees Celsius, more than 70 degrees Celsius and more than 75 degrees Celsius.
The length of the intragenic probes is preferably selected from the group consisting of less than 60 nucleotides, less than 55 nucleotides, less than 50 nucleotides, less than 45 nucleotides, less than 40 nucleotides and less than 35 nucleotides.
The microarray of the present invention may in particular be for detection of a fusion gene.
The fusion gene may be any fusion gene. Preferably, at least one of the fusion gene partners has previously been implicated as part of a verified fusion gene. More preferably, the fusion gene is selected from the group consisting of the following known fusion genes,
wherein Gene A is the upstream fusion gene partner of the fusion gene and Gene B is the downstream fusion gene partner of the fusion gene.
A chimeric probe as used herein is a nucleic acid or a nucleic acid analogue, capable of sequence-specific base pairing, which comprises a first sequence corresponding to an exon of a first gene and a second sequence corresponding to an exon of a second gene. Importantly, the first gene is different from the second gene, i.e. the probe covers an intergenic exon-to-exon junction. The term exon-to-exon junction, as used in the present context, refers to an intergenic exon-to-exon junction. The chimeric probe may consist of or comprise non-natural nucleotides such as LNA monomers (locked nucleic acid monomers), INA monomers (intercalating nucleic acid monomers), or PNA monomers (peptide nucleic acid monomers).
The term fusion gene as used herein refers to the result of a genomic aberration, such as a chromosomal translocation, deletion, or inversion, bringing sequences from two different genes together. That is, the fusion gene comprises at least one exon of an upstream gene partner of the fusion gene and at least one exon of a downstream gene partner of the fusion gene.
Herein, the term fusion gene also refers to a hypothetical fusion gene that has not been experimentally verified.
For example, Hahn et al., 2004 describes a bioinformatics strategy for identification of such potential fusion genes. It is envisaged that the fusion gene which is detected by the present invention may be a candidate fusion gene identified by use of the method described in Hahn et al., 2004 or other methods capable of identifying potential fusion genes.
A fusion gene partner as used herein refers to a gene that donates at least one exon to a fusion gene. The exon(s) of an upstream fusion gene partner are placed upstream of the exon(s) of the other fusion gene partner in the fusion gene transcript, and vice versa.
Of particular interest for the present invention are fusion gene partners and fusion genes that have previously been implicated in cancer. Table 1 lists preferred fusion genes with Gene A being the upstream fusion gene partner of the fusion gene and Gene B being the downstream fusion gene partner of the fusion gene.
The vast majority of fusion gene partners are fused within intron regions to create the fusion gene (Novo et al., 2007), and splicing of the pre-mRNA fusion transcript will connect exons creating an intergenic exon-to-exon junction in the fusion transcript.
Hypothetical intergenic exon-to-exon junctions can be predicted when the exon-intron structures of two fusion gene partners of a hypothetical fusion gene are known. Exons of the potential fusion gene partners can be retrieved from various internet-based genome databases, such as www.biomart.org.
In a preferred embodiment, the microarray of the invention comprises a chimeric probe for at least 20% of all possible exon-to-exon junctions of a fusion gene.
In another preferred embodiment, the microarray of the invention comprises a chimeric probe for at least 30% of all possible exon-to-exon junctions, such as at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
In yet another embodiment, the microarray of the invention comprises chimeric probes for at least 20 exon-to-exon junctions of the same or different fusion genes.
In still another preferred embodiment, the microarray comprises chimeric probes for at least 30 exon-to-exon junctions, at least 40 exon-to-exon junctions, at least 50 exon-to-exon junctions, at least 60 exon-to-exon junctions, at least 70 exon-to-exon junctions, at least 80 exon-to-exon junctions, such as at least 100 exon-to-exon junctions of the same or different fusion genes.
The present inventors have recognized that it may not be sufficient to test for previously characterized (experimentally verified) fusion genes with a pre-determined exon-to-exon junction and that it is desirable to test all possible exon-to-exon junctions of a particular fusion gene. Very often, the exact location of the exon-to-exon junction is not the decisive factor in determining whether a fusion gene is oncogenic or otherwise involved in or predictive of cancer or other conditions.
For example, for the TMPRSS2-ERG fusion gene, newly identified in prostate cancer (Tomlins et al., 2005), fusion transcripts have already been determined with junctions after exons 1, 2, 3, 4, and 5 in TMPRSS2, and before exons 2, 3, 4, 5, and 6 in ERG, in many different combinations (Clark et al., 2006). Thus, choosing only the one or few junctions that are most prevalent would give a considerable probability of false negative results. This particular fusion gene is also an example of a fusion gene being created by deletion of a relatively small chromosomal fragment (3 Mbp), subsequently joining the two fusion gene partners. This small aberration is invisible to cytogenetic analyses because it lies below their resolution level.
Oncogenicity may simply lie in overexpression of the downstream part of the fusion gene. Therefore, one advantage of the present invention is that it does not rely on a single or few pre-determined exon-to-exon junctions, but it is capable of detecting all possible exon-to-exon junctions of a given fusion gene.
Another advantage is that the invention does not require fresh cells, as does e.g. karyotyping, described in the background section. Moreover, interpreting the results of the microarray analysis is more straightforward than interpreting the result of karyotyping, which requires highly trained personnel. In principle, the set of intergenic exon-to-exon junction probes on the microarray will only produce a significant signal at a spot corresponding to an exon-to-exon junction present in a fusion gene transcript.
Further, in contrast to a cytogenetic approach, there is no risk for selection among cells with the current invention, because RNA from all the cells of the biological sample is included into the measurements.
In a preferred embodiment, the microarray of the invention comprises a chimeric probe for each possible exon-to-exon junction of the fusion gene.
Preferably, the microarray of the invention comprises chimeric probes for more than one fusion gene. For example the microarray of the present invention may comprise chimeric probes for at least 2 fusion genes, such as at least 5 fusion genes or at least 10 fusion genes, or at least 20 fusion genes, or at least 30 fusion genes, or at least 50 fusion genes, or at least 75 fusion genes, or at least 100 fusion genes, or at least 250 fusion genes or at least 500 fusion genes, or at least 1000 fusion genes. Thus, in a preferred embodiment, the microarray of the invention comprises chimeric probes for a number of fusion genes listed in Table 1, selected from the group consisting of at least 5 fusion genes, at least 10 fusion genes, at least 20 fusion genes, at least 30 fusion genes, at least 40 fusion genes, at least 50 fusion genes, at least 75 fusion genes, at least 100 fusion genes, at least 150 fusion genes, at least 200 fusion genes, at least 250 fusion genes, at least 275 fusion genes and at least 316 fusion genes.
In an even more preferred embodiment, the microarray of the invention comprises chimeric probes for each possible intergenic exon-to-exon junction for a number of fusion genes listed in Table 1, selected from the group consisting of at least 5 fusion genes, at least 10 fusion genes, at least 20 fusion genes, at least 30 fusion genes, at least 40 fusion genes, at least 50 fusion genes, at least 75 fusion genes, at least 100 fusion genes, at least 150 fusion genes, at least 200 fusion genes, at least 250 fusion genes, at least 275 fusion genes and at least 316 fusion genes.
Most preferably, the microarray of the invention comprises chimeric probes for each possible intergenic exon-to-exon junction for all fusion genes listed in Table 1. Even more preferably, the microarray of the present invention comprises a chimeric probe and at least two intragenic probes for all fusion genes listed in Table 1. Such a microarray is useful for identification of fusion genes in any sample and requires no prior knowledge of pre-dispositions to particular fusion genes based on e.g. cancer type or patient history.
The sequence of each chimeric probe of the microarray comprises a first part and a second part, wherein the first part corresponds to the 3′ end of an exon sequence of an upstream fusion gene partner and the second part corresponds to the 5′ end of an exon sequence of a downstream fusion gene partner, wherein said chimeric probes are either antisense probes oriented to hybridise to mRNA or double-stranded cDNA, or sense probes being oriented to hybridise to cDNA of the fusion genes. Thus, the term “corresponds” as used in this context refers to either the same sequence or the complementary sequence.
The microarray may comprise both antisense and sense probes for each exon-to-exon junction, i.e. it may be useful for hybridisation with both cDNA and mRNA or both strands of a PCR product.
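The relation between the two probe orientations can be illustrated with a minimal Python sketch. It assumes that a sense probe carries the same sequence as the fusion transcript across the junction and that an antisense probe is the reverse complement of that sequence. The function names and the example sequences are hypothetical and are not part of the described method.

```python
# Complement table for DNA bases; other symbols (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_pair(upstream_3prime_end, downstream_5prime_start):
    """Build the sense and antisense probe sequences for one junction.

    upstream_3prime_end: 3' end of an exon of the upstream partner (Gene A).
    downstream_5prime_start: 5' end of an exon of the downstream partner (Gene B).
    """
    sense = upstream_3prime_end + downstream_5prime_start
    return {"sense": sense, "antisense": reverse_complement(sense)}


if __name__ == "__main__":
    # Hypothetical flanking sequences, for illustration only.
    print(probe_pair("ACG", "TTA"))
```

Both sequences target the same exon-to-exon junction, so either or both may be spotted depending on whether mRNA, cDNA or both strands of a PCR product are hybridised.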
Preferably, the chimeric probes are isothermic, i.e. they are adjusted in length to have melting temperatures (Tm values) that differ by at most 20 degrees Celsius when hybridised to a complementary DNA sequence under the conditions employed for hybridisation of the microarray. In other embodiments, the Tm values differ by at most 40 degrees Celsius, 35 degrees Celsius, 30 degrees Celsius, 25 degrees Celsius, 15 degrees Celsius, and 10 degrees Celsius, respectively. Isothermic probes are favourable to enable good hybridisation conditions across the complete set of probes on the microarray.
Moreover, the first part and the second part of the chimeric probes are preferably adjusted in length to have Tm values that differ by at most 10 degrees Celsius under the conditions employed for hybridisation of the microarray. In other embodiments, the Tm values differ by at most 16 degrees Celsius, 14 degrees Celsius, 12 degrees Celsius, 8 degrees Celsius, 6 degrees Celsius and 4 degrees Celsius.
Adjustment of the Tm value of a probe or part of a probe may be achieved because the Tm value is dependent on the length and percentage of guanines and cytosines in the nucleotide sequence of the probe or part of the probe. It may be decided that the chimeric probes should have a Tm-value of e.g. about 68 degrees Celsius. As a start, the Tm value of a chimeric probe of 10 nucleotides for the first and the second part may be used. If the Tm value for this probe is below 68 degrees Celsius, nucleotides may be added in a balanced manner to both the first and the second part until the overall Tm value of the chimeric probe is about 68 degrees Celsius. Thus, if the first part comprises more A, T, or U nucleotides than the second part, more nucleotides will have to be added to the first part. The procedure is performed using an oligo design algorithm.
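The balanced extension procedure may be sketched as follows. This is an illustration under stated assumptions, not the oligo design algorithm actually used. The Tm estimate is a commonly used approximation based only on length and GC count (64.9 + 41 × (GC − 16.4) / N); for short parts it is used purely as a relative measure. The function names, the default cap of 60 nucleotides and the rule of always extending the part with the lower estimate are hypothetical choices.

```python
def basic_tm(seq):
    """Rough Tm estimate in degrees Celsius from length and GC count.

    Assumption: the approximation 64.9 + 41 * (GC - 16.4) / N, which is
    intended for oligos longer than about 13 nucleotides.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def design_chimeric_probe(upstream_exon, downstream_exon,
                          target_tm=68.0, start_len=10, max_len=60):
    """Extend both parts of a junction probe until the target Tm is reached.

    Returns (first_part, second_part). The first part is taken from the
    3' end of the upstream exon, the second from the 5' end of the
    downstream exon.
    """
    if not upstream_exon or not downstream_exon:
        raise ValueError("both exon sequences must be non-empty")
    n_up = min(start_len, len(upstream_exon))
    n_down = min(start_len, len(downstream_exon))
    while True:
        first = upstream_exon[-n_up:]
        second = downstream_exon[:n_down]
        probe = first + second
        if basic_tm(probe) >= target_tm or len(probe) >= max_len:
            return first, second
        can_up = n_up < len(upstream_exon)
        can_down = n_down < len(downstream_exon)
        if not (can_up or can_down):
            return first, second  # exons exhausted before reaching target
        # Extend the part with the lower estimate, so an AT-rich part
        # receives more nucleotides than a GC-rich part.
        if can_up and (not can_down or basic_tm(first) <= basic_tm(second)):
            n_up += 1
        else:
            n_down += 1


if __name__ == "__main__":
    # Hypothetical exon sequences, for illustration only.
    first, second = design_chimeric_probe("ACGT" * 20, "GGCA" * 20)
    print(first, second, round(basic_tm(first + second), 2))
```

Extending the part with the lower estimate is one simple way to keep the two sides of the junction close in Tm while the overall value approaches the target. A nearest-neighbour Tm model would be a natural replacement for `basic_tm`.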
In a preferred embodiment of the invention, the Tm of the chimeric probes is above the temperature used for hybridisation and the Tm of the upstream and/or downstream parts of the chimeric probes is below the temperature used for hybridisation.
The Tm value of the chimeric probe is preferably selected from the group consisting of more than 45 degrees Celsius, more than 50 degrees Celsius, more than 55 degrees Celsius, more than 60 degrees Celsius, more than 65 degrees Celsius, more than 70 degrees Celsius and more than 75 degrees Celsius.
The length of the chimeric probe is preferably selected from the group consisting of less than 60 nucleotides, less than 55 nucleotides, less than 50 nucleotides, less than 45 nucleotides, less than 40 nucleotides and less than 35 nucleotides.
In another preferred embodiment, the microarray further comprises chimeric probes targeting single nucleotide polymorphic (SNP) variants of exon-to-exon junctions. Such SNPs can be retrieved from a genome database (such as www.biomart.org) for all fusion gene partners of table 1. Where SNPs are located within a sequence flanking an exon-to-exon junction, chimeric probes including each of the SNP variants are constructed. By including the polymorphic variants of exon-to-exon junctions, it is ensured that fusion genes are not missed due to mismatches between nucleotide sequences of chimeric probes and exon-to-exon junctions.
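Construction of probes for the polymorphic variants can be sketched in Python as shown below. The input format (a mapping from 0-based probe position to the alleles observed at that position) and the function name are hypothetical. Where several SNPs fall within one probe, the sketch assumes that every combination of alleles is generated, which the description does not specify.

```python
from itertools import product


def snp_variant_probes(probe, snps):
    """Return one probe sequence per combination of SNP alleles.

    probe: junction probe sequence.
    snps: dict mapping 0-based position in the probe to a string of
          alleles at that position, e.g. {1: "CT"}.
    """
    positions = sorted(snps)
    variants = []
    for alleles in product(*(snps[pos] for pos in positions)):
        bases = list(probe)
        for pos, allele in zip(positions, alleles):
            bases[pos] = allele
        variants.append("".join(bases))
    return variants


if __name__ == "__main__":
    # Hypothetical probe with one C/T polymorphism at position 1.
    print(snp_variant_probes("ACGT", {1: "CT"}))
```

With no SNPs in the flanking sequence the function returns the original probe only, so the same routine can be applied to every junction.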
The microarray of the invention may be purchased from several manufacturers, e.g. Agilent, Illumina, and Nimblegen. Positive signals on the microarray are typically detected by measuring fluorescence or chemiluminescence, obtained from directly or indirectly labelled nucleotides of the mRNA or cDNA from the sample.
Methods of preparing probes or oligos and methods of applying such probes to a microarray are well known to a person skilled in the art.
The scoring of the exon-to-exon junction probes is relatively straightforward. This is because the majority of the thousands of spots will be negative, and only the features with positive exon-to-exon junction probes produce a significant positive signal. Existence of a fusion gene, creating a positive signal from a chimeric probe, may be supported by corresponding shifts in the normalized longitudinal expression level profiles created by the intragenic probes of the two fusion gene partners.
To facilitate the data analysis for samples, especially for samples with unknown presence of fusion gene(s), a “fusion score” indicating the probability of a fusion event can be calculated for each possible intronic fusion breakpoint. Two such fusion scores can be calculated for each chimeric junction probe. These combine values from the chimeric probes with values obtained with the intragenic probes, i.e. the longitudinal profiles of either the upstream or the downstream fusion gene partner respectively. Said fusion scores are calculated using the following equation:
Fusion score = Chimeric junction score × P(transcript-wise) × P(exon-wise)
where the chimeric junction score is a normalised value for the chimeric probe signal, the P(transcript-wise) is the probability that the exonic expression values of the fusion gene partners are from separate populations before and after the anticipated fusion breakpoint, and the P(exon-wise) is the probability that the exonic expression values of the immediate upstream and downstream exons of the fusion gene partner are from separate populations. The term “separate populations” refers in this context to the same gene but where the gene has been fused to another gene thereby creating changes in the expression level of the individual exons of said gene.
P(transcript-wise) and P(exon-wise) are calculated based on t-tests comparing the intragenic expression values from upstream and downstream of the possible fusion breakpoint, testing whether the longitudinal profile has a breakpoint at the given position.
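The comparison underlying such a t-test may be sketched as follows. The description does not state which variant of the t-test is used; the sketch assumes Welch's unequal-variance form and returns only the test statistic and its degrees of freedom. Converting these to a probability requires the t-distribution, for example via `scipy.stats.ttest_ind(upstream, downstream, equal_var=False)`. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(upstream, downstream):
    """Welch's t statistic and degrees of freedom for two groups of
    normalised expression values on either side of a candidate breakpoint.

    A positive statistic means higher expression downstream of the
    breakpoint. Each group needs at least two values.
    """
    n1, n2 = len(upstream), len(downstream)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    s1 = variance(upstream) / n1
    s2 = variance(downstream) / n2
    if s1 + s2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(downstream) - mean(upstream)) / sqrt(s1 + s2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical exon-level values before and after a breakpoint.
    print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

For the transcript-wise comparison the two groups would hold all values before and after the breakpoint. For the exon-wise comparison they would hold only the values of the immediately flanking exons.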
The calculation of a fusion score provides an easily interpreted value for the probability of a fusion event at a given exon-exon junction, thereby enabling analysis and interpretation of the results by non-experts. To keep the values within scale, the following thresholds may be applied. When the normalised values for chimeric probes are larger than 10, these may be set to 10. Similarly, when probabilities for a breakpoint in the longitudinal profiles are <0.10, these values may be set to 0.10. When the values from the downstream fusion gene partner exons are lower than the values from the upstream fusion gene partner exons, the probability may also be set to 0.10.
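The fusion score equation with the thresholds above can be written as a short Python function. The function and parameter names are hypothetical. The description leaves open which probability is set to 0.10 when downstream exon values are lower than upstream values; the sketch assumes that both are.

```python
def fusion_score(chimeric_value, p_transcript, p_exon,
                 downstream_lower_than_upstream=False,
                 max_chimeric=10.0, min_probability=0.10):
    """Fusion score = chimeric junction score * P(transcript-wise) * P(exon-wise).

    chimeric_value: normalised chimeric probe signal, capped at max_chimeric.
    p_transcript, p_exon: breakpoint probabilities from the longitudinal
        profile, floored at min_probability.
    downstream_lower_than_upstream: if True, both probabilities are set to
        min_probability (an assumption, see the lead-in).
    """
    chimeric = min(chimeric_value, max_chimeric)
    if downstream_lower_than_upstream:
        p_transcript = p_exon = min_probability
    else:
        p_transcript = max(p_transcript, min_probability)
        p_exon = max(p_exon, min_probability)
    return chimeric * p_transcript * p_exon


if __name__ == "__main__":
    # Hypothetical values: a strong chimeric signal with a weak exon-wise shift.
    print(fusion_score(25.0, 0.95, 0.02))
```

With these thresholds the score is bounded between 0 and 10, and a strong chimeric signal without support from the intragenic profile scores at most 0.1.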
A second aspect of the invention is a method of detecting a fusion gene comprising the steps of
In one embodiment of the present invention the method may further comprise the step of detecting the expression level of a fusion gene partner of the fusion gene using the microarray of the invention. Typically this may be performed in step c) of the above mentioned method, i.e. when the exon-to-exon junctions of the mRNA from the sample are detected using the microarray of the invention.
Thus, in a particular embodiment, step c) may be:
c. Detecting exon-to-exon junctions of mRNAs from the sample using a microarray comprising a chimeric probe for an intergenic exon-to-exon junction of a fusion gene and a microarray comprising at least two intragenic probes for a fusion gene partner of said fusion gene.
In a further embodiment of step c) the chimeric probe and the at least two intragenic probes may be present on individual microarrays or they may be present on the same microarray.
The method of the present invention may further comprise the step of comparing the exon-to-exon junction(s) of the fusion gene detected by the chimeric probes with the exon-to-exon junction(s) detected with the intragenic probes using the microarray of the present invention.
In step c) of the method of the present invention when images from the microarray are measured, positive fusion genes may be scored by observing the following:
1. Strong intensity for a chimeric fusion gene probe is indicative of the presence of that particular fusion gene, with that particular chimeric exon-to-exon junction in the fusion transcript.
2. Additionally, from the intragenic probes we may see a difference in the normalized general gene expression levels between up- and downstream parts of the transcripts for one or both of the two fusion gene partners. Typically, there may be intragenic probes (also called longitudinal probes or oligos) for each of the included fusion gene partners which may e.g. include three intra-exon probes (oligos) per exon, and exon-to-exon junction probes (oligos). Typically, as one moves from the 5′ to the 3′ end of these transcripts, a drop in the expression levels in the upstream fusion gene partner (Gene A), and an increase in the signals for the downstream fusion gene partner (Gene B) may be seen. These shifts in normalized expression levels should occur at intragenic positions that correspond to the positive intergenic/chimeric junction probe (oligo) as described in point 1.
3. Furthermore, a “fusion score” can be calculated for each chimeric junction probe as described above. The fusion score combines the scores of the chimeric fusion gene probe and the intragenic probes. This fusion score provides an easy way to express the likelihood of having a particular exon-exon junction in the fusion gene transcript.
For an RNA sample with a fusion transcript, a combination of 1 and 2 above may be seen (as illustrated in
The method may comprise preparation of cDNA from the RNA in step b) using either oligo-dT priming or random primers, such as hexamers. In this embodiment, the exon-to-exon junction is detected on the cDNA level.
The method of the present invention may also comprise labelling of the sample. Methods of labelling mRNA or cDNA are known to a person skilled in the art and include labelling of the cDNA by inclusion of e.g. Cy3- and/or Cy5-modified dNTPs as described in example 2.
Typically detection of exon-exon junctions in step c) of the method is obtained by hybridising the mRNA or cDNA obtained from the sample to the microarray. Methods of hybridising mRNA or cDNA to microarrays are well known to a person skilled in the art.
The sample may be any biological material, such as e.g. blood or bone marrow from a patient or a person suspected of having cancer. Another example of a sample is tissue obtained from a solid tumour.
A particular advantage of the present invention is that it may be performed without performing RT-PCR on the RNA or PCR on cDNA obtained in step b) prior to detection of the fusion gene with a microarray.
A third aspect of the invention is a kit comprising the microarray of the invention and random primers for cDNA synthesis and/or oligo-dT primers for cDNA synthesis. Preferably, the kit further comprises a reverse transcriptase and reagents necessary for cDNA synthesis.
In a particular embodiment the kit comprises a microarray comprising a chimeric probe for an intergenic exon-to-exon junction of a fusion gene, a microarray comprising at least two intragenic probes for a fusion gene partner and random primers for cDNA synthesis and/or oligo-dT primers for cDNA synthesis.
The chimeric probe and the at least two intragenic probes of the kit may be present on individual microarrays or they may be present on the same microarray.
For generation of the junction probes (oligos), we created a computer script (written in the programming language Python) that automatically processes public genome data. For all genes, and all their transcripts, the exon sequences were retrieved through the www.biomart.org internet portal. For each fusion gene combination, the end sequences (the last 30 nucleotides) of all GeneA exons and the start sequences (the first 30 nucleotides) of all GeneB exons were joined in all combinations. Next, an oligo design algorithm was used to create probes (oligos) from each of these possible fusion gene exon-to-exon junctions. We used an optimal Tm of 68 °C, with equalized Tm on each side of the junction. In our example, the generated exon-to-exon junction probes (oligos) range from 33 to 46 nucleotides in length.
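The all-against-all joining step described above can be illustrated with a minimal sketch. It is not the script used for the prototype: the function name, the exon labels, and the toy sequences are hypothetical, and the subsequent oligo design step (trimming each side to an equalized Tm) is deliberately left out.

```python
from itertools import product

FLANK = 30  # nucleotides taken from each side of the junction, as stated in the text


def junction_sequences(gene_a_exons, gene_b_exons, flank=FLANK):
    """Join the 3' end of every GeneA exon to the 5' start of every GeneB exon.

    Both arguments map a (hypothetical) exon label to its 5'->3' sequence.
    Returns a dict keyed by (exon_a, exon_b) holding (left_flank, right_flank).
    Exons shorter than `flank` contribute their whole sequence.
    """
    junctions = {}
    for (name_a, seq_a), (name_b, seq_b) in product(
        gene_a_exons.items(), gene_b_exons.items()
    ):
        left = seq_a[-flank:]   # last `flank` nt of the upstream exon
        right = seq_b[:flank]   # first `flank` nt of the downstream exon
        junctions[(name_a, name_b)] = (left, right)
    return junctions


# Toy sequences for illustration only, not real exon sequences.
gene_a = {"A_ex1": "ATGGCC" * 8, "A_ex2": "GGATTC" * 7}
gene_b = {"B_ex2": "TTGACC" * 9, "B_ex3": "CAGT" * 5}

junctions = junction_sequences(gene_a, gene_b)
for (exon_a, exon_b), (left, right) in junctions.items():
    print(exon_a, exon_b, left + right)
```

Keeping the two flanks separate in the result, rather than storing only the joined string, would let a later design step shorten each side independently until both halves reach a similar melting temperature.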
In this way, 47427 junction probes (oligos) were designed for 275 fusion genes.
To increase the sensitivity and specificity, intragenic probes (longitudinal oligos) were also designed. These are sets of probes (oligos) measuring expression levels along the transcripts for the individual fusion gene partners. Three probes (oligos) were generated targeting internally to each exon sequence, at the start, mid, and end, and probes (oligos) were also generated targeting the intragenic exon-to-exon junctions. Exon-to-intron junctions and intron-to-exon junctions were also included as the pre-mRNA processing machinery may alter the splicing pattern following removal or introduction of cis-acting splicing regulatory sequences.
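As a rough sketch of how such longitudinal probe positions could be laid out, the following picks three windows per exon (start, middle, end) and joins consecutive exons into intragenic junction sequences. The probe length of 36 nt, the flank of 18 nt, the restriction to consecutive exons, and all names are assumptions made for illustration; exon-to-intron and intron-to-exon probes are not covered.

```python
def intra_exon_probes(exon_seq, probe_len=36):
    """Return probe windows at the start, middle, and end of one exon.

    probe_len is an assumed value. Exons no longer than probe_len
    yield a single probe covering the whole exon.
    """
    n = len(exon_seq)
    if n <= probe_len:
        return {"start": exon_seq}
    mid_start = (n - probe_len) // 2
    return {
        "start": exon_seq[:probe_len],
        "mid": exon_seq[mid_start:mid_start + probe_len],
        "end": exon_seq[-probe_len:],
    }


def intragenic_junctions(exons, flank=18):
    """Join the end of each exon to the start of the next exon in the list.

    Only consecutive exons are paired here (an assumption); returns
    tuples of (upstream index, downstream index, junction sequence).
    """
    return [
        (i, i + 1, exons[i][-flank:] + exons[i + 1][:flank])
        for i in range(len(exons) - 1)
    ]


# Toy exon sequences for illustration only.
exon = "ACGT" * 25
probes = intra_exon_probes(exon)
joins = intragenic_junctions(["AAAA" * 10, "CCCC" * 10, "GGGG" * 10])
print(probes)
print(joins)
```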
To reduce “half-binder” effects of the probes, the probes (oligos) used in our prototype were rather short in length (34-40mers), and we constructed them with equal melting temperatures on each side of the junctions. Because of the short sequences on each side of the junction, the binding may be sensitive to single nucleotide polymorphisms (SNPs). Thus, at known SNP-positions, we created extra sets of probes, accounting for each of the SNP variants. We also generated a second version of the array with longer probes (oligos) (44-55mers).
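The SNP handling can be sketched as an expansion of one probe into one probe per allele. The function name and the input layout (0-based position within the probe mapped to the possible alleles) are hypothetical, and generating every combination when a probe spans several SNPs is an assumption, since the text does not say how multiple SNPs within one probe were treated.

```python
from itertools import product


def snp_variant_probes(probe, snps):
    """Expand a probe into one sequence per combination of SNP alleles.

    `snps` maps a 0-based position within the probe to a string of
    possible alleles at that position, e.g. {2: "GA"}.
    """
    positions = sorted(snps)
    variants = []
    for alleles in product(*(snps[p] for p in positions)):
        bases = list(probe)
        for pos, allele in zip(positions, alleles):
            bases[pos] = allele
        variants.append("".join(bases))
    return variants


# Toy probe for illustration only.
single = snp_variant_probes("ACGTACGT", {2: "GA"})
double = snp_variant_probes("ACGTACGT", {0: "AC", 7: "TG"})
print(single)
print(double)
```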
The described microarray was generated, including chimeric probes (oligos) targeting all possible junction sequences of 275 known fusion genes, and also intragenic probes (longitudinal oligos) for 100 of the genes. For seven fusion genes, including those used as positive control fusion genes, the chimeric probes (oligos) were included in quadruplicate. All of the corresponding fusion gene partners were also among the 100 genes for which intragenic probes (oligos) were created. Overall, the pilot fusion gene microarray design comprised 69729 probes (oligos), which were synthesised onto Nimblegen microarray slides, which currently can contain 2.1 million different oligo sequences per microarray.
In a proof-of-principle experiment, we analysed a set of positive control samples, with known presence of one fusion gene each. The pilot samples included four prostate cancer tissue samples positive for the TMPRSS2:ERG fusion gene, and two leukaemia cell lines, each known to carry one of the TCF3:PBX1 and ETV6:RUNX1 fusion genes.
For the pilot samples, total RNA was isolated by use of Qiagen spin columns. Further, they were enriched for mRNA by a ribosomal RNA reduction kit (RiboMinus™ Transcriptome Isolation Kit; Invitrogen). From these, first strand cDNA synthesis was performed with use of random primers (hexamers), and double stranded cDNA was made and shipped to Nimblegen Inc. for labelling, hybridisation, washing, and scanning of microarrays. The cDNA was labelled by inclusion of Cy3 and Cy5-modified dNTPs.
Results
To visualize the measurements for the positive control genes, we followed two independent paths, using either the chimeric probe set or the intragenic (longitudinal) probe set. All six samples showed clear fusion gene patterns, thus validating the concept.
To evaluate the variability of a given fusion gene, we used the TMPRSS2:ERG fusion gene in prostate cancer as a model. Here, we analyzed malignant prostate tissue samples from four individual tumours.
As seen in
RUNX1 is one of the most frequent targets of chromosomal rearrangements in human leukaemia. To date, 21 types of translocations involving RUNX1 have been reported, and 12 partner genes have been cloned and identified (14). One of the samples analyzed here, the REH cell line, carried an ETV6:RUNX1 fusion gene. This was detected in the same way as described above for the TMPRSS2:ERG and TCF3:PBX1 genes, by using chimeric exon-to-exon probes and intragenic probes targeting the exons of the ETV6 gene. The data showed that the REH cell line contained an ETV6:RUNX1 fusion gene in which the end of exon 5 of the ETV6 gene was fused to the beginning of exon 2 of the RUNX1 gene.
To determine our ability to detect fusion genes without prior knowledge of their presence or identity, we also performed unsupervised data analysis, in which the probability of a fusion event is calculated at all potential fusion gene junctions. For these analyses, a fusion score is obtained by multiplying a chimeric junction score, calculated from the normalised value of the chimeric probe, by the probabilities of a fusion breakpoint in the up- or downstream fusion gene partner, as seen from their longitudinal profiles.
For each exon-exon junction at longitudinal profiles of the fusion partner genes, two probabilities are calculated. A transcript-wise probability is based on a t-test for whether values from all upstream and all downstream exons are likely to belong to separate populations. An exon-wise probability is based on a t-test for whether the values from the immediate up- and downstream exons are likely to belong to separate populations.
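The two groupings can be illustrated as follows. This sketch only shows how the values might be partitioned and computes Welch's t statistic with the standard library; the text does not say which t-test variant was used, so the Welch form is an assumption, as are the function names and the toy values. Converting the statistic into a probability additionally requires a t-distribution function, which is not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(upstream, downstream):
    """Welch's t statistic; positive when downstream values are higher.

    Each sample needs at least two values and non-zero combined variance.
    """
    se = sqrt(variance(upstream) / len(upstream)
              + variance(downstream) / len(downstream))
    return (mean(downstream) - mean(upstream)) / se


def breakpoint_groups(exon_values, junction):
    """Split per-exon probe values at the junction after exon index `junction`.

    Returns (transcript_wise, exon_wise), each a pair (upstream, downstream).
    """
    upstream = exon_values[: junction + 1]
    downstream = exon_values[junction + 1:]
    transcript_wise = (
        [v for exon in upstream for v in exon],     # all upstream exons pooled
        [v for exon in downstream for v in exon],   # all downstream exons pooled
    )
    exon_wise = (upstream[-1], downstream[0])       # immediate neighbours only
    return transcript_wise, exon_wise


# Toy normalised values: three probes per exon, signal rising after exon index 1.
profile = [[1.0, 1.2, 0.9], [1.1, 0.8, 1.0], [3.0, 3.4, 3.1], [2.9, 3.2, 3.3]]
transcript_wise, exon_wise = breakpoint_groups(profile, junction=1)
print(welch_t(*transcript_wise), welch_t(*exon_wise))
```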
For each chimeric junction probe, two such fusion scores were calculated. These combined values from the chimeric probes (oligos) with values from the longitudinal profiles of either the upstream or the downstream fusion gene partner.
Fusion score = Chimeric junction score × P(transcript-wise) × P(exon-wise)
For both the samples visualized in
To keep the values within scale, the following thresholds were applied. When the normalised values for chimeric probes were larger than 10, they were set to 10. Similarly, when the probabilities for a breakpoint in the longitudinal profiles were <0.10, they were set to 0.10. When the values from the downstream fusion gene partner exons were lower than the values from the upstream fusion gene partner exons, the probability was likewise set to 0.10.
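The score and its thresholds can be summarised in a short sketch. The function and parameter names are hypothetical, and two points are interpretations rather than statements from the text: a higher probability is taken to mean a more likely breakpoint, and the direction check is applied to a single pair of mean values for the up- and downstream exons of the profile.

```python
CHIMERIC_CAP = 10.0  # normalised chimeric values above 10 are set to 10
PROB_FLOOR = 0.10    # breakpoint probabilities below 0.10 are set to 0.10


def clamp_probability(p, upstream_mean, downstream_mean):
    """Apply the floor, and force the floor when the signal drops downstream."""
    if downstream_mean < upstream_mean:
        return PROB_FLOOR
    return max(p, PROB_FLOOR)


def fusion_score(chimeric, p_transcript, p_exon, upstream_mean, downstream_mean):
    """Chimeric junction score x P(transcript-wise) x P(exon-wise), thresholded."""
    chimeric = min(chimeric, CHIMERIC_CAP)
    p_t = clamp_probability(p_transcript, upstream_mean, downstream_mean)
    p_e = clamp_probability(p_exon, upstream_mean, downstream_mean)
    return chimeric * p_t * p_e


# Toy inputs for illustration only.
strong = fusion_score(25.0, 0.95, 0.02, upstream_mean=1.0, downstream_mean=3.0)
wrong_direction = fusion_score(4.0, 0.9, 0.8, upstream_mean=3.0, downstream_mean=1.0)
print(strong, wrong_direction)
```

With these thresholds the score is bounded between 0 and 10, and a single very low probability cannot drive the product more than one order of magnitude below the chimeric junction score per factor.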
Number | Date | Country | Kind |
---|---|---|---|
PA 2007 00930 | Jun 2007 | DK | national |
07111167.8 | Jun 2007 | EP | regional |
PA 2008 00335 | Mar 2008 | DK | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2008/058272 | 6/27/2008 | WO | 00 | 3/25/2010 |