METHOD FOR DIAGNOSING A CANCER AND ASSOCIATED KIT

Information

  • Patent Application
  • 20220290242
  • Publication Number
    20220290242
  • Date Filed
    November 05, 2019
  • Date Published
    September 15, 2022
Abstract
The invention concerns a method for diagnosing a cancer in a subject, comprising a step of RT-MLPA on a biological sample obtained from the subject, in which the RT-MLPA step is carried out using at least one pair of probes comprising at least one probe chosen among the probes with SEQ ID NO: 1 to 13, and/or the probes with SEQ ID NO: 96 to 99, and/or the probes with SEQ ID NO: 866 to 938, and/or the probes with SEQ ID NO: 940 to 1104, and/or the probes with SEQ ID NO: 1211 to 1312, and/or the probes with SEQ ID NO: 1105 to 1107, and/or the probe with SEQ ID NO: 939, and/or the probes with SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of the pair comprising a molecular barcode sequence.
Description
BACKGROUND OF THE INVENTION
Field of the Invention

This invention relates to a method for diagnosing cancer and a kit useful for implementing such a method. The invention also relates to a computer-implemented method for analyzing the results obtained after implementing this method, in particular in the context of a cancer diagnosis.


Description of the Related Art

Cancers result from the accumulation of genetic abnormalities in tumor cells. Among these abnormalities are numerous chromosomal rearrangements (translocations, deletions, and inversions) which result in the formation of fusion genes which encode abnormal proteins. These rearrangements also lead to imbalances in the expression of exons located at 5′ and 3′ of genomic breakpoints (5′-3′ expression imbalances), the expression of the former remaining under the control of the natural transcriptional regulatory regions of the gene while that of the latter falls under the control of the transcriptional regulatory regions of the partner gene. These abnormalities also include mutations at splice sites that disrupt normal RNA maturation, resulting in particular in exon skipping. Fusion genes, exon skipping, and 5′-3′ expression imbalances, which are important diagnostic markers, are usually investigated by different techniques. Some of these genetic abnormalities are very difficult to detect and analyze, particularly those involved in the development of sarcomas, which are very heterogeneous and can involve a very large number of genes. In addition, the amounts of RNA obtained from sarcoma biopsies are often very low and of poor quality. Chromosomal rearrangements in the context of sarcomas are discussed in particular in the Nakano and Takahashi article (Int. J. Mol. Sci. 2018, 19, 3784; doi:10.3390/ijms19123784).


Fusion genes are often associated with particular forms of tumor, and their detection can significantly contribute to making the diagnosis and choosing the most suitable treatment (The impact of translocations and gene fusions on cancer causation. Mitelman F, Johansson B, Mertens F, Nat Rev Cancer. 2007 April; 7(4):233-45). They are also often used as molecular markers to monitor the efficacy of treatments and follow the course of the disease, for example in acute leukemia (Standardized RT-PCR analysis of fusion gene transcripts from chromosome aberrations in acute leukemia for detection of minimal residual disease. Report of the BIOMED-1 Concerted Action: Investigation of minimal residual disease in acute leukemia. van Dongen J J, Macintyre E A, Gabert J A, Delabesse E, Rossi V, Saglio G, Gottardi E, Rambaldi A, Dotti G, Griesinger F, Parreira A, Gameiro P, Diaz M G, Malec M, Langerak A W, San Miguel J F, Biondi A. Leukemia. 1999 December; 13(12):1901-28).


The four main techniques which are commonly used to search for fusion genes are conventional cytogenetics, molecular cytogenetics (fluorescent in situ hybridization), immunohistochemistry, and molecular genetics (RT-PCR, RNAseq, or RACE).


Conventional cytogenetics consists of establishing the karyotype of cancer cells in order to look for possible abnormalities in the number and/or structure of the chromosomes. It has the advantage of providing an overall view of the entire genome. However, it is relatively insensitive, its effectiveness being highly dependent on the percentage of tumor cells in the sample to be analyzed and on the possibility of obtaining viable cell cultures. Another of its disadvantages is its low resolution, which does not allow detecting certain rearrangements (in particular small inversions and deletions). Finally, some tumors are associated with major genomic instability which masks pathognomonic genetic abnormalities. This is the case for example in solid tumors such as lung cancer. Karyotype analysis, when possible, is therefore difficult and can only be carried out by personnel with exceptional expertise, which entails significant costs.


Molecular cytogenetics, or FISH (Fluorescent In Situ Hybridization), consists of hybridizing fluorescent probes on the chromosomes of tumor cells in order to visualize their structural abnormalities. It makes it possible to detect chromosomal rearrangements with better resolution than conventional cytogenetics, and therefore to detect rearrangements of smaller size. It also makes it possible to uncover abnormalities in tumors with high genomic instability, by precisely targeting the genes likely to be involved. Its major disadvantage is that each abnormality must be investigated individually, using specific probes. It therefore incurs significant costs, and, due to the great diversity of the abnormalities which have been described and the small amount of tumor material available for diagnosis, only a few abnormalities can be investigated. For example, in practice, in a context of diagnosing a lung carcinoma, only the rearrangement of the ALK gene is commonly investigated by this method, the search for other recurrent rearrangements in these tumors remaining highly exceptional.


Immunohistochemistry (or IHC) consists of using antibodies to investigate the overexpression of an abnormal protein. This is a simple and rapid method, but also requires searching for each abnormality individually and its specificity is often low, as certain genes can be overexpressed in a tumor without any rearrangement.


RT-PCR, RNAseq, and RACE are methods of molecular genetics carried out using RNA extracted from tumor cells. RT-PCR has excellent sensitivity, far superior to cytogenetics. This sensitivity makes it the benchmark technique for analyzing biological samples where the percentage of tumor cells is low, for example in order to monitor the effectiveness of treatments or to anticipate possible relapses very early on. Its main limitation is linked to the fact that it is extremely difficult to multiplex this type of analysis. As with molecular cytogenetics, in general each translocation must be investigated by a specific test, and only a few recurrent fusions among the very many which are currently known are therefore tested for in routine diagnostic laboratories. RT-PCR also requires having RNAs of good quality, which is rarely the case for solid tumors where, in order to facilitate pathological diagnosis, the samples are fixed in formalin and embedded in paraffin the moment the biopsy sample is obtained. This highly sensitive technique can be very useful in diagnosing a sarcoma. Nevertheless, it is necessary to perform numerous independent tests, at a minimum for the most frequent recurrent fusion genes, which incurs additional costs and lengthens the time required. RNAseq, which consists of analyzing all the RNAs expressed by the tumor by next-generation sequencing (NGS), theoretically allows detecting all abnormal fusion transcripts expressed. However, it also requires having RNAs of good quality and is therefore difficult to implement from biopsies fixed with formalin. Its application is also very complex, since many steps are required to generate the sequencing libraries. In addition, the sequencing generates a very large amount of data (since all the genes are studied) which makes the analysis particularly complex. 
RACE, which has recently been adapted to NGS, is a simplification of the RNAseq technique but allows targeting small panels of genes likely to be involved in fusions. It has the advantage of being able to be applied to biopsies fixed with formalin. However, although the amount of data generated is reduced compared to RNAseq, it is still significant. Unlike the method described in the present invention, which only detects abnormal RNAs, RACE yields sequences which correspond to all of the targeted genes in the panel, even when they are in a germline configuration. The vast majority of the sequences obtained therefore correspond to normal transcripts, expressed naturally by tumor cells and by the cells in their environment. The sequence files must therefore be filtered to identify the fusion transcripts. Finally, similarly to RNAseq, RACE is a long and complex technique to implement, where many steps are necessary in order to obtain the sequencing libraries, which increases the time required to deliver results.


Exon skipping generally results in the expression of an abnormally short protein which is involved in the tumor process. For example, skipping of exon 14 of the MET gene is involved in the development of lung carcinoma, and skipping of exons 2 to 7 of the EGFR gene is involved in the development of certain brain tumors, in particular glioblastoma. Such skipping events are often due to point mutations which affect the exon splicing sites (3′ donor sites, 5′ acceptors, as well as intronic or exonic enhancers), or to internal deletions of genes. Today, it is particularly difficult to uncover these abnormalities in order to diagnose cancers, since neither cytogenetics nor FISH is informative. RT-PCR could be an alternative, but it is severely limited due to the formalin fixation of tumor biopsies that is necessary for pathological diagnosis. These abnormalities are therefore currently tested for primarily by next-generation sequencing of genomic DNA or of RNA, which are expensive and complex techniques.


5′-3′ expression imbalances, which require quantitatively evaluating the expression of exons, are only very rarely tested for when diagnosing a cancer. They can be analyzed either by RNAseq or by dedicated kits such as those offered by the Nanostring company (for example the “nCounter® Lung Fusion Panel” test).


International application PCT/FR2014/052255 describes a method for diagnosing cancer by detecting fusion genes. Said method comprises an RT-MLPA step using probes fused, at at least one end, with a primer sequence.


The article by Ruminy et al. describes the detection of fusion genes by RT-MLPA in the context of acute leukemia (Multiplexed targeted sequencing of recurrent fusion genes in acute leukaemia; Leukemia, 2016 March; 30(3):757-60).


The article by Piton et al. describes the detection by RT-MLPA of rearrangement linked to the ALK, ROS and RET genes in the context of lung adenocarcinomas (Ligation-dependent-RT-PCR: a new specific and low-cost technique to detect ALK, ROS and RET rearrangements in lung adenocarcinoma; Lab Invest. 2018 March; 98(3):371-379).


Techniques are therefore currently known which allow detecting fusion genes, exon skipping, or 5′-3′ expression imbalances, but they have disadvantages.


The limitations of existing methods are essentially linked to: (i) the large number of abnormalities to be tested for (this is one of the most significant limitations of IHC, FISH, and RT-PCR techniques); (ii) the sensitivity required to detect genetic abnormalities using small tumor biopsies that are fixed and embedded in paraffin (this is one of the most significant limitations of next-generation sequencing techniques); (iii) the interpretation of the results (it is necessary to define thresholds for IHC, there are significant artifacts for FISH, RNAseq and RACE generate a very large amount of data which is difficult to analyze); (iv) the implementation complexity (the large number of steps to be carried out increases the risk of error, the technical time required increases operator costs and has a strong impact on the quality of the results generated and the times required for delivery).


The method described in international application PCT/FR2014/052255 is more specific, simple, and quick to implement compared to existing techniques for detecting fusion genes.


However, there is still a need for fusion gene diagnostic techniques capable of detecting a very wide variety of abnormalities, in specific, sensitive, and reliable ways, while remaining simple and quick to implement.


International application PCT/FR2014/052255 also describes specific probes for types of translocation observed in cancers. However, new genetic abnormalities have since been uncovered and cannot be detected by the method described in the international application referenced above.


There is therefore a need for a diagnostic method which allows detecting new genetic abnormalities.


Furthermore, the techniques which currently make it possible to detect exon skipping require performing complex additional tests. These techniques are therefore expensive, long to implement, and difficult to interpret.


There is therefore a need for a technique which allows detecting exon skipping that is sensitive, reliable, simple, economical, and quick to implement.


There is also a need for a technique which allows detecting 5′-3′ expression imbalances which is sensitive, reliable, simple, economical, and quick to implement.


As the techniques for detecting fusion genes, exon skipping, and 5′-3′ expression imbalances are different, there is also a need for a method that allows detecting these three types of genetic abnormalities simultaneously.


Finally, as the surgical tumor biopsies available for the diagnosis of solid cancers are often very small, fixed in formalin, and embedded in paraffin, there is a need for a method that allows detecting a large number of abnormalities simultaneously, in a small amount of low-quality genetic material.


SUMMARY OF THE INVENTION

The invention thus aims to meet these different needs. The invention is in fact based on the results of the inventors, who (i) have identified new genetic abnormalities linked to the RET, MET, ALK, and/or ROS genes in carcinomas (both fusion genes and exon skipping), and (ii) have developed a technique to identify them. The invention is also based on (iii) the results of the inventors, who have identified new probes, in particular probes which allow diagnosing sarcomas, brain tumors, gynecological tumors, or tumors of the head and neck, or detecting (iv) 5′-3′ imbalances (for example 5′-3′ imbalances of the ALK gene). The invention is also based on (v) the use of probes comprising at least one molecular barcode, which makes it possible to significantly improve the sensitivity and specificity of the detection.


The invention thus provides a method which makes it possible to simultaneously detect fusion genes, exon skipping, and 5′-3′ expression imbalances. The invention also has the advantage of being specific, sensitive, and reliable, but also simple, economical, and quick to implement. Typically, by means of the technique according to the invention, the results can be obtained within two or three days after the sample is received by the analysis laboratory, compared to several weeks for conventional techniques. It also offers the advantage of being applicable to fixed tissues, such as those used in pathology laboratories. The invention thus makes it possible to identify genetic abnormalities from a small amount of poor-quality genetic material. Finally, its very high sensitivity (it allows detecting fewer than ten abnormal molecules in a sample), coupled with its very high specificity (the results obtained are DNA sequences, meaning qualitative data, which does not induce interpretation bias the way quantitative IHC-type methods can), makes this a very efficient method. The invention thus makes it possible to have a treatment plan adapted to each patient. Indeed, the invention makes it possible to diagnose with accuracy and to guide the choice of treatment by identifying patients eligible for targeted treatments.


DESCRIPTION OF THE PREFERRED EMBODIMENTS

In a first aspect, the invention thus relates to a method for diagnosing cancer in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject, wherein the RT-MLPA step is carried out using at least one pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 1 to 13, and/or
    • the probes SEQ ID NO: 96 to 99,


      each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In this first aspect, the invention also relates to a method for diagnosing cancer in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject, wherein the RT-MLPA step is carried out using at least one pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or
    • the probes SEQ ID NO: 1105 to 1107, and/or SEQ ID NO: 939, and/or
    • the probes SEQ ID NO: 1108 to 1123,


      each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In this first aspect, the invention also relates to a method for diagnosing cancer in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject, wherein the RT-MLPA step is carried out using at least one pair of probes comprising at least one probe selected from the probes SEQ ID NO: 1211 to 1312,


each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a first aspect, the invention thus relates to a method for diagnosing cancer in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject, wherein the RT-MLPA step is carried out using at least one pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 1 to 13, and/or SEQ ID NO: 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, and/or
    • the probes SEQ ID NO: 96 to 99, and/or SEQ ID NO: 1105 to 1107, and/or SEQ ID NO: 939, and/or
    • the probes SEQ ID NO: 1108 to 1123,


      each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


According to the invention, the term “MLPA” means Multiplex Ligation-Dependent Probe Amplification, which allows the simultaneous amplification of several targets of interest that are adjacent to one another, using one or more specific probes. In the context of the invention, this technique is very advantageous for determining the presence of translocations, which are frequent in malignant tumors.


According to the invention, the term “RT-MLPA” means Multiplex Ligation-Dependent Probe Amplification preceded by a Reverse Transcription (RT), which, in the context of the invention, allows starting with the RNA from a subject to amplify and characterize fusion genes, exon skippings of interest, and/or 5′-3′ expression imbalances. According to the invention, the RT-MLPA step is carried out in multiplex mode. The multiplex mode saves time because it is faster than several monoplex assays, and is economically advantageous. It also makes it possible to simultaneously search for a much higher number of abnormalities than the other techniques currently available. The RT-MLPA step is derived from MLPA, described in particular in U.S. Pat. No. 6,955,901. It allows the detection and simultaneous assay of a large number of different oligonucleotide sequences. The principle is as follows (see FIG. 1 which illustrates the principle with a fusion gene): the RNA extracted from tumor tissue is first converted into complementary DNA (cDNA) by reverse transcription. This cDNA is then incubated with the mixture of appropriate probes, each of which can then hybridize to the sequences of the exons to which they correspond. If one of the fusion transcripts or one of the transcripts corresponding to a searched-for exon skipping is present in the sample, two probes attach side by side to the corresponding cDNA. A ligation reaction is then carried out using an enzyme with DNA ligase activity, which establishes a covalent bond between the two adjacent probes. A PCR (Polymerase Chain Reaction) reaction is then carried out, using primers corresponding to the primer sequences, which makes it possible to specifically amplify the two ligated probes. Obtaining an amplification product after the RT-MLPA step indicates that one of the translocations or an exon skipping being searched for is present in the analyzed sample. 
Sequencing this amplification product allows identifying the genes involved.
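
As an illustration only, the principle described above can be modelled in a few lines of Python. This is a deliberately simplified in-silico sketch, not the laboratory protocol: all sequences and gene names are invented, hybridization is reduced to exact string matching on a single strand, and the function name `ligation_products` is hypothetical.

```python
def ligation_products(cdna, left_probes, right_probes):
    """Return (left, right) probe names that would sit side by side on cdna.

    Ligation requires the left probe to end exactly where the right probe
    starts, which in this toy model means that left + right occurs as one
    contiguous substring of the cDNA.
    """
    products = []
    for left_name, left_seq in left_probes.items():
        for right_name, right_seq in right_probes.items():
            if left_seq + right_seq in cdna:
                products.append((left_name, right_name))
    return products


# Invented probe sequences (far shorter than real probes, for readability).
left_probes = {"GENE_A_exon2": "ACGTTGCA", "GENE_B_exon5": "GGATCCTA"}
right_probes = {"GENE_C_exon3": "TTGACCGT", "GENE_D_exon7": "CATGCATG"}

# A fusion transcript juxtaposes GENE_A exon 2 and GENE_C exon 3.
fusion_cdna = "AAAA" + "ACGTTGCA" + "TTGACCGT" + "CCCC"
# In a normal transcript the two target sites are not adjacent.
normal_cdna = "AAAA" + "ACGTTGCA" + "GGGGGGGG" + "TTGACCGT"

print(ligation_products(fusion_cdna, left_probes, right_probes))
# [('GENE_A_exon2', 'GENE_C_exon3')]
print(ligation_products(normal_cdna, left_probes, right_probes))
# []
```

Because every left probe is tested against every right probe, a single mixture can report any combination of partners, which reflects the multiplex character of the step; only the adjacent pair yields a product that would then be amplified and sequenced.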


According to the invention, the term “subject” means an individual who is healthy or is likely to be affected by cancer or is seeking screening, diagnosis, or follow-up.


According to the invention, the term “biological sample” means a sample containing biological material. More preferably, it means any sample containing RNA. This sample may come from a biological sample taken from a living being (human patient, animal). Preferably, the biological samples of the invention are selected among blood and a biopsy, obtained from a subject, in particular a human subject. The biopsy is in particular tumoral, in particular from a section of fixed tissue (for example fixed with formalin and/or embedded in paraffin) or from a frozen sample.


According to the invention, the term “cancer” means a disease characterized by abnormally high cell proliferation within normal tissue of the organism, such that the survival of the organism is threatened. In a preferred embodiment of the method according to the invention, the cancer is linked to a genetic abnormality, preferably the formation of a fusion gene and/or an exon skipping and/or a 5′-3′ imbalance. In a preferred embodiment of the method according to the invention, the cancer is linked to a genetic abnormality, preferably a fusion gene or an exon skipping. In a preferred embodiment of the method according to the invention, the cancer involves at least one gene selected among RET, MET, ALK and/or ROS, and in particular is associated with the formation of a fusion gene and/or an exon skipping, more particularly a skipping of an exon of the MET gene and/or a 5′-3′ imbalance, more particularly a 5′-3′ imbalance of the ALK gene. According to the invention, and in a first aspect, the cancer is preferably a carcinoma. Carcinomas are malignant tumors that develop at the expense of epithelial tissue. More particularly, the cancer is a lung carcinoma, more particularly a bronchopulmonary carcinoma, even more particularly a lung carcinoma associated with a genetic abnormality of the RET, MET, ALK and/or ROS genes. In another preferred embodiment of the method according to the invention, the 5′-3′ expression imbalance is more particularly understood to mean an expression imbalance of the ALK gene. According to the invention, and in a second aspect, the cancer is preferably a sarcoma, a brain tumor, a gynecological tumor, or a tumor of the head and neck. Sarcomas are tumors of the soft tissue and bone. Brain tumors are tumors that grow in the brain, such as gliomas or medulloblastomas. Gynecological tumors are tumors of the female reproductive system, such as cervical cancer, endometrial cancer, and ovarian cancer. 
Cancers of the head and neck are cancers of the upper respiratory tract, such as squamous cell carcinoma of the throat (larynx, pharynx) and mouth, cancer of the cavum (or nasopharynx), cancer of the salivary glands (parotid, palate), or cancer of the thyroid gland. In another preferred embodiment of the method according to the invention, exon skipping also means a skipping of an exon of the EGFR gene, and more particularly a skipping of exons 2 to 7 of the EGFR gene. Thus, according to the invention, exon skipping is understood to mean a skipping of an exon or exons of the MET and/or EGFR gene.


According to the invention, the term “probe” means a nucleic acid sequence of a length between 15 and 55 nucleotides, preferably between 15 and 45 nucleotides, and complementary to a cDNA sequence derived from RNA of the subject (endogenous). It is therefore capable of hybridizing with said cDNA sequence derived from RNA of the subject. The term “pair of probes” means a set of two probes (i.e. a “Left” probe and a “Right” probe): one located at 5′ (see in particular “L” in Table 1) of the translocation of the fusion gene, of the exon skipping, or of the exon or exons whose expression is evaluated in order to detect a 5′-3′ expression imbalance, the other located at 3′ (see in particular “R” in Table 1) of the translocation of the fusion gene, of the exon skipping, or of the exon or exons whose expression is evaluated in order to detect a 5′-3′ expression imbalance. Preferably, said pair of probes consists of two probes hybridizing side by side during the RT-MLPA step. Preferably, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 1 to 13, and/or probes of SEQ ID NO: 96 to 99 and/or probes of SEQ ID NO: 14 to 91. Even more particularly, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 1 to 13, of probes of SEQ ID NO: 96 to 99 and of probes of SEQ ID NO: 14 to 91. Preferably, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 866 to 938, and/or probes of SEQ ID NO: 940 to 1104, and/or probes of SEQ ID NO: 1105 to 1107, and/or the probe of SEQ ID NO: 939, and/or probes of SEQ ID NO: 1108 to 1123. Even more particularly, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 866 to 938, probes of SEQ ID NO: 940 to 1104, probes of SEQ ID NO: 1105 to 1107, the probe of SEQ ID NO: 939 and probes of SEQ ID NO: 1108 to 1123. Preferably, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 1211 to 1312. 
Even more particularly, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 1 to 13, probes of SEQ ID NO: 96 to 99, probes of SEQ ID NO: 14 to 91, probes of SEQ ID NO: 866 to 938, probes of SEQ ID NO: 940 to 1104, probes of SEQ ID NO: 1105 to 1107, the probe of SEQ ID NO: 939, and probes of SEQ ID NO: 1108 to 1123. Even more particularly, a pair of probes according to the invention is formed at least of probes of SEQ ID NO: 1 to 13, probes of SEQ ID NO: 96 to 99, probes of SEQ ID NO: 14 to 91, probes of SEQ ID NO: 866 to 938, probes of SEQ ID NO: 940 to 1104, probes of SEQ ID NO: 1105 to 1107, the probe of SEQ ID NO: 939, and probes of SEQ ID NO: 1108 to 1123 and probes of SEQ ID NO: 1211 to 1312.


According to the invention, the term “primer sequence” means a nucleic acid sequence of a length between 15 and 30 nucleotides, preferably between 19 and 25 nucleotides, and not complementary to the cDNA sequences obtained from RNA of the subject. It is therefore not complementary to the cDNA corresponding to endogenous RNA. It therefore cannot hybridize with said cDNA sequences. Preferably, in a preferred embodiment of the method according to the invention, the primer sequence is selected from the (pairs of) sequences SEQ ID NO: 92 and SEQ ID NO: 93 or SEQ ID NO: 94 and SEQ ID NO: 95.


According to the invention, the term “index sequence” means a nucleic acid sequence of a length between 5 and 10 nucleotides, preferably between 6 and 8 nucleotides, in particular 8 nucleotides, and not complementary to the sequences of cDNA obtained from RNA of the subject. It is therefore not complementary to the cDNA corresponding to endogenous RNA. It therefore cannot hybridize with said cDNA sequences. Preferably, the index sequence is represented by the sequence SEQ ID NO: 836. Said index sequence is composed of bases (A, T, G, or C). In a preferred embodiment of the method according to the invention, said index sequence can be fused to a primer sequence, in particular at the 3′ end of the primer sequence. The index sequence is specific to each subject/patient whose sample is tested. Each pair of probes used in the PCR step comprises a different index sequence which allows identifying the sequences linked to each of the patients analyzed.


According to the invention, the term “molecular barcode” means a nucleic acid sequence of a length between 5 and 10 nucleotides, preferably between 6 and 8 nucleotides, in particular 7 nucleotides, and not complementary to the cDNA sequences derived from RNA of the subject. It is therefore not complementary to the cDNA corresponding to endogenous RNA. It therefore cannot hybridize with said cDNA sequences. Preferably, the molecular barcode sequence is represented by the sequence SEQ ID NO: 100. Said molecular barcode sequence is a random sequence, composed of random bases (A, T, G, or C). The use of this sequence provides information on the exact number of cDNA molecules detected by ligation, while avoiding the bias associated with PCR amplification. According to the invention, at least one of the probes of said pair comprises a molecular barcode sequence. In other words, at least one of the probes of said pair is fused at one end with a molecular barcode sequence. In a particularly preferred embodiment, a molecular barcode sequence is added at 5′ of the “F” or “Forward” probe, also called “L” or “Left”. In a preferred embodiment, each of the probes can comprise a molecular barcode sequence, in particular the probes SEQ ID NO: 14 to 91 and the probes SEQ ID NO: 96 and 98, preferably the probes SEQ ID NO: 14 to 91.
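
The counting principle behind the molecular barcode can be sketched as follows. The example assumes that each sequenced read has already been reduced to a (junction, barcode) pair; that preprocessing, the junction labels, the barcode values, and the function name `count_ligation_events` are all hypothetical. Reads sharing both junction and barcode are treated as PCR copies of one ligation event, so the number of distinct barcodes approximates the number of ligated molecules.

```python
from collections import defaultdict


def count_ligation_events(reads):
    """Count distinct barcodes per junction.

    reads: iterable of (junction, barcode) tuples, one tuple per read.
    Returns a dict mapping each junction to its number of distinct barcodes.
    """
    barcodes_by_junction = defaultdict(set)
    for junction, barcode in reads:
        barcodes_by_junction[junction].add(barcode)
    return {j: len(b) for j, b in barcodes_by_junction.items()}


# Invented reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("GENE_A-GENE_C", "ACGTACG"),
    ("GENE_A-GENE_C", "ACGTACG"),
    ("GENE_A-GENE_C", "TTGCAAT"),
    ("GENE_B-GENE_D", "GGGTTTA"),
]

print(count_ligation_events(reads))
# {'GENE_A-GENE_C': 2, 'GENE_B-GENE_D': 1}
```

A random barcode of 7 nucleotides can take 4^7 = 16,384 values, so two independent molecules may occasionally carry the same barcode. This sketch ignores such collisions as well as sequencing errors within the barcode.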


According to the invention, the term “extension sequence” refers to the sequences which can be present at the ends of the primers used during the PCR step, and which allow analysis of the PCR products on an Illumina-type next-generation sequencer. An “extension” sequence corresponds to any suitable sequence enabling analysis of the PCR products on a next-generation sequencer. An extension sequence is a nucleic acid sequence of a length between 5 and 20 nucleotides, preferably between 5 and 15 nucleotides, and not complementary to the cDNA sequences derived from RNA from the subject. It is therefore not complementary to the cDNA corresponding to endogenous RNA. It therefore cannot hybridize with said cDNA sequences. It is in particular represented by SEQ ID NO: 865. The knowledge of persons skilled in the art easily allows them to adapt these extension sequences.
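
The length ranges given in the definitions above (probe, primer sequence, index sequence, molecular barcode, extension sequence) can be collected into a small lookup table for checking candidate sequences. This is a minimal sketch that assumes the stated bounds are inclusive; the names `LENGTH_RANGES` and `has_valid_length` are hypothetical, and the example sequences are invented.

```python
# Broadest length ranges stated in the definitions, in nucleotides.
# Treating the bounds as inclusive is an assumption of this sketch.
LENGTH_RANGES = {
    "probe": (15, 55),
    "primer": (15, 30),
    "index": (5, 10),
    "molecular_barcode": (5, 10),
    "extension": (5, 20),
}


def has_valid_length(kind, sequence):
    """Check that sequence uses only A, C, G, T and fits the range for kind."""
    if set(sequence.upper()) - set("ACGT"):
        raise ValueError(f"unexpected base in {sequence!r}")
    low, high = LENGTH_RANGES[kind]
    return low <= len(sequence) <= high


print(has_valid_length("molecular_barcode", "ACGTACG"))  # 7 nt: True
print(has_valid_length("primer", "ACGT"))                # 4 nt: False
```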


According to the invention, the term “sensitivity” means the proportion of positive tests in subjects suffering from cancer and actually carrying the searched-for abnormalities (calculated by the following formula: number of true positives/(number of true positives plus number of false negatives)).


According to the invention, the term “specificity” means the proportion of negative tests in subjects not suffering from cancer and not carrying the searched-for abnormalities (calculated by the following formula: number of true negatives/(number of true negatives plus number of false positives)).
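
The two formulas above translate directly into code. The counts used in the example are invented for illustration and are not results from the invention; the function names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """True positives / (true positives + false negatives)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no affected subjects in the cohort")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """True negatives / (true negatives + false positives)."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no unaffected subjects in the cohort")
    return true_negatives / total


# Invented cohort: 50 carriers (45 detected), 100 non-carriers (90 negative).
print(sensitivity(true_positives=45, false_negatives=5))    # 0.9
print(specificity(true_negatives=90, false_positives=10))   # 0.9
```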


The inventors of the invention have identified specific probes for new genetic abnormalities observed in certain cancers. This identification is based on analysis of the intron/exon structure of genes involved in translocations, as shown in FIG. 1, or exon skippings, as shown in FIG. 2 or FIG. 9, or even 5′-3′ expression imbalances as shown in FIG. 13. In particular, with regard to FIG. 1, the breakpoints likely to lead to expression of functional chimeric proteins are searched for (FIG. 1A). From these results, DNA sequences of 25 to 50 base pairs are defined, which exactly correspond to the 5′ and 3′ ends of the exons of the two juxtaposed genes after splicing the hybrid transcripts (FIG. 1A). A set of probes is then defined as follows: a primer sequence (SA in FIG. 1B) of about twenty base pairs, is added at 5′ of all the probes complementary to the exons of the genes forming the 5′ part of the fusion transcripts (S1 in FIG. 1B). A second primer sequence (SB in FIG. 1B), also about twenty base pairs but different from SA, is added to the 3′ ends of all the probes complementary to the exons of the genes forming the 3′ part of the fusion transcripts (S2 in FIG. 1B). At least one molecular barcode sequence (SA′ in FIG. 1B) is added, for example at 5′ of the probe complementary to the exons of the genes forming the 5′ part of the fusion transcripts. These probes are then grouped together in a mixture, and contain all the elements necessary for the detection of one or more fusion transcripts, produced by one or more translocations. The probes used in the invention are therefore capable of hybridizing either with the last nucleotides of the last exon at 5′ of the translocation, or with the first nucleotides of the first exon at 3′ of the translocation. Preferably, the probes used according to the invention, capable of hybridizing with the first nucleotides of the first exon at 3′ of the translocation, are phosphorylated at 5′ before their use. 
The same principle applies when the genetic abnormality is an exon skipping. FIG. 2 represents the strategy used, by means of the invention, to detect a skipping of exon 14 of the MET gene. FIG. 2A shows that, in a normal situation, the splicing of the transcripts of the MET gene produces junctions between exons 13 and 14 and between exons 14 and 15. In a pathological situation, for example if a mutation destroys the splice donor site of exon 14, the tumor cells express an abnormal transcript resulting from the junction of exons 13 and 15. A set of probes is thus defined as follows: a primer sequence (SA in FIG. 2B) of about twenty base pairs is added at 5′ of all probes complementary to exon 13, which forms the 5′ part of the abnormal transcript (S13L in FIG. 2B). A second primer sequence (SB in FIG. 2B), also of about twenty base pairs but different from SA, is added to the 3′ ends of all probes complementary to exon 15, which forms the 3′ part of the abnormal transcript (S15R in FIG. 2B). At least one molecular barcode sequence (SA′ in FIG. 2B) is added, for example at 5′ of the probe complementary to the exons forming the 5′ part of the exon skipping, in particular exon 13 of the MET gene. The same principle applies for the skipping of exons 2 to 7 of the EGFR gene, which is often due to an internal deletion of the gene at the genomic DNA level and which results in the loss of these exons.


According to the invention, at least one of the probes of a pair used comprises a molecular barcode sequence, in particular the “L” probe. This means that the molecular barcode sequence is fused to the probe sequence at one of its ends, preferably 5′. When it is present, said molecular barcode sequence is preferably inserted between the primer sequence and the probe complementary to the exons of the genes. According to the invention, a preferred embodiment may also comprise a primer sequence at 5′ of a molecular barcode sequence, said barcode sequence itself being added at 5′ of the probe complementary to the exon of the gene forming the 5′ part of the fusion transcripts or of the transcript corresponding to an exon skipping, optionally 5′-3′ expression imbalances. According to the invention, an alternative embodiment may also comprise a primer sequence added to the 3′ end of a molecular barcode sequence, said barcode sequence itself being added at 3′ of the probe complementary to the exon of the gene forming the 3′ part of the fusion transcripts or of the transcript corresponding to an exon skipping, optionally 5′-3′ expression imbalances. According to the invention, one particular embodiment can thus comprise a primer sequence at 5′ of a molecular barcode sequence, said barcode sequence itself being added at 5′ of the probe complementary to the exon of the gene forming the 5′ part of the fusion transcripts or of the transcript corresponding to an exon skipping, optionally 5′-3′ expression imbalances, as well as a primer sequence added to the 3′ end of a molecular barcode sequence, said barcode sequence itself being added at 3′ of the probe complementary to the exon of the gene forming the 3′ part of the fusion transcripts or of the transcript corresponding to an exon skipping, optionally 5′-3′ expression imbalances.
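
The preferred layout described above (primer sequence, then molecular barcode, then the sequence complementary to the exon for the “L” probe; sequence complementary to the exon, then second primer sequence for the “R” probe) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: every sequence below is an arbitrary placeholder and not a sequence of the invention, the barcode length of 7 nucleotides and the drawing of the barcode at random are assumptions, and all function and variable names are hypothetical.

```python
import random

# Placeholder sequences, for illustration only (not SEQ ID NO of the invention).
PRIMER_SA = "ACGTACGTACGTACGTACGT"   # about twenty bases, added at 5' of "L" probes
PRIMER_SB = "TGCATGCATGCATGCATGCA"   # about twenty bases, different from SA, added at 3' of "R" probes
TARGET_L = "GATTACA" * 4             # stands for the end of the 5' exon
TARGET_R = "CATTAG" * 5              # stands for the start of the 3' exon


def random_barcode(length: int, rng: random.Random) -> str:
    """Draw a barcode of the given length (assumption: uniform over A, C, G, T)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_left_probe(primer_sa: str, barcode: str, target: str) -> str:
    """5'-[primer SA][barcode][exon-specific sequence]-3'."""
    return primer_sa + barcode + target


def build_right_probe(target: str, primer_sb: str) -> str:
    """5'-[exon-specific sequence][primer SB]-3'."""
    return target + primer_sb


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
barcode = random_barcode(7, rng)
left_probe = build_left_probe(PRIMER_SA, barcode, TARGET_L)
right_probe = build_right_probe(TARGET_R, PRIMER_SB)
print(len(left_probe), len(right_probe))
```

In this sketch the barcode sits between the primer sequence and the exon-specific sequence, so that it is carried into every amplicon produced from the ligated pair.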


An example of the various translocations (fusion genes) identified according to the invention is illustrated in FIG. 4. An example of exon skipping identified according to the invention is illustrated in FIG. 2 or FIG. 9. An example of a 5′-3′ imbalance is illustrated in FIG. 13. Example 6 also illustrates fusions associated with pathologies.


In a preferred embodiment of the method according to the invention, the probes SEQ ID NO: 14 to 91 are also used for the RT-MLPA step. In this aspect, each of the probes is also fused, at at least one end, with a primer sequence, and at least one of the probes preferably comprises a molecular barcode sequence. According to an even more particular embodiment, each of the “L” probes of the pair comprises a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes each comprising a probe selected from probes SEQ ID NO: 1 to 13, optionally probes SEQ ID NO: 14 to 91, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes each comprising a probe selected from probes SEQ ID NO: 96 to 99, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes each comprising a probe selected from probes SEQ ID NO: 1 to 13 and probes SEQ ID NO: 96 to 99, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 1 to 13, probes SEQ ID NO: 96 to 99, and probes SEQ ID NO: 14 to 91, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence, in particular probes SEQ ID NO: 14 to 91 and optionally probes SEQ ID NO: 96 and 98.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 866 to 938 and SEQ ID NO: 940 to 1104, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 1211 to 1312, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 1105 to 1107 and SEQ ID NO: 939, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or probes SEQ ID NO: 1105 to 1107, and/or SEQ ID NO: 939, and/or SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes comprising the probes selected from probes SEQ ID NO: 866 to 938, SEQ ID NO: 940 to 1104, SEQ ID NO: 1105 to 1107, SEQ ID NO: 939, SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes each comprising the probes selected from probes SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 826 to 835, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824 and SEQ ID NO: 825, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes each comprising the probes selected from probes SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 826 to 835, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824, SEQ ID NO: 825, SEQ ID NO: 866 to 938, SEQ ID NO: 940 to 1104, SEQ ID NO: 1105 to 1107, SEQ ID NO: 939, and SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the RT-MLPA step is carried out using pairs of probes each comprising the probes selected from probes SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 826 to 835, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824, SEQ ID NO: 825, SEQ ID NO: 866 to 938, SEQ ID NO: 940 to 1104, SEQ ID NO: 1105 to 1107, SEQ ID NO: 939, SEQ ID NO: 1108 to 1123, and SEQ ID NO: 1211 to 1312, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the cancer associated with the formation of a fusion gene is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1 to 13, optionally probes SEQ ID NO: 14 to 91, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the cancer associated with the formation of a fusion gene is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 866 to 938 and/or SEQ ID NO: 940 to 1104, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the cancer associated with the formation of a fusion gene is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1211 to 1312, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment of the method according to the invention, the cancer associated with the formation of a fusion gene is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1 to 13, and/or SEQ ID NO: 14 to 91, and/or SEQ ID NO: 866 to 938 and/or SEQ ID NO: 940 to 1104, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence. Preferably, all the probes of SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 868 to 938, and SEQ ID NO: 940 to 1104 are used.


In a preferred embodiment of the method according to the invention, the cancer associated with the formation of a fusion gene is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1 to 13, and/or SEQ ID NO: 14 to 91, and/or SEQ ID NO: 866 to 938 and/or SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence. Preferably, all the probes of SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 868 to 938, SEQ ID NO: 940 to 1104 and SEQ ID NO: 1211 to 1312 are used.


Alternatively and in another preferred embodiment of the method according to the invention, the cancer associated with an exon skipping is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 96 to 99, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 94 and SEQ ID NO: 95, and optionally at least one of the probes of said pair comprises a molecular barcode sequence. More particularly according to this embodiment, the cancer is associated with a skipping of an exon of the MET gene, more particularly a skipping of exon 14 of the MET gene.


Alternatively and in another preferred embodiment of the method according to the invention, the cancer associated with an exon skipping is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 94 and SEQ ID NO: 95, and optionally at least one of the probes of said pair comprises a molecular barcode sequence. More particularly according to this embodiment, the cancer is associated with a skipping of exons of the EGFR gene, more particularly a skipping of exons 2 to 7 of the EGFR gene.


Alternatively and in another preferred embodiment of the method according to the invention, the cancer associated with an exon skipping is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 96 to 99, and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 94 and SEQ ID NO: 95, and optionally at least one of the probes of said pair comprises a molecular barcode sequence. Preferably, all the probes SEQ ID NO: 96 to 99, SEQ ID NO: 1105 to 1107 and SEQ ID NO: 939 are used.


Alternatively and in another preferred embodiment of the method according to the invention, the cancer associated with a 5′-3′ imbalance is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1108 to 1123 and each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 94 and SEQ ID NO: 95, and optionally at least one of the probes of said pair comprises a molecular barcode sequence. Preferably, all the probes SEQ ID NO: 1108 to 1123 are used.


In a preferred embodiment, the invention thus relates to a method for diagnosing a carcinoma in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1 to 13, optionally probes SEQ ID NO: 14 to 91, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a carcinoma in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1294 to 1312, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a carcinoma in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1 to 13, and probes SEQ ID NO: 1294 to 1312, optionally probes SEQ ID NO: 14 to 91, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a sarcoma in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 866 to 938 and probes SEQ ID NO: 940 to 1054, optionally SEQ ID NO: 1148, and/or SEQ ID NO: 1149, and/or SEQ ID NO: 1178 and/or SEQ ID NO: 1179, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a sarcoma in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1228 to 1291, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a sarcoma in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 866 to 938 and probes SEQ ID NO: 940 to 1054, and probes SEQ ID NO: 1228 to 1291, optionally SEQ ID NO: 1148, and/or SEQ ID NO: 1149, and/or SEQ ID NO: 1178 and/or SEQ ID NO: 1179, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a tumor of the head and neck in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 866 to 938 and probes SEQ ID NO: 940 to 1054, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a tumor of the head and neck in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1211 to 1227, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a tumor of the head and neck in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 866 to 938 and probes SEQ ID NO: 940 to 1054 and probes SEQ ID NO: 1211 to 1227, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a gynecological tumor in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 866 to 938 and probes SEQ ID NO: 940 to 1054, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a brain tumor in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1040 to 1104, optionally probes of SEQ ID NO: 124-125, SEQ ID NO: 456, SEQ ID NO: 1209-1210, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a brain tumor in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1292 to 1293, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.


In a preferred embodiment, the invention thus relates to a method for diagnosing a brain tumor in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject with at least probes SEQ ID NO: 1040 to 1104 and probes SEQ ID NO: 1292 to 1293, optionally the probes of SEQ ID NO: 124-125, SEQ ID NO: 456, SEQ ID NO: 1209-1210, each of the probes being fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and at least one of the probes of said pair comprises a molecular barcode sequence.
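
By way of illustration only, the tumor-type-specific probe sets of the preceding embodiments can be summarized as a lookup table. The following Python sketch is not part of the method itself: the names PANELS and probe_ids are hypothetical, the SEQ ID NO ranges are copied from the embodiments above (taking, for each tumor type, the embodiment that combines the largest number of probes), and the probes described as optional are deliberately left out.

```python
# SEQ ID NO ranges (inclusive) per tumor type, copied from the embodiments above.
# Optional probes mentioned in those embodiments are not included in this sketch.
PANELS = {
    "carcinoma": [(1, 13), (1294, 1312)],
    "sarcoma": [(866, 938), (940, 1054), (1228, 1291)],
    "head and neck tumor": [(866, 938), (940, 1054), (1211, 1227)],
    "gynecological tumor": [(866, 938), (940, 1054)],
    "brain tumor": [(1040, 1104), (1292, 1293)],
}


def probe_ids(tumor_type: str) -> list[int]:
    """Expand the inclusive ranges of one panel into a sorted list of SEQ ID NO."""
    ids = set()
    for first, last in PANELS[tumor_type]:
        ids.update(range(first, last + 1))
    return sorted(ids)


for tumor_type in PANELS:
    print(tumor_type, len(probe_ids(tumor_type)))
```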


In a preferred embodiment of the method according to the invention, said RT-MLPA step comprises at least the following steps:


a) extraction of RNA from the biological sample from the subject,


b) conversion of the RNA extracted in a) into cDNA by reverse transcription,


c) incubation of the cDNA obtained in b) with a pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 1 to 13, and/or
    • the probes SEQ ID NO: 96 to 99,


each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence,


d) addition of a DNA ligase to the mixture obtained in c), in order to establish a covalent bond between two adjacent probes,


e) PCR amplification of the covalently bound adjacent probes obtained in d), in order to obtain amplicons.


In a preferred embodiment of the method according to the invention, said RT-MLPA step also comprises at least the following steps:


a) extraction of RNA from the biological sample from the subject,


b) conversion of the RNA extracted in a) into cDNA by reverse transcription,


c) incubation of the cDNA obtained in b) with a pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or
    • the probes SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and/or
    • the probes SEQ ID NO: 1108 to 1123,


each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence,


d) addition of a DNA ligase to the mixture obtained in c), in order to establish a covalent bond between two adjacent probes,


e) PCR amplification of the covalently bound adjacent probes obtained in d), in order to obtain amplicons.


In a preferred embodiment of the method according to the invention, said RT-MLPA step also comprises at least the following steps:


a) extraction of RNA from the biological sample from the subject,


b) conversion of the RNA extracted in a) into cDNA by reverse transcription,


c) incubation of the cDNA obtained in b) with a pair of probes comprising at least one probe selected from the probes SEQ ID NO: 1211 to 1312,


each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence,


d) addition of a DNA ligase to the mixture obtained in c), in order to establish a covalent bond between two adjacent probes,


e) PCR amplification of the covalently bound adjacent probes obtained in d), in order to obtain amplicons.


In a preferred embodiment of the method according to the invention, said RT-MLPA step comprises at least the following steps:


a) extraction of RNA from the biological sample from the subject,


b) conversion of the RNA extracted in a) into cDNA by reverse transcription,


c) incubation of the cDNA obtained in b) with a pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 1 to 13, and/or SEQ ID NO: 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or
    • the probes SEQ ID NO: 96 to 99, and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and/or
    • the probes SEQ ID NO: 1108 to 1123,


each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence,


d) addition of a DNA ligase to the mixture obtained in c), in order to establish a covalent bond between two adjacent probes,


e) PCR amplification of the covalently bound adjacent probes obtained in d), in order to obtain amplicons.


In a preferred embodiment of the method according to the invention, said RT-MLPA step comprises at least the following steps:


a) extraction of RNA from the biological sample from the subject,


b) conversion of the RNA extracted in a) into cDNA by reverse transcription,


c) incubation of the cDNA obtained in b) with a pair of probes comprising at least one probe selected from:

    • the probes SEQ ID NO: 1 to 13, and/or SEQ ID NO: 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, and/or
    • the probes SEQ ID NO: 96 to 99, and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and/or
    • the probes SEQ ID NO: 1108 to 1123,


each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence,


d) addition of a DNA ligase to the mixture obtained in c), in order to establish a covalent bond between two adjacent probes,


e) PCR amplification of the covalently bound adjacent probes obtained in d), in order to obtain amplicons.


Typically, the extraction of RNA from the biological sample according to step a) is carried out according to conventional techniques, well known to those skilled in the art. For example, this extraction can be carried out by cell lysis of the cells obtained from the biological sample. This lysis may be chemical, physical or thermal. This cell lysis is generally followed by a purification step which allows separating the nucleic acids from other cellular debris and concentrating them. For the implementation of step a), commercial kits of the QIAGEN and Zymo Research type, or those marketed by Invitrogen, can be used. Of course, the relevant techniques differ depending on the nature of the biological sample tested. The knowledge of the person skilled in the art will allow said person to easily adapt these steps of lysis and purification to said biological sample tested.


Preferably, the RNA extracted in step a) is then converted by reverse transcription into cDNA; this is step b) (see FIG. 1B). This step b) can be carried out using any reverse transcription technique known from the prior art. It can in particular be carried out using the reverse transcriptase marketed by Qiagen, Promega, or Ambion, according to the standard conditions of use, or alternatively using M-MLV Reverse Transcriptase from Invitrogen.


Preferably, the cDNA obtained in step b) is then incubated with at least the probes SEQ ID NO: 1 to 13 and/or SEQ ID NO: 96 to 99, preferably also the probes SEQ ID NO: 14 to 91, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence, preferably the probes of SEQ ID NO: 14 to 91 and optionally the probes of SEQ ID NO: 96 and 98. This is the probe hybridization step c) (see FIG. 1B). Indeed, the probes which are complementary to a portion of cDNA will hybridize with this portion if the portion is present in the cDNA. As shown in FIG. 1B, due to their sequence, the probes will therefore hybridize:

    • either with the portion of cDNA corresponding to the last nucleotides of the last 5′ exon of the translocation. These are then probes that are also called “L” or “Left”;
    • or with the portion of cDNA corresponding to the first nucleotides of the first 3′ exon of the translocation. These are then probes that are also called “R” or “Right”.


Preferably, the cDNA obtained in step b) is then incubated with at least the probes SEQ ID NO: 866 to 938 and/or SEQ ID NO: 940 to 1104 and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939 and/or SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence. This is probe hybridization step c) (see FIG. 1B). Indeed, the probes which are complementary to a portion of cDNA will hybridize with this portion if the portion is present in the cDNA. As shown in FIG. 1B, due to their sequence, the probes will therefore hybridize:

    • either with the portion of cDNA corresponding to the last nucleotides of the last 5′ exon of the translocation. These are then “L” or “Left” probes;
    • or with the portion of cDNA corresponding to the first nucleotides of the first 3′ exon of the translocation. These are then “R” or “Right” probes.


Preferably, the cDNA obtained in step b) is then incubated with at least the probes SEQ ID NO: 1211 to 1312, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence. This is probe hybridization step c) (see FIG. 1B). Indeed, the probes which are complementary to a portion of cDNA will hybridize with this portion if the portion is present in the cDNA. As shown in FIG. 1B, due to their sequence, the probes will therefore hybridize:

    • either with the portion of cDNA corresponding to the last nucleotides of the last 5′ exon of the translocation. These are then “L” or “Left” probes;
    • or with the portion of cDNA corresponding to the first nucleotides of the first 3′ exon of the translocation. These are then “R” or “Right” probes.


Preferably, the probes SEQ ID NO: 1 to 13, 97 and 99 are “R” probes and the probes SEQ ID NO: 96 and 98 are “L” probes, as are the probes SEQ ID NO: 14 to 91.


Preferably, the probes SEQ ID NO: 870-873, 877-878, 882, 889-892, 894-895, 901-902, 912-914, 920-921, 924-926, 930, 937, 939, 943, 946, 950-968, 970-971, 973-983, 988, 991-994, 997-998, 1000, 1002-1004, 1007, 1009-1010, 1017, 1021, 1022, 1035-1040, 1042-1043, 1048-1054, 1056-1059, 1063, 1065, 1067-1068, 1070, 1079-1081, 1088-1089, 1092, 1094, 1096, 1099-1102, 1104, 1106, 1109, 1111, 1113, 1115, 1117, 1119, 1121, 1123 are “R” probes, and the probes SEQ ID NO: 866-869, 874-876, 879-881, 883-888, 893, 896-900, 903-911, 915-919, 922-923, 927-929, 931-936, 938, 940-942, 944-945, 947-949, 969, 972, 984-987, 989-990, 995-996, 999, 1001, 1005-1006, 1008, 1011-1016, 1018-1020, 1023-1034, 1041, 1044-1047, 1055, 1060-1062, 1064, 1066, 1069, 1071-1078, 1082-1087, 1090-1091, 1093, 1095, 1097-1098, 1103, 1105, 1107-1108, 1110, 1112, 1114, 1116, 1118, 1120, 1122 are “L” probes.


Preferably, the probes SEQ ID NO: 1211, 1214, 1215, 1216, 1217, 1222, 1224, 1227, 1230, 1235, 1237, 1239, 1242, 1245, 1248-1249, 1251, 1253, 1260-1265, 1269-1270, 1272, 1273, 1278, 1280, 1282, 1284-1288, 1290, 1295, 1299, 1303-1305, 1310-1312 are “R” probes, and the probes SEQ ID NO: 1212, 1213, 1218-1221, 1223, 1225-1226, 1228-1229, 1231-1234, 1236, 1238, 1240-1241, 1243-1244, 1246-1247, 1250, 1252, 1254-1259, 1266-1268, 1271, 1274-1277, 1279, 1281, 1283, 1289, 1291-1294, 1296-1298, 1300-1302, 1306-1309 are “L” probes.
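Lists such as those above mix single SEQ ID numbers and ranges. As a minimal sketch (the function name `expand_seq_ids` is hypothetical and is not part of the invention), such a list can be expanded into a set of integers, for example to check that an “L” list and an “R” list do not overlap:

```python
def expand_seq_ids(text: str) -> set[int]:
    """Expand a list such as '1260-1265, 1269-1270, 1272' into a set of integers."""
    ids: set[int] = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            # A range "first-last" is inclusive at both ends.
            first, last = (int(part) for part in token.split("-"))
            ids.update(range(first, last + 1))
        else:
            ids.add(int(token))
    return ids


# Short excerpts copied from the lists above (not the full lists).
r_excerpt = expand_seq_ids("1260-1265, 1269-1270, 1272, 1273")
l_excerpt = expand_seq_ids("1254-1259, 1266-1268, 1271")
overlap = r_excerpt & l_excerpt  # empty when no probe is listed as both L and R
```

The same helper could be applied to the full lists to confirm that every SEQ ID number of a range is assigned to exactly one of the two probe types.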


At the end of step c), the probes hybridized to the cDNA are adjacent, if and only if the translocation (fusion gene) or the exon skipping has taken place. This step c) is typically carried out by incubating the cDNA and the mixture of probes at a temperature of between 90° C. and 100° C. in order to denature the secondary structures of the nucleic acids, for a period of 1 to 5 minutes, then leaving this to incubate for a period of at least 30 minutes, preferably 1 hour, at a temperature of about 60° C. to allow hybridization of the probes. This can be carried out using the commercial kit sold by the MRC-Holland company (SALSA MLPA Buffer) or using a buffer offered by the NEB company (Buffer U).


At the end of step c), a DNA ligase is typically added in order to covalently bind only the adjacent probes; this is step d) (see FIGS. 1B and 2B). The DNA ligase is in particular ligase 65, sold by MRC-Holland, Amsterdam, Netherlands (SALSA Ligase-65), or the thermostable ligases (Hifi Taq DNA Ligase or Taq DNA ligase) sold by the NEB company. It is typically carried out at a temperature between 50° C. and 60° C., for a period of 10 to 20 minutes, then for a period of 2 to 10 minutes at a temperature between 95° C. and 100° C.


At the end of step d), each pair of adjacent probes L and R is covalently bound, and the primer sequence of each probe is still present in 5′ and 3′, as well as the molecular barcode sequence.


Preferably, the method also comprises a step e) of PCR amplification of the adjacent covalently bound probes obtained in d) (see FIGS. 1B and 2B). This PCR step is done using a pair of primers, one of the primers being identical to the 5′ primer sequence, the other primer being complementary to the 3′ primer sequence. Preferably, the PCR amplification of step e) is carried out using the pair of primers SEQ ID NO: 101 and 92 to detect fusion genes, or the pair of primers SEQ ID NO: 102 and 94 to detect skipping of exons of the MET and EGFR genes.


PCR is typically carried out using commercial kits, such as the ready-to-use kits sold by Eurogentec (Red′y′Star Mix) or NEB (Q5 High fidelity DNA polymerase). Typically, the PCR takes place with a first phase of initial denaturation at a temperature between 90° C. and 100° C., typically around 94° C., for a time of 5 to 8 minutes; then a second phase of amplification comprising several cycles, typically 35 cycles, each cycle comprising 30 seconds at 94° C., then 30 seconds at 58° C., then 30 seconds at 72° C.; and a last phase of returning to 72° C. for approximately 4 minutes. At the end of the PCR, the amplicons are preferably stored at −20° C. According to the invention, the amplicons correspond to the fusion transcripts or to the transcripts corresponding to an exon skipping present in the sample from the patient/subject to be tested, or possibly to a 5′-3′ imbalance.
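The typical program described above can be written down as data in order to check its total hold time. This is only an illustrative sketch: the class and function names are hypothetical, the initial denaturation is taken at the lower bound of 5 minutes, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temperature_c: float
    seconds: int


def program_seconds(initial: Step, cycle: list[Step], n_cycles: int, final: Step) -> int:
    """Total hold time of a PCR program, ignoring ramp times."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


initial = Step("initial denaturation", 94.0, 5 * 60)  # 5 to 8 minutes in the text
cycle = [
    Step("denaturation", 94.0, 30),
    Step("annealing", 58.0, 30),
    Step("extension", 72.0, 30),
]
final = Step("final extension", 72.0, 4 * 60)

total = program_seconds(initial, cycle, n_cycles=35, final=final)
total_minutes = total / 60
```

With these values the 35 cycles account for 52.5 minutes, and the whole program for about one hour of hold time.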


According to the invention, in one particular embodiment, and when it is present, the index sequence is in particular introduced during the PCR step at the 3′ end of a primer sequence, in particular the “R” primer sequence.


According to the invention, in one particular embodiment, a first extension sequence can be introduced at 5′ of a primer sequence, and a second extension sequence can be introduced at 3′ of the index sequence.


According to the invention, in one particular embodiment, each pair of probes used in the PCR step comprises a different index sequence which makes it possible to identify the patients.


In a preferred embodiment of the method according to the invention, the RT-MLPA step also comprises a step f) of analyzing the results of the PCR of step e), preferably by sequencing. According to the invention, the sequencing step is preferably a step of capillary sequencing or next-generation sequencing. For this purpose, it is possible to use a capillary sequencer (for example the ABI 3130 Genetic Analyzer, Thermo Fisher) or a next-generation sequencer (for example the MiSeq System, Illumina, or the Ion S5 System, Thermo Fisher). Several sequences are analyzed simultaneously, the index sequence thus making it possible to associate any identified genetic abnormality with a tested subject.


This analysis step allows the result to be read immediately, and indicates directly whether the sample from the subject carries a specific translocation, identified or not, and/or exon skipping such as the skipping of exon 14 of the MET gene or the skipping of exons of the EGFR gene, or possibly a 5′-3′ imbalance.


In a preferred embodiment of the method according to the invention, the RT-MLPA step also comprises a step g) of determining the level of expression of the amplicons that are obtained at the end of the PCR step. Determining the level of expression of the amplicons makes it possible in particular to ensure that the ligations obtained are indeed representative of a fusion transcript or of a transcript corresponding to exon skipping, and do not correspond to a ligation artifact. According to the invention, this step g) is implemented in particular by computer. The level of expression is determined by the following steps: (1) demultiplexing the results obtained at the end of the PCR step (i.e. step e)) in order to isolate the sequences obtained for a given subject, thanks to the index sequences, (2) determining the number of DNA or RNA fragments present in the sample from the patient to be tested (before amplification) thanks to the molecular barcodes, and optionally (3) supplying an expression matrix for each fusion transcript or transcript corresponding to an exon skipping or to a 5′-3′ imbalance identified for the tested subject. Determining the level of expression of the amplicons obtained at the end of a PCR step makes it possible to add more precision to the results of the PCR step, in particular by taking into account the sequencing errors that may occur (see step f) indicated above). Ultimately, it makes it possible to add more precision to the diagnosis of cancer according to the invention.


According to an even more particular embodiment, step g) is a step of analyzing the amplicons obtained at the end of the PCR step, which is implemented by computer, in particular by an arrangement of bioinformatic algorithms. More particularly, this step g) comprises the following steps: (1) a step of demultiplexing based on the identification of the indexes, (2) a step of identifying the pairs of probes, (3) a step of counting the reads (results) and molecular barcode sequences (Barcodes: UMI sequence (Unique Molecular Index)), and optionally (4) a step of evaluating the quality of the sequencing of the sample. The sequences as analyzed by the software are shown in FIG. 7.


In a preferred embodiment of the method according to the invention, if, for a biological sample from a subject, a PCR amplification is obtained in step e) following hybridization with a pair of probes targeting fusion genes and/or exon skipping, then the subject is a carrier of the cancer linked to the genetic abnormality corresponding to the pair of probes identified. Preferably, this abnormality is analyzed in step f) and/or g) as mentioned above.


In a preferred embodiment of the method according to the invention, the PCR amplification of step e) is carried out using the pair of primers SEQ ID NO: 101 and 92 or SEQ ID NO: 102 and 94.


In a preferred embodiment of the method according to the invention, a cancer is thus identified, which allows the patient (meaning the subject to whom the tested biological sample belongs) to benefit from a targeted therapy. According to the invention, “targeted therapy” means any anticancer therapy, such as chemotherapy, radiotherapy, or immunotherapy, but preferably means pharmacological inhibitors of the ALK, ROS, RET, EGFR, and MET proteins.


The invention also relates to a kit comprising at least the probes SEQ ID NO: 1 to 13, and/or the probes SEQ ID NO: 96 to 99, preferably further comprising the probes SEQ ID NO: 14 to 91, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence, in particular the probes SEQ ID NO: 14 to 91 and optionally SEQ ID NO: 96 and 98.


The invention also relates to a kit comprising at least the probes SEQ ID NO: 866 to 938 and/or the probes SEQ ID NO: 940 to 1104 and/or the probes SEQ ID NO: 1105 to 1107 and/or the probe SEQ ID NO: 939 and/or the probes SEQ ID NO: 1108 to 1123, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


The invention also relates to a kit comprising at least the probes SEQ ID NO: 1211 to 1312, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


The invention also relates to a kit comprising at least the probes SEQ ID NO: 1 to 13, and/or the probes SEQ ID NO: 96 to 99 and/or the probes SEQ ID NO: 866 to 938 and/or the probes SEQ ID NO: 940 to 1104 and/or the probes SEQ ID NO: 1105 to 1107 and/or the probe SEQ ID NO: 939 and/or the probes SEQ ID NO: 1108 to 1123, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


The invention also relates to a kit comprising at least the probes SEQ ID NO: 1 to 13, and/or the probes SEQ ID NO: 96 to 99 and/or the probes SEQ ID NO: 866 to 938 and/or the probes SEQ ID NO: 940 to 1104 and/or the probes SEQ ID NO: 1105 to 1107 and/or the probe SEQ ID NO: 939 and/or the probes SEQ ID NO: 1108 to 1123, and/or the probes SEQ ID NO: 1211 to 1312, optionally the probes SEQ ID NO: 1148, 1149, 1178, 1179, 1209 and/or 1210, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


The invention also relates to a kit comprising at least the following probes: SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 826 to 835, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824 and SEQ ID NO: 825, each of the probes being preferably fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


The invention also relates to a kit comprising at least the following probes: SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 826 to 835, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824, SEQ ID NO: 825, SEQ ID NO: 866 to 938, SEQ ID NO: 940 to 1104, SEQ ID NO: 1105 to 1107, SEQ ID NO: 939 and SEQ ID NO: 1108 to 1123, each of the probes being preferably fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


The invention also relates to a kit comprising at least the following probes: SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 826 to 835, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824, SEQ ID NO: 825, SEQ ID NO: 866 to 938, SEQ ID NO: 940 to 1104, SEQ ID NO: 1105 to 1107, SEQ ID NO: 939, SEQ ID NO: 1108 to 1123, and SEQ ID NO: 1211 to 1312, optionally the probes SEQ ID NO: 1148, 1149, 1178, 1179, 1209 and/or 1210, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair preferably comprising a molecular barcode sequence.


Determining the level of expression of the amplicons that are obtained at the end of a PCR step (for example carried out according to step e) above) is very advantageous because it makes it possible to ensure that the results obtained are reliable. In particular, it makes it possible to determine the number of RNA molecules (in particular the fusion transcripts or the transcripts corresponding to exon skipping or the transcripts of the genes whose 5′-3′ imbalance is to be analyzed) present in the sample to be tested. This adds more precision to the diagnosis performed.


In this aspect, the invention thus relates to a method for determining the level of expression of the amplicons that are obtained at the end of a PCR step, said method being implemented by computer and comprising the following steps:


(a) providing a sample to be tested, said sample comprising amplicons obtained at the end of a PCR step, and


(b) determining the level of expression of the amplicons.


In one particular embodiment of the method implemented by computer according to the invention, the determination of the level of expression of the amplicons aims in particular to:


(1) demultiplex the results of amplicons obtained at the end of a PCR step,


(2) determine the number of DNA or RNA fragments present in the sample of the patient to be tested (before amplification), and optionally


(3) provide an expression matrix for each fusion transcript or transcript corresponding to exon skipping identified for the patient being tested.
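As an illustration of item (3), the sketch below arranges per-subject counts into a matrix with one row per marker and one column per subject. The layout, the function name `expression_matrix`, and the marker and subject labels are assumptions made for the example; the text does not specify the format of the matrix.

```python
def expression_matrix(
    counts: dict[str, dict[str, int]],
) -> tuple[list[str], list[str], list[list[int]]]:
    """Build a marker x subject matrix from {subject: {marker: count}}.

    Markers absent from a subject are given a count of 0.
    """
    subjects = sorted(counts)
    markers = sorted({m for per_subject in counts.values() for m in per_subject})
    rows = [[counts[s].get(m, 0) for s in subjects] for m in markers]
    return markers, subjects, rows


# Hypothetical counts of unique molecules per marker and per subject.
counts = {
    "subject_1": {"fusion_A": 120, "MET_exon14_skipping": 0},
    "subject_2": {"fusion_B": 45},
}
markers, subjects, rows = expression_matrix(counts)
```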


This determination of the level of expression of the amplicons that are obtained at the end of a PCR step allows adding more precision to the results. Analysis of the amplicons and their quantification can also be carried out very quickly.


In one particular embodiment, the method implemented by computer comprises the following steps:


(1) a step of demultiplexing the results of amplicons obtained at the end of a PCR step,


(2) a step of searching for pairs of probes used during the PCR step,


(3) a step of counting the reads (results, i.e. fusion transcripts or exon skippings) and molecular barcode sequences (UMI sequence (Unique Molecular Index)), optionally the index sequence, and optionally


(4) a step of evaluating the quality of sequencing of the sample.


The software according to the invention requires three files for its execution: a FASTQ file, an index file and a marker file.


FASTQ: During a sequencing experiment, the raw data are generated in the form of a standard file called FASTQ. This FASTQ format will group, for each read sequenced by the device: (1) a unique sequence identifier, (2) the sequence of the read, (3) the read direction, (4) an ASCII sequence grouping the quality scores per base for each base that is read. An example of a read in FASTQ format is shown in FIG. 8. A FASTQ file is therefore composed of this repetition of 4 lines for each sequenced read. A high-throughput sequencing experiment generates hundreds of millions of sequences. The FASTQ file is the raw file required to launch the software according to the invention.
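The four-line structure described above can be read with a few lines of code. The following is a minimal sketch and not the software of the invention: the names `FastqRead`, `parse_fastq` and `phred_scores` are hypothetical, the third line is kept as found (in practice it is usually a single “+” character), and the quality scores are assumed to use the common Phred+33 ASCII encoding.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRead(NamedTuple):
    identifier: str  # line 1, without the leading "@"
    sequence: str    # line 2
    third_line: str  # line 3, usually "+"
    quality: str     # line 4, one ASCII character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRead]:
    """Yield one record for each group of four lines."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        third_line = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRead(header[1:].rstrip("\n"), sequence, third_line, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert an ASCII quality string to per-base scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


example = "@read1\nACGT\n+\nIIII\n"
reads = list(parse_fastq(io.StringIO(example)))
```

Reading the file record by record, rather than loading it whole, keeps memory use low for files containing many millions of reads.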


Marker file: This file groups all the sequences of each probe as well as their name. It brings together all the pairs of probes used during a diagnosis. It is specific to each kit (expression measurement, searching for fusion transcripts, for exon skipping, for imbalance, etc.).


Index file: This file groups the list of sequences used to identify the subjects tested. It gathers together all the index sequences used during a diagnosis. Each sequence will correspond to a tested subject and will allow reassigning the sequenced reads. This file is specific to each experiment.


According to the invention, the term “step of demultiplexing” means the step which aims to identify the various index sequences used during construction of the library to identify the reads for each of the subjects tested. This search is carried out by an exact and inexact matching algorithm for comparing sequences to allow taking into account the sequencing errors linked to the method of acquisition by high-throughput sequencing. According to the invention, a “library” is understood to mean the construction comprising at least an index sequence, a left probe and a right probe that are characteristic of a genetic abnormality, and optionally a molecular barcode sequence.
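A minimal sketch of such a demultiplexing step is given below. It assumes that the index has already been extracted from the read, that all indexes have the same length, and that inexact matching is done with the Hamming distance; the function names, the example indexes and the tolerance `max_errors` are hypothetical.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_index(observed: str, index_file: dict[str, str], max_errors: int = 1) -> Optional[str]:
    """Return the subject whose index matches the observed sequence, or None.

    An exact match is tried first; otherwise the read is assigned only if
    exactly one index lies within `max_errors` mismatches (no ambiguity).
    """
    if observed in index_file:
        return index_file[observed]
    hits = [
        subject
        for index, subject in index_file.items()
        if len(index) == len(observed) and hamming(index, observed) <= max_errors
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical index file: index sequence -> tested subject.
INDEXES = {"ACGTAC": "subject_1", "TTGACA": "subject_2"}

exact = assign_index("ACGTAC", INDEXES)
one_error = assign_index("ACGTAA", INDEXES)
unassigned = assign_index("GGGGGG", INDEXES)
```

Rejecting ambiguous inexact matches is a design choice of this sketch: it avoids attributing a read to the wrong subject when two indexes are equally close.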


According to the invention, the term “step of searching for pairs of probes” means the step which aims to identify, for each sequence of the FASTQ file, whether there is a pair of probes in the marker file that allow attributing it to an entity that was to be measured (fusion transcripts, exon skipping . . . ). A data structure in the algorithm allows associating with each sequence a tag bearing the name of the two probes, left (“L”) and right (“R”). This search is carried out as an exact search by comparing sequences (e.g. the Hamming and Levenshtein distance calculation) and by an approximate method tolerating ‘k’ errors. This ‘k’ parameter can be changed when launching the tool. For the expression measurement, each pair of probes (right and left) is specific to an entity whose expression is to be measured. To measure the expression of a gene, two probes are used which hybridize strictly one behind the other to this gene. These probes will then be assembled during the ligation step, then amplified and read. Sequences having no logical tag during the search for probes are stored, in order to perform a search for chimeras. Indeed, it is possible that certain probes cross-hybridize during the hybridization, ligation, and amplification steps during construction of the library, leading to the appearance of hybrid sequences (for example a right probe of gene A with a left probe of gene B). Here again, these sequences are detected by exact and inexact matching of sequences. For the search for fusion transcripts, it is not known which probes will hybridize together and be amplified. The search for the probes is therefore carried out without preconceptions, by comparison of all pairs of possible right/left sequences.
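The tagging of a read with the names of its left and right probes can be sketched as follows. This is a simplified illustration, not the algorithm of the invention: it assumes that the read starts with the left probe immediately followed by the right probe (primer and barcode sequences already removed), it uses only the Hamming distance (the text also mentions the Levenshtein distance), and the probe names and toy sequences are hypothetical.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Mismatches between two sequences; a length difference counts as errors."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def best_probe(read: str, start: int, probes: dict[str, str], k: int) -> Optional[tuple[str, int]]:
    """Return (name, length) of the closest probe at `start` within k errors."""
    best = None
    best_distance = k + 1
    for name, sequence in probes.items():
        distance = hamming(read[start:start + len(sequence)], sequence)
        if distance < best_distance:
            best, best_distance = (name, len(sequence)), distance
    return best


def tag_read(read: str, left: dict[str, str], right: dict[str, str], k: int = 1) -> Optional[tuple[str, str]]:
    """Tag a read with the names of its left and right probes, if both are found."""
    left_hit = best_probe(read, 0, left, k)
    if left_hit is None:
        return None
    right_hit = best_probe(read, left_hit[1], right, k)
    if right_hit is None:
        return None
    return (left_hit[0], right_hit[0])


def classify(tag: tuple[str, str], expected_pairs: set[tuple[str, str]]) -> str:
    """An expected pair is a marker; any other L/R combination is a chimera."""
    return "marker" if tag in expected_pairs else "chimera"


# Hypothetical toy probes (not sequences of the invention).
LEFT = {"geneA_L": "ACGTACGT", "geneB_L": "TTGGCCAA"}
RIGHT = {"geneA_R": "GGATCC", "geneB_R": "CATCAT"}
EXPECTED = {("geneA_L", "geneA_R"), ("geneB_L", "geneB_R")}

marker_tag = tag_read("ACGTACGTGGATCC", LEFT, RIGHT)
chimera_tag = tag_read("ACGTACGTCATCAT", LEFT, RIGHT)
```

For the search for fusion transcripts, where the partners are not known in advance, the set of expected pairs would simply not be applied and every left/right combination found would be reported.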


According to the invention, the term “a step of counting the reads (results) and molecular barcode sequences” means the step occurring when the FASTQ file is scanned and the pairs of probes identified (markers and chimeras). The algorithm will proceed to count them. These counts are of two types: (1) the number of sequences read by the sequencer, and (2) the number of unique molecular barcode (UMI) sequences assigned to the marker. Sequence counting is done based on the data structure previously described during identification of the markers. The number of tags assigned for each marker will be determined by traversing the data structure. Counting the UMIs is more complex. It involves a step of extracting the UMI of each sequence and a step of correcting sequencing errors in the UMIs. The large combinatorial diversity of these random sequences, together with their counts and the amplification factor of the sample, will make it possible to identify the UMIs carrying sequencing errors in order to correct the count data. This correction of the UMIs involves creating a graph structure associating a counter with each unique UMI. The UMIs are then grouped by increasing count with k tolerated errors. The UMIs allow identifying the number of unique sequences read by the sequencer before the amplification step during preparation of the library. They therefore provide information about the number of transcripts actually read and not the number of transcripts read after amplification.
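The two types of count can be illustrated with the following minimal sketch. It assumes that each read has already been reduced to a pair (marker, UMI); the function name and the marker labels are hypothetical, and the correction of sequencing errors in the UMIs is deliberately left out here.

```python
from collections import Counter, defaultdict
from typing import Iterable


def count_markers(tagged_reads: Iterable[tuple[str, str]]) -> tuple[Counter, dict[str, int]]:
    """Count, for each marker, the reads and the distinct UMIs observed.

    `tagged_reads` yields (marker, umi) pairs. No UMI error correction is
    applied in this sketch, so UMIs differing by a sequencing error are
    counted as distinct molecules.
    """
    read_counts: Counter = Counter()
    umis_per_marker: dict[str, Counter] = defaultdict(Counter)
    for marker, umi in tagged_reads:
        read_counts[marker] += 1
        umis_per_marker[marker][umi] += 1
    unique_umis = {marker: len(umis) for marker, umis in umis_per_marker.items()}
    return read_counts, unique_umis


# Hypothetical tagged reads.
tagged = [
    ("fusion_A", "AAAA"),
    ("fusion_A", "AAAA"),  # PCR duplicate of the same molecule
    ("fusion_A", "CCCC"),
    ("MET_exon14_skipping", "GGGG"),
]
read_counts, unique_umis = count_markers(tagged)
```

In this example the marker `fusion_A` is supported by three reads but only two molecules, which is the distinction between the number of sequences read and the number of fragments present before amplification.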


According to the invention, the term “a step of evaluating the quality of sequencing of the sample” means the step which aims to determine the analyzed sequences which are not significant. A quality score indicative of the diversity of the libraries, meaning the number of unique transcripts read, has been implemented in the algorithm so as to provide an indication of the richness of the sample analyzed and to eliminate samples that would be considered as failures (i.e. having a score <5000).
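A possible form of this quality filter is sketched below. The threshold of 5000 is taken from the text; computing the score as the sum of the unique UMI counts over all markers of a sample is an assumption of this example, as are the function names.

```python
FAILURE_THRESHOLD = 5000  # samples scoring below this value are considered failures


def library_diversity(unique_umis_per_marker: dict[str, int]) -> int:
    """Score of a sample: total number of unique molecules read (assumed definition)."""
    return sum(unique_umis_per_marker.values())


def passes_quality(unique_umis_per_marker: dict[str, int], threshold: int = FAILURE_THRESHOLD) -> bool:
    """True if the sample is rich enough to be interpreted."""
    return library_diversity(unique_umis_per_marker) >= threshold


# Hypothetical samples.
rich_sample = {"marker_1": 4000, "marker_2": 1500}
poor_sample = {"marker_1": 100}
kept = passes_quality(rich_sample)
rejected = not passes_quality(poor_sample)
```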


Preferably, the method implemented by computer according to the invention makes it possible to calculate the level of expression of a large number of fusion transcripts or transcripts corresponding to exon skipping (in particular greater than 1000) for a large number of samples (in particular greater than 40), and to do so in a very short time (in particular 5 to 10 minutes).


According to one particular embodiment, the method implemented by computer can make it possible to correct sequencing errors which arise during sequencing of the amplicons, for example the correction of sequencing errors in molecular barcode sequences (UMI) (see for example the method called “Directional”; reference: Smith, T., Heger, A., & Sudbery, I. (2017). UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Research, 27(3), 491-499. http://doi.org/10.1101/gr.209601.116).
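The “Directional” method of the cited reference can be sketched as follows. This is an independent, simplified re-implementation written for illustration and is not the code of UMI-tools or of the invention: two UMIs are linked when they differ by at most k positions and the count of the first is at least twice the count of the second minus one, and every UMI reachable from a high-count UMI is merged with it. The function names and the example counts are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Mismatches between two UMIs; a length difference counts as errors."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def directional_clusters(umi_counts: dict[str, int], k: int = 1) -> list[list[str]]:
    """Group UMIs so that each cluster is assumed to stem from one molecule.

    A directed edge parent -> child exists when the two UMIs differ by at
    most k positions and count(parent) >= 2 * count(child) - 1. Clusters are
    grown from the most abundant unassigned UMI by following these edges.
    """
    ordered = sorted(umi_counts, key=lambda umi: (-umi_counts[umi], umi))
    assigned: set[str] = set()
    clusters: list[list[str]] = []
    for seed in ordered:
        if seed in assigned:
            continue
        cluster = [seed]
        assigned.add(seed)
        frontier = [seed]
        while frontier:
            parent = frontier.pop()
            for child in ordered:
                if child in assigned:
                    continue
                close = hamming(parent, child) <= k
                dominant = umi_counts[parent] >= 2 * umi_counts[child] - 1
                if close and dominant:
                    assigned.add(child)
                    cluster.append(child)
                    frontier.append(child)
        clusters.append(cluster)
    return clusters


# Hypothetical UMI counts for one marker.
umi_counts = {"AAAA": 100, "AAAT": 3, "AATT": 1, "CCCC": 50, "CCCG": 40}
clusters = directional_clusters(umi_counts)
molecules = len(clusters)  # corrected number of unique molecules
```

In this example the low-count UMIs `AAAT` and `AATT` are absorbed into `AAAA` as probable sequencing errors, whereas `CCCC` and `CCCG` remain separate because their counts are too similar for one to be an error of the other.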


Tables 1 and 2 below provide details concerning the sequences of the invention.











TABLE 1

SEQ ID NO: 1 | TGTCACCCACCCCGGAGCCA (R) | SEQ ID NO: 52 | ATTGCTGTGGGAAATAATGATGTAAAG
SEQ ID NO: 2 | AGCCCTGAGTACAAGCTGAGCAAGCTCCGC (R) | SEQ ID NO: 53 | GCAGCATGTCAGCTTCGTATCTCTCAA (L)
SEQ ID NO: 3 | TGTACCGCCGGAAGCACCAGGAG (R) | SEQ ID NO: 54 | AAGAACTAGTCCAGCTTCGAGCACAAG (L)
SEQ ID NO: 4 | TGGAAGCAAGCAATTTCTTCAACC (R) | SEQ ID NO: 55 | CAGGACCTGGCTACAAGAGTTAAAAAG (L)
SEQ ID NO: 5 | ATCTGGGCAGTGAATTAGTTCGCTACG (R) | SEQ ID NO: 56 | GAACAGCTCACTAAAGTGCACAAACAG (L)
SEQ ID NO: 6 | ATCAGTTTCCTAATTCATCTCAGAACGGTT (R) | SEQ ID NO: 57 | AGAAGAGGGCATTCTGCACAGATTG (L)
SEQ ID NO: 7 | ATCCACTGTGCGACGAGCTGTGC (R) | SEQ ID NO: 58 | GAAAGGGAGTTTGGTTCTGTAGATG (L)
SEQ ID NO: 8 | GAGGATCCAAAGTGGGAATTCCCT (R) | SEQ ID NO: 59 | GTTGCTCCTATTGCAACAACAAACTCAG (L)
SEQ ID NO: 9 | ATGTGGCCGAGGAGGCGGGC (R) | SEQ ID NO: 60 | GGATCTTCGTAGCATCAGTTGAAGCAG (L)
SEQ ID NO: 10 | CTGGAGTCCCAAATAAACCAGGCAT (R) | SEQ ID NO: 61 | TTTTCTTACCACAACATGACAGTAGTG (L)
SEQ ID NO: 11 | ATGATTTTTGGATACCAGAAACAAGTTTCA (R) | SEQ ID NO: 62 | AGGCTGTGGAGTGGCAGCAGAAG (L)
SEQ ID NO: 12 | TCTGGCATAGAAGATTAAAGAATCAAAAAA (R) | SEQ ID NO: 63 | GAGGAACAGACTAAGAAGGCTCAGCAAG (L)
SEQ ID NO: 13 | TACTCTTCCAACCCAAGAGGAGATTGAA (R) | SEQ ID NO: 64 | GCTGTATCTCCATGCCAGAGCAG (L)
SEQ ID NO: 14 | CAACATTCAACTCCCTACTTTGTCCATCAG (L) | SEQ ID NO: 65 | AAAGCAGACCTTGGAGAACAGTCAG (L)
SEQ ID NO: 15 | AGCCCAAGCTTCCCATCACAG (L) | SEQ ID NO: 66 | CAGTGOATATTAGTGGACAGOACTTAGTAG (L)
SEQ ID NO: 16 | ACAGGCTGTGTGCATGCACCAAAG (L) | SEQ ID NO: 67 | GGTGGTACTGGCCCAAGGTAAAAAAG (L)
SEQ ID NO: 17 | GAAGATTGCCCGAGAGCAAAAAG (L) | SEQ ID NO: 68 | CAGTATGAAAAAAAGCTTAAATCAACCAAA (L)
SEQ ID NO: 18 | GCAAAGCCAGCGTGACCATC (L) | SEQ ID NO: 69 | ACATTTCATGGGGCTCCACTAACAG (L)
SEQ ID NO: 19 | TGAGCTCTCCAGAAAATTGATGCAG (L) | SEQ ID NO: 70 | GTGGGAACGTGAAACATCTGATACAAG (L)
SEQ ID NO: 20 | CGAGTTCAAGCAGGCCTATATCACCTG (L) | SEQ ID NO: 71 | AGCTGTCTGGCTCTGGAGATCTGG (L)
SEQ ID NO: 21 | TGGGAACATCCCATGGTATCACA (L) | SEQ ID NO: 72 | TGAGAGAACGGAGGTCCTGGCAG (L)
SEQ ID NO: 22 | GCCACCCATGCAGCCCACG (L) | SEQ ID NO: 73 | GTACCACCTTATCCACAGCCACAGC (L)
SEQ ID NO: 23 | GCCCACTGACGCTCCACCGAAAG (L) | SEQ ID NO: 74 | GCTGCCTGCGTCCCAAAGAACAG (L)
SEQ ID NO: 24 | CCAAGCAGGATCTGGGCCCAG (L) | SEQ ID NO: 75 | ACATAACCATTAGCAGAGAGGCTCAGG (L)
SEQ ID NO: 25 | GGCAGCTCAGCAGCTCCTCAG (L) | SEQ ID NO: 76 | CGCCTTCCAGCTGGTTGGAG (L)
SEQ ID NO: 26 | TGGCCAATGTGATCTGGAACTTATTAAT (L) | SEQ ID NO: 77 | GCAGCTGCCCTTAGCCCTCTGG (L)
SEQ ID NO: 27 | ATCCAGGTCATGAAGGAGTACTTGACAAAG (L) | SEQ ID NO: 78 | TGTTACCTCAAGAAGCAGAAGAAGAAAACA (L)
SEQ ID NO: 28 | CTACAGAGACACAACCCATTGTTTATG (L) | SEQ ID NO: 79 | GAAGCCTCCAAGCTATGATTCTG (L)
SEQ ID NO: 29 | CTACTCTGGTCTCTGGCATTGCTGGTG (L) | SEQ ID NO: 80 | GACCTTCCACCAATATTCCTGAAAATG (L)
SEQ ID NO: 30 | CTTCATGAGCTGCAATCTCATCACTG (L) | SEQ ID NO: 81 | TTGGCTTAACAGATGATCAGGTTTCAG (L)
SEQ ID NO: 31 | CCCACACCTGGGAAAGGACCTAAAG (L) | SEQ ID NO: 82 | CTCAGACTCAAGCAGGTCAGATTGAAG (L)
SEQ ID NO: 32 | GATCTGAATCCTGAAAGAGAAATAGAG (L) | SEQ ID NO: 83 | AGCCTCAACAGTATGGTATTCAGTATTCAG (L)
SEQ ID NO: 33 | TGAAAGAGAAATAGAGATATGCTGGATG (L) | SEQ ID NO: 84 | TCAGGGAACAGGAAGAATTCCTAGGG (L)
SEQ ID NO: 34 | TTTAATGATGGCTTCCAAATAGAAGTACAG (L) | SEQ ID NO: 85 | TGGAAAAGACAATTGATGACCTGGAAG (L)
SEQ ID NO: 35 | GCCATAGGAACGCACTCAGGCAG (L) | SEQ ID NO: 86 | AAACAACAGGAGTTGCCATTCCATTACATG (L)
SEQ ID NO: 36 | AGCTCTCTGTGATGCGCTACTCAATAG (L) | SEQ ID NO: 87 | CCGTCAGCCTCTTCTCCCCAG (L)
SEQ ID NO: 37 | ACTCGGGAGACTATGAAATATTGTACT (L) | SEQ ID NO: 88 | GCTGCCAGATATTCCACCCATACAG (L)
SEQ ID NO: 38 | CAGTGAAAAAATCAGTCTCAAGTAAAG (L) | SEQ ID NO: 89 | ACAGAGGATGGCAGGAGGAGTGCTTGCATG (L)
SEQ ID NO: 39 | AGCATAAAGATGTCATCATCAACCAAG (L) | SEQ ID NO: 90 | GTTAAGCCCCGTGGACCAAAGG (L)
SEQ ID NO: 40 | AGCGGAAGGTTAATGTTCTTCAGAAGAAG (L) | SEQ ID NO: 91 | GCTGGAAACATTTCCGACCCTG (L)
SEQ ID NO: 41 | GGAGAAGACAAAGAAGGCAGAGAGAG (L) | SEQ ID NO: 92 | GTGCCAGCAAGATCCAATCTAGA (L)
SEQ ID NO: 42 | ATCAGATAAAGAGCCAGGAGCAGCTG (L) | SEQ ID NO: 93 | TCCAACCCTTAGGGAACCC (R)
SEQ ID NO: 43 | CAAAGCCACTGGAGTCTTTACCACAC (L) | SEQ ID NO: 94 | GCCATTGCGGTGACACTATAG (L)
SEQ ID NO: 44 | AGAAACAAGAAACCCTACAAGAAGAAATAA (L) | SEQ ID NO: 95 | CCCTATAGTGAGTCGTCGTCGC (R)
SEQ ID NO: 45 | AGCTTAAGAATGAACCGACCACAAGAA (L) | SEQ ID NO: 96 | CTGTGGCTGAAAAAGAGAAAGCAAATTAAAG (L)
SEQ ID NO: 46 | CAAGTACTTGGATAAGGAACTGGCAGGAAG (L) | SEQ ID NO: 97 | ATCTGGGCAGTGAATTAGTTCGCTACG (R)
SEQ ID NO: 47 | ACACAAGTGGGGAAATCAAAGTATTACAAG (L) | SEQ ID NO: 98 | GAATCTGTAGACTACCGAGCTACTTTTCCAGAAG (L)
SEQ ID NO: 48 | CCCACCTGAGCCTGCCGACT (L) | SEQ ID NO: 99 | ATCAGTTTCCTAATTCATCTCAGAACGGTTC (R)
SEQ ID NO: 49 | GCAAATCACAGATCGAAGAGACAG (L) | SEQ ID NO: 100 | NNNNNNNNNN
SEQ ID NO: 50 | TGCTGAGGGCTGGGAAGAAG (L) | SEQ ID NO: 101 | GGGTTCCCTAAGGGTTGGA (L)
SEQ ID NO: 51 | TTAGTTAATCACGATTTCTCTCCTCTTGAG (L) | SEQ ID NO: 102 | GCGACGACGACTCACTATAGGG (L)







SEQ ID NO: 866 | CCGTCCACACCCGCCGCCAG (L) | SEQ ID NO: 1001 | GGTCACAGCCCCCATTCCAG (L)
SEQ ID NO: 867 | ACCGCGAGAAGATGACCCAG (L) | SEQ ID NO: 1002 | TGATGTCCTTGCATTGCCCATTTTTA (R)
SEQ ID NO: 868 | CTAAGCAGTGATGAAGAGGAGAATGAACAG (L) | SEQ ID NO: 1003 | GGGGCTCCAGGACCCCTGCC (R)
SEQ ID NO: 869 | CGCTCGCCCGGACCCCTCAG (L) | SEQ ID NO: 1004 | AGACCGAGGCAAAGGCCCTTTT (R)
SEQ ID NO: 870 | GAAGAAGAGCTGAGAAAAGCCATTTTAGTG (R) | SEQ ID NO: 1005 | CAGGAACAAAGGCTGCTCCAGCT (L)
SEQ ID NO: 871 | GAAGTGGTCCTGTACTGCTTAGAGAACAAG (R) | SEQ ID NO: 1006 | ATGACCTTCTTTCTGCCACAAAACGTAAAG (L)
SEQ ID NO: 872 | GCGAGTATAGTGTTGGAAACAAGCACC (R) | SEQ ID NO: 1007 | GCGAAGCTGGAGAAGTCACTGGAG (R)
SEQ ID NO: 873 | TGCCGGAAGCTGCCCAGTGA (R) | SEQ ID NO: 1008 | CCACCAGGGAGCTCCTGCAG (L)
SEQ ID NO: 874 | GTTTACAGAAAAAGCAAAGGAAACCGTTCT (L) | SEQ ID NO: 1009 | GAAACTGGGCATCTCTGTGGCC (R)
SEQ ID NO: 875 | CTGACAGCGAAGACTCCGAAACAG (L) | SEQ ID NO: 1010 | GATGGACATGGTAGAGAATGCAGATAGTTT (R)
SEQ ID NO: 876 | GCAGCCCTGCTTCTTCACAGTT (L) | SEQ ID NO: 1011 | GAGCTCTGGGCCCTGGCGAG (L)
SEQ ID NO: 877 | TCCATGGCATCAAGTGGACC (R) | SEQ ID NO: 1012 | GGGCCTCAGCGTGGACTCAG (L)
SEQ ID NO: 878 | GAGCTGGCGGCAGCGTGCAT (R) | SEQ ID NO: 1013 | CACTGGCCAGAGGTACTTCCTCAA (L)
SEQ ID NO: 879 | GTGAAGCGGCCCAGGTGAGG (L) | SEQ ID NO: 1014 | GCAGTATCCCAGCCAAATCTCG (L)
SEQ ID NO: 880 | TCCACCCTCAAGGGCCCCAG (L) | SEQ ID NO: 1015 | CCAAATCCCACTCCCGACAG (L)
SEQ ID NO: 881 | CAGCAAGTATCCAATGGGTGAAGAAG (L) | SEQ ID NO: 1016 | GACTTCAGACATGCAGGGTGACG (L)
SEQ ID NO: 882 | GTAAGACTCGGACCAAGGACAAGTACCG (R) | SEQ ID NO: 1017 | ATGAAAAAAAAGATATTGACCATGAGACAG (R)
SEQ ID NO: 883 | GCAAACAGCAGCCCAGCAGA (L) | SEQ ID NO: 1018 | GGACAAACCTGACTCCTTCATGG (L)
SEQ ID NO: 884 | GTCGAGGGCCAAGACGAAGACA (L) | SEQ ID NO: 1019 | CAGCTCTGCTACCCCAAGACAG (L)
SEQ ID NO: 885 | CAGTAACCTTATGCCTAGCAACATGCCAAT (L) | SEQ ID NO: 838 | NNNNNNNNNN
SEQ ID NO: 886 | ATCCCACTATTATTTTGGCACAACAGGAAG (L) | SEQ ID NO: 1020 | CATGGATCTGACTGCCATCTACGAG (L)
SEQ ID NO: 887 | AGAACCATTGGCTCTCACTGAAACAG (L) | SEQ ID NO: 1021 | CAGGCACCGCCCCTGGGGCT (R)
SEQ ID NO: 888 | AATGTGAAAAGGTTTGCGCTCCTG (L) | SEQ ID NO: 1022 | CCACTCGGGCGAGAAGCCGC (R)
SEQ ID NO: 889 | AGGACCTGGTGCAGATGCCT (R) | SEQ ID NO: 1023 | CGGGTGGACATTCCCCTCAG (L)
SEQ ID NO: 890 | AAATTACAGGGGACATCAGGGCCACT (R) | SEQ ID NO: 1024 | GTGGGCCTCCTGGGCCTCAG (L)
SEQ ID NO: 891 | CCCCAGTGGACCACCTGCAT (R) | SEQ ID NO: 1025 | TCCCTGGAATGAAGGGACACAGA (L)
SEQ ID NO: 892 | AAACTGCAGGGATCAGGCCC (R) | SEQ ID NO: 1026 | ATGGCAAAACTGGCCCCCCT (L)
SEQ ID NO: 893 | GGCACTGCACTGTGTGCGAG (L) | SEQ ID NO: 1027 | TCCCTGGACCTAAAGGTGCTGCT (L)
SEQ ID NO: 894 | TTGCTATAGCCCAAGGTGGAACAATC (R) | SEQ ID NO: 1028 | AAGCAGGCAAACCTGGTGAACAG (L)
SEQ ID NO: 895 | CTGCCACTGGTGACATGCCAAC (R) | SEQ ID NO: 1029 | TCCAGGGCCTAAGGGTGACAGA (L)
SEQ ID NO: 896 | GCCTGACGCGGGCCGCGCGG (L) | SEQ ID NO: 1030 | CTGGTGCCCCTGGTGACAAG (L)
SEQ ID NO: 897 | CCGACCTCACCCTGTCGCGG (L) | SEQ ID NO: 1031 | CTGGACCCCCTGGCCCCATT (L)
SEQ ID NO: 898 | GAGGAGCCTGTTCCCCTGAG (L) | SEQ ID NO: 1032 | AGGGTCCCCCTGGCCCTCCT (L)
SEQ ID NO: 899 | TGATGGCTTGTGCCCAAACAG (L) | SEQ ID NO: 1033 | CTGGTCCTGCTGGTCCCCGA (L)
SEQ ID NO: 900 | AGACAGCAGTGAGCATGGCG (L) | SEQ ID NO: 1034 | CTGGCGAGCCTGGAGCTTCA (L)






SEQ ID NO: 901
SEQ ID NO: 1035



ATCAAGATGACTGTG
ATGTCACCGGGTGCG



CTCCTGTGGGA (R)
CATCAAT (R)






SEQ ID NO: 902
SEQ ID NO: 1036



ATATTGATGAGTGCC
CTACAAGAGACTGTG



AACTGGGGGAG (R)
AAAAGGAAGTTGGAA




(R)






SEQ ID NO: 903
SEQ ID NO: 1037



GGTCAAATTTCAGCC
CATCCCAGTGACTGC



ATCAGCAA (L)
ATCCCTC (R)






SEQ ID NO: 904
SEQ ID NO: 1038



AGGACTGGGCGCTGC
GGGGACCCCATTCCC



TGCAG (L)
GAGGA (R)






SEQ ID NO: 905
SEQ ID NO: 1039



GTAAAAGTAGCAGTG
GTTTCAAAGTCACCC



GTTCAGOACACTTTG
TCCCACCTTT (R)



(L)







SEQ ID NO: 906
SEQ ID NO: 1040



TCAGACGAAGAACCT
GTCCCGTGGCTGTCA



CTCTCCCAG (L)
TCAGTG (R)






SEQ ID NO: 907
SEQ ID NO: 1041



CAGTGCCATCAGCAG
CCCTGGCGAGCCCCT



CATAGCAAG (L)
TGCAG (L)






SEQ ID NO: 908
SEQ ID NO: 1042



GCTCGACTGTGGGGA
ACACTAACAGCACAT



AACCATAAG (L)
CTGGAGACCCG (R)






SEQ ID NO: 909
SEQ ID NO: 1043



GCCACCACCACTCCG
GTCTCGGTGGCTGTG



TGGAG (L)
GGCCT (R)






SEQ ID NO: 910
SEQ ID NO: 1044



CCAGCAGCCACTGCA
TGTCCTCCTTGAAGG



CCTACAAG (L)
GCTCCAG (L)






SEQ ID NO: 911
SEQ ID NO: 1045



TATGGACAGAGTAAC
CCTCCACTGAAGAAG



TACAGTTATCCCCAG
CTGAAACAAGAG (L)



(L)







SEQ ID NO: 912
SEQ ID NO: 1046



CCCTGACCGAGAAGT
GAGAGTCTGGATGGA



TTAATCTGCCT (R)
CATTTGCAGG (L)






SEQ ID NO: 913
SEQ ID NO: 1047



TCTTGAAAGCGCCAC
TGCGAAGCCACCTCT



AAGCA (R)
CGCAG (L)






SEQ ID NO: 914
SEQ ID NO: 1048



ATGCTCTCCCCTCCT
GCTCTCCACAGATAG



CGGAGGA (R)
AGAACATCCAGC (R)






SEQ ID NO: 915
SEQ ID NO: 1049



GGAGAGGAGCACCAC
CTGAACAGATGGGTA



CCCAG (L)
AGGATGGCAG (R)






SEQ ID NO: 916
SEQ ID NO: 1050



GTGTCCCTATCTCTG
GGACCAACCACTTCC



ATACCATCATCCCAG
TACCCCAG (R)



(L)







SEQ ID NO: 917
SEQ ID NO: 1051



CTCCTTCAGACAATG
GCCCCAGGTGTACCC



CAGTGGTCTTAACAA
ACCAC (R)



(L)







SEQ ID NO: 918
SEQ ID NO: 1052



GCACACCTCTTAGAG
GCCTCACCTGCAGAT



GAAGACAGAAAACAG
GCCCC (R)



(L)







SEQ ID NO: 919
SEQ ID NO: 1053



GAAGTGGTCATTTCA
GCAACCTCCAAGTCC



GATGTGATTCATCTA
CAGATCATGT (R)



(L)







SEQ ID NO: 920
SEQ ID NO: 1054



CTCCTCACCCTCTGC
GGAGTTCCTGGTCGG



CGAGTCTCAAT (R)
CTCCG (R)






SEQ ID NO: 921
SEQ ID NO: 1055



GAGTGCGCCGGTCTC
CTTACCGTGACGTCC



GGGGA (R)
ACCGAC (L)






SEQ ID NO: 922
SEQ ID NO: 1056



TGGTGGCTATGAACC
GAGAGAGCCTTGAAC



CAGAGGT (L)
TCTGCCAGC (R)






SEQ ID NO: 923
SEQ ID NO: 1057



AGTCTGTGGCTGATT
TTTAAGGAGTCGGCC



ACTTCAAGCAGATTG
TTGAGGAAGC (R)



(L)







SEQ ID NO: 924
SEQ ID NO: 1058



CCCATCTCTGGGATT
GTGCCAGGCCCACCC



CCCAG (R)
CCAGG (R)






SEQ ID NO: 925
SEQ ID NO: 1059



CTGAAGTCTGAGCTG
GTAAAGGCGACACAG



GACATGCTG (R)
GAGGAGAACC (R)






SEQ ID NO: 926
SEQ ID NO: 1060



GATCCCCTGTTGGGG
CCTCTGTGTTTGCCG



ATGCT (R)
CCTGG (L)






SEQ ID NO: 927
SEQ ID NO: 1061



CTGAAGGATGCTGTA
TGTTGAAGAGATTGG



CCACAGACG (L)
CTGGTCCTATACAG (L)






SEQ ID NO: 928
SEQ ID NO: 1062



GGACGACTTTATGAC
ACACATTCATTCATA



CAAGAGCTGAACAAG
ACACTGGGAAAACAG



(L)
(L)






SEQ ID NO: 929
SEQ ID NO: 1063



CTGCATACGGCAGGA
ATAAACCTCTCATAA



GGGAAAG (L)
TGAAGGCCCCCG (R)






SEQ ID NO: 930
SEQ ID NO: 1064



GAACCAACCGGTGAG
CCTGCAGCCCCCATA



CCCTC (R)
GCAG (L)






SEQ ID NO: 931
SEQ ID NO: 1065



TGAACCCCACCAACA
CTCGCAACGCCCTGG



CAGTTTTTG (L)
TGGTC (R)






SEQ ID NO: 932
SEQ ID NO: 1066



GGCCAACGGGTCTAA
GTGGCCTTGACCTCC



AGCAG (L)
AACCAG (L)






SEQ ID NO: 933
SEQ ID NO: 1067



AACCTATGTTGCCCT
GGGCTGCTGGAGTCC



GAGTTACATAAATAG
TCTGC (R)



(L)







SEQ ID NO: 934
SEQ ID NO: 1068



CCGCAGCAGCACTCC
GCATAGAGAAGGAGA



GACAG (L)
CGTGCCAGAAG (R)






SEQ ID NO: 935
SEQ ID NO: 1069



GGGAGGTTCAAGATT
CGGGTCCTGAACGCT



CTTATGAAGCTTATG
GTGAAAT (L)



(L)







SEQ ID NO: 936
SEQ ID NO: 1070



GCAGAAGTTAGCGCT
ATTATGGAACTGCAG



TCTCTCTCG (L)
CGAATGACATC (R)






SEQ ID NO: 937
SEQ ID NO: 1071



GCCGTGGTGGCTGGT
GCCCAGAGATCGCAG



TCCCT (R)
CATATCAAA (L)






SEQ ID NO: 938
SEQ ID NO: 1072



CGACTCATTCATCGC
GATGAGATTCTTCCA



CCTCCAG (L)
AGGAAAGACTATGAG




(L)






SEQ ID NO: 940
SEQ ID NO: 1073



TGCGGGGCCAGGTGG
GGTCAAGCTGCTGCT



CCAAG (L)
GCTCG (L)






SEQ ID NO: 941
SEQ ID NO: 1074



CTGGACTTCCAGAAG
GGGGACCTAATTACA



AACATCTACAGTGAG
CCTCCGGTTATG (L)



(L)







SEQ ID NO: 942
SEQ ID NO: 1075



GAGAATCTTTTAGGA
CAGCCTACATCGGAT



CAAGCACTGACGAAG
GCCCA (L)



(L)







SEQ ID NO: 943
SEQ ID NO: 1076



CTCCAGGGTTCCTTG
CGGCCAACAATCCCT



AAAAGAAAACAGG (R)
GCAGT (L)






SEQ ID NO: 944
SEQ ID NO: 1077



TAAAAAGCGAAAGAA
CGACGGGTCCATTGC



TAAAAACCGGCACAG
CAAG (L)



(L)







SEQ ID NO: 945
SEQ ID NO: 1078



GGGGACAACAGCAGT
GCCTGTCGGGGGTAC



GAGCAAG (L)
CACAG (L)






SEQ ID NO: 946
SEQ ID NO: 1079



GCCACTCAATGACAA
GACTTGATTAGAGAC



AAATAGTAACAGTGG
CAAGGATTTCGTGG (R)



(R)







SEQ ID NO: 947
SEQ ID NO: 1080



TCCACGGACGACTCA
GATCAACCACAGGTT



GAGCAAG (L)
TGTCTGCTACC (R)






SEQ ID NO: 948
SEQ ID NO: 1081



AATGAAGTTAGAAGA
AAAACACTTGGTAGA



AAGCGAATTCCATCA
CGGGACTCGAGT (R)



(L)







SEQ ID NO: 949
SEQ ID NO: 1082



CGGGGCAGATCCAGG
AGCTAAAAGGACAGC



TTCAG (L)
AGGTGCTACCA (L)






SEQ ID NO: 950
SEQ ID NO: 1083



TTTACAGCTGACCTT
TTTGCAGAAACACTC



GACCAGTTTGATCAG
CAATTTATAGATTCT



(R)
(L)






SEQ ID NO: 951
SEQ ID NO: 1084



GATTACCTGAGCTGG
GCCTACCCTTCTCTC



AATTGGAAGCAAT (R)
CCTCGCAG (L)






SEQ ID NO: 952
SEQ ID NO: 1085



CCTGGCAGTGAGCTG
GAAATTAAATACGGT



GACAACT (R)
CCCCTGAAGATGCTA




(L)






SEQ ID NO: 953
SEQ ID NO: 1086



CTTTTAATAACCCAC
ACCACCCTTACTGAA



GACCAGGGCAACT (R)
GAAAATCAAACAAGA




G (L)






SEQ ID NO: 954
SEQ ID NO: 1087



GAATGATTGGTAACA
CGCCTGTGGCAGATG



GTGCTTCTCGG (R)
CACCG (L)






SEQ ID NO: 955
SEQ ID NO: 1088



CATCCTGCCTATAGA
GAGGAGCAAAATAGA



CCAGGCGTCTTTT (R)
GGCAAGCCC (R)






SEQ ID NO: 956
SEQ ID NO: 1089



GGCCATCTGAATTAG
GCAGAAGGAGAAGAC



AGATGAACATGGG (R)
AGCCTGAAGA (R)






SEQ ID NO: 957
SEQ ID NO: 1090



CCCGACCCTGCCCGC
CCCGCCCAAGGGCCC



CCTGG (R)
AG (L)






SEQ ID NO: 939
SEQ ID NO: 1091



GTAATTATGTGGTGA
GCTCACCCAGTCCCC



CAGATCACGGCTCG (R)
ACCAG (L)






SEQ ID NO: 958
SEQ ID NO: 1092



CTGAGGATTTGTGAC
AACTGTTCCCCCTCA



TGGACCATGAATC (R)
TCTTCCCG (R)






SEQ ID NO: 959
SEQ ID NO: 1093



TCCTGGTACCTGGGC
AAGAGGATGGATTCG



TAGCTTGGT (R)
ACTTAGACTTGACCT




(L)






SEQ ID NO: 960
SEQ ID NO: 1094



GTGGGAGGCCGCACC
CTTCTTTTTCAGAAG



ATGCT (R)
ACACCCTAAAAAAAG




(R)






SEQ ID NO: 961
SEQ ID NO: 1095



AGAGCACGGATAACT
CTGATTCCAGAGAGC



TTATCTTGT (R)
TAAAGCCGATG (L)






SEQ ID NO: 962
SEQ ID NO: 1096



TTGACGAAGTGAGTC
AAAGCCAAACTTGGC



CCACACCTCCT (R)
CCTGCT (R)






SEQ ID NO: 963
SEQ ID NO: 1097



ATGAACAGCAAAGAT
CACCTGCAAGATGGG



GTTCAGTATTGTGCT
GCTGG (L)



(R)







SEQ ID NO: 964
SEQ ID NO: 1098



CATCTGCATTGCCGG
ATCTCCTGTGTGCCC



GACCG (R)
AGAAGACCT (L)






SEQ ID NO: 965
SEQ ID NO: 1099



GTTCATGGAGTTTGA
GTGCAAACCCAAATT



GGCTGAGGAGA (R)
ATCCTGATGTAATTT




(R)






SEQ ID NO: 966
SEQ ID NO: 1100



TGTACATTCCGAAGA
GTCTATGCTGTGGTG



AGGCAGCCT (R)
GTGATTGCGTC (R)






SEQ ID NO: 967
SEQ ID NO: 1101



CATACCCAGCGCTGG
ATTTCTCATGGTTTG



GACCG (R)
GATTTGGGAAAGTA (R)






SEQ ID NO: 968
SEQ ID NO: 1102



GAATCTTTCTGAACC
GCCCAGCCTCCGTTA



TGTCATGACCTATAG
TCAGC (R)



(R)







SEQ ID NO: 969
SEQ ID NO: 1103



GGCGGCGGTGCAGCG
AAATTAAATACGGTC



CTCCG (L)
CCCTGAAGATGCTA (L)






SEQ ID NO: 970
SEQ ID NO: 1104



GCCTGATCACTTGAA
GCAGAAGGAGAAGAC



CGGACATATCAAG (R)
AGCCTGAAGA (R)






SEQ ID NO: 971
SEQ ID NO: 1105



ACCTGCAATGCTTCT
GTCGGGCTCTGGAGG



TTTGCCACC (R)
AAAAGAAAG (L)






SEQ ID NO: 972
SEQ ID NO: 1106



TCTTACCAGCCCACA
TTTGCCAAGGCACGA



TCTATTCCACAAG (L)
GTAACAAG (R)






SEQ ID NO: 973
SEQ ID NO: 1107



GCGGAAGAGACGGAA
CCTGCGTGAAGAAGT



TTTCAACAA (R)
GTCCCC (L)






SEQ ID NO: 974
SEQ ID NO: 1108



ACGGAAAAGGCGTAA
ACCGATCAAGAGCTC



CTTCAGTAAACAG (R)
TCCATGTGAG (L)






SEQ ID NO: 975
SEQ ID NO: 1109



TTGACCTGGATAGGC
CTCCGAATGTCCTGG



TCAATGATGAT (R)
CTCATTCG (R)






SEQ ID NO: 976
SEQ ID NO: 1110



CAGCCCCATCCGGAT
GCCAGCCACCGACAC



GTTTG (R)
CTACAG (L)






SEQ ID NO: 977
SEQ ID NO: 1111



GCCCCCCCAGGATGC
CATCTCGGGCTACGG



AATGG (R)
AGCTGC (R)






SEQ ID NO: 978
SEQ ID NO: 1112



GTTGCCTCTTGGTGC
GGCAATTCCGGAGCC



TGCCT (R)
GCAG (L)






SEQ ID NO: 979
SEQ ID NO: 1113



ATTGGCCAAAATGGG
GTGGTGGAGGTGGCT



AAGGATTGG (R)
GGAATG (R)






SEQ ID NO: 980
SEQ ID NO: 1114



TCCCAGGACATCAAA
GCATCCTGTACACCC



GCTCTGCAG (R)
CAGCTTTAAAAG (L)






SEQ ID NO: 981
SEQ ID NO: 1115



GTGAAAAAACACGTG
TGATGGAAGGCCACG



CGCAGCTTC (R)
GGGAA (R)






SEQ ID NO: 982
SEQ ID NO: 1116



GAGATATCTCTGTGA
CCCCTGCAAGTGGCT



GTATTTCAGTATCAA
GTGAAG (L)



(R)







SEQ ID NO: 983
SEQ ID NO: 1117



GACATGAGCACAGTA
ACGCTGCCTGAAGTG



TATCAGATTTTTCCT
TGCTCTG (R)



(R)







SEQ ID NO: 984
SEQ ID NO: 1118



GTGCCCCAAAGATGC
CCTCATGGAAGCCCT



AAACG (L)
GATCATCAG (L)






SEQ ID NO: 985
SEQ ID NO: 1119



AAGTATTTGGCTGAG
CAAATTCAACCACCA



GAGTTTTCAATCCCA
GAACATTGTTCG (R)



(L)







SEQ ID NO: 986
SEQ ID NO: 1120



AAGCACAAGACCAAG
GGGATGGCCCGAGAC



ACAGCTCAACAG (L)
ATCTACAG (L)






SEQ ID NO: 987
SEQ ID NO: 1121



CTCAGTTCATTGCCA
GGCGAGCTACTATAG



GAGAGCCAT (L)
AAAGGGAGGCTG (R)






SEQ ID NO: 988
SEQ ID NO: 1122



CACCCCAGCCCTATC
CAAGAACTGCCCTGG



CCTTTACGT (R)
GCCTGT (L)






SEQ ID NO: 989
SEQ ID NO: 1123



CATGGAGACCCATTC
ATACCGGATAATGAC



AGATAACCCACTAAG
TCAGTGCTGGC (R)



(L)







SEQ ID NO: 990
SEQ ID NO: 996



ACCATGTCAGCAAAA
GTTTCAGCAGTTCAG



CTTCTTTTGGG (L)
CTCCACCAG (L)






SEQ ID NO: 991
SEQ ID NO: 997



GTTCTCCAAACCTAT
ATGTTGGATGACAAT



CCCCGAATCCG (R)
AACCATCTTATTCAG




(R)






SEQ ID NO: 992
SEQ ID NO: 998



ACCTGCAGCCAGTTA
GTATCAGCAGATGTT



CCTACTGCGAG (L)
GCACACAAACTTG (R)






SEQ ID NO: 993
SEQ ID NO: 999



ATGTAAAATGGGGTA
GCGGCCCTACGGCTA



AACTGAGAGATTATC
TGAACAG (L)



(L)







SEQ ID NO: 994
SEQ ID NO: 1000



AGGTACCAATCTTGG
AGCCAACACAGATCT



GAAAAAGAAGCAACA
ATAGATTTCTTCGAA



(L)
(R)






SEQ ID NO: 995
SEQ ID NO: 865



GACCTCCTCCAGCGG
NNNNNNNNNNNNNNN



GACAG (L)
NNNNN






SEQ ID NO: 1209 (R)
SEQ ID NO: 1210 (L)



TCTGGCATAGAAGAT
TGGAAAAGACAATTG



TAAAGAATCAAAAAA
ATGACCTGGAAG






SEQ ID NO: 1211 (R)
SEQ ID NO: 1212 (L)



GATAGCTAGCGGCCA
TGACTTCTGGATTCT



GGAGAAATACAGT
CCTCTTGAGTAAAAG






SEQ ID NO: 1213 (L)
SEQ ID NO: 1214 (R)



CGAACATGGCACGAA
TTTGGACATCACATT



AGAGATCAAG
TCACAGTCAGAAGG






SEQ ID NO: 1215 (R)
SEQ ID NO: 1216 (R)



ACCAAGCCACCCTGG
ACAGGTGATTTGGCT



TAGAACAAGTAA
TCTGCACAGTTAG






SEQ ID NO: 1217 (R)
SEQ ID NO: 1218 (L)



ATGGTGCTCCAAGAG
CCTTATTGGAGATTT



GCAGCTT
TACATTGTGCTATAG






SEQ ID NO: 1219 (L)
SEQ ID NO: 1220 (L)



CTGGCTGGAAAAAGA
TGGGAGAAGCAGCAG



GGAAAGATTTCTG
CGCAAG






SEQ ID NO: 1221 (L)
SEQ ID NO: 1222 (R)



GCCAAGAGGCAGACC
CTCCAGAAACATGAC



TAGGAAATGG
AAGGAGGACTTTC






SEQ ID NO: 1223 (L)
SEQ ID NO: 1224 (R)



TGGCGAAGCGGAGGC
CTGTCTGCGAGCCTG



CGGAG
GCTGTG






SEQ ID NO: 1225 (L)
SEQ ID NO: 1226 (L)



CAAGTTGTTCAGAAG
AGATGGTGCAGAAGA



AAGCCTGCTCAG
AGAACGCG






SEQ ID NO: 1227 (R)
SEQ ID NO: 1228 (L)



GGTACGAAGCCAGCC
GGAACTGCCAGTGTA



TCATACATGC
GAGGGAATTCTAAG






SEQ ID NO: 1229 (L)
SEQ ID NO: 1230 (R)



GCCTTTTTGAAGAAA
GATGAGCAATTCTTA



CTCCACGAAGAG
GGTTTTGGCTCAGAT






SEQ ID NO: 1231 (L)
SEQ ID NO: 1232 (L)



GCTGGAAACATTTCC
AAGGAGAAGGGGTTG



GACCCTG
AAATTGTTGATAGAG






SEQ ID NO: 1233 (L)
SEQ ID NO: 1234 (L)



ATCAAGTCCTTTGAC
GCAAGAGTGGTGATC



AGTGCATCTCAAG
GTGGTGAGACT






SEQ ID NO: 1235 (R)
SEQ ID NO: 1236 (L)



TTTTTTTGAAGAAGC
TCTTATCCTTTGTCG



AGGATGCTGATCTAA
CAGAGACTATCTGAG






SEQ ID NO: 1237 (R)
SEQ ID NO: 1238 (L)



GGCTATTGAGTGGCC
AGGTTGTTACCGTGG



AGACTTCCC
GCAACTCTG






SEQ ID NO: 1239 (R)
SEQ ID NO: 1240 (L)



GTGGTGGAGGTGGCT
CCAGAAAAAAAGACC



GGAATG
AGGCCACAG






SEQ ID NO: 1241 (L)
SEQ ID NO: 1242 (R)



GCCTTCTACCCCATG
CAGCAGCCAGTAAGG



AGAAAGACCAG
AGGAGAAGG






SEQ ID NO: 1243 (L)
SEQ ID NO: 1244 (L)



GAGTTCAGGACCAGC
GTGGAAAAGGCTTTA



TCATTGAAAAGA
GCCATGGACAG






SEQ ID NO: 1245 (R)
SEQ ID NO: 1246 (L)



AGATCTGTCTTACAA
CCAAGGCTTGACCCT



CCTATTAGAAGATTT
CGTTTTG






SEQ ID NO: 1247 (L)
SEQ ID NO: 1248 (R)



AAACAGCAAGAACTG
ACAAGTCATCAATTG



CTTCGGCAG
CTGGCTCAGAA






SEQ ID NO: 1249 (R)
SEQ ID NO: 1250 (L)



GGTCAAGAAAGTGAC
GTCCTCCGACAGTGC



TCATCAGAGACCTCT
TTGGCA






SEQ ID NO: 1251 (R)
SEQ ID NO: 1252 (L)



AAGATGAATCCGGCC
CGGAGTCAGCTGCCA



TCGGC
AGAGACAG






SEQ ID NO: 1253 (R)
SEQ ID NO: 1254 (L)



GTGCTATACTTGGTA
GACCATCATCCAGGG



GATCAGAAACTCAGG
CATCCTG






SEQ ID NO: 1255 (L)
SEQ ID NO: 1256 (L)



TGACACGCTTCCCTG
CAGCTCCTGACCAAC



GATTGG
CCCAAG






SEQ ID NO: 1257 (L)
SEQ ID NO: 1258 (L)



ACAGGGACGCCATCG
TGAAATCCGACACTA



AATCCG
CTGATTCTAGTCAAG






SEQ ID NO: 1259 (L)
SEQ ID NO: 1260 (R)



TTGGAGAAGATCTAT
GTTACTCTGGAAGAA



GGGTCAGACAGAATT
GTCAACTCCCAAATA






SEQ ID NO: 1261 (R)
SEQ ID NO: 1262 (R)



AACTCGAAAATTAAT
GACTGGGAGGTGCTG



GCTGAAAATAAGGCG
GTCCTAGG






SEQ ID NO: 1263 (R)
SEQ ID NO: 1264 (R)



TTTAAGGCTGCAAGC
AATCATCGGACTCAG



AGTATTTACAACAGA
GTACATCTGTGAGTG






SEQ ID NO: 1265 (R)
SEQ ID NO: 1266 (L)



GCCTGTGCAGTGGGA
GTTCAAAAACTGAAG



CTGATTG
GACTCTGAAGCTGAG






SEQ ID NO: 1267 (L)
SEQ ID NO: 1268 (L)



CGCCAATTGTAAACA
CCTTATTGATTGGCC



AAGTGGTGACAC
AACAATCAACAG






SEQ ID NO: 1269 (R)
SEQ ID NO: 1270 (R)



CCCAGCCCTGGGGAG
CCGTAGCTCCATATT



CCCCT
GGACATCCC






SEQ ID NO: 1271 (L)
SEQ ID NO: 1272 (R)



CCCTGAGAATCTGGG
TGTGTGCCTCCTGAC



ACCTCAACAG
GAAGCC






SEQ ID NO: 1273 (R)
SEQ ID NO: 1274 (L)



GCCACAGTGGAGACC
GCCAAGAGGAGCTCA



AGTCAGC
TGAGGCAG






SEQ ID NO: 1275 (L)
SEQ ID NO: 1276 (L)



TCTCTAGCAGTTACT
AACTCACAACGGTAG



ATGGATGACTTCCGG
GAGAGAAACCTGAAG






SEQ ID NO: 1277 (L)
SEQ ID NO: 1278 (R)



AGCCCGGGACCGTTT
AAATGTGGAGCCCAG



AAAAAACTG
GAGGAAGG






SEQ ID NO: 1279 (L)
SEQ ID NO: 1280 (R)



AATGGTCAGAAACCC
GATGCAATTCGAAGT



TCCATAACCTGAAG
CACAGCGAAT






SEQ ID NO: 1281 (L)
SEQ ID NO: 1282 (R)



CGGACGCATCACTTG
AGCTGATAGACACAC



CACTTCTAGAA
ACCTTAGCTGGATAC






SEQ ID NO: 1283 (L)
SEQ ID NO: 1284 (R)



CTTTGCTGAATGCTC
CTTGTAATCTGGATG



CAGCCAAG
TGATTCTGGGGTTT






SEQ ID NO: 1285 (R)
SEQ ID NO: 1286 (R)



GAAAGCCCTTCTTGT
GTAACAGTATCGGGA



ATGTCAATGCC
CCCTTACTGCACAT






SEQ ID NO: 1287 (R)
SEQ ID NO: 1288 (R)



ACATTACTGGTTATA
CTCAAGCTTTTAAAA



GAATTACCACAACCC
TCGAGACCACCCC






SEQ ID NO: 1289 (L)
SEQ ID NO: 1290 (R)



AGCCCCAGTCCCAGC
AATGCAGCTCTTCAG



CCCAG
CATCTGTTTATTCG






SEQ ID NO: 1291 (L)
SEQ ID NO: 1292 (L)



CGAGGGTGTTCTTGA
CTCCGCCCCACAGTC



CGATTAATCAACAG
CACGAG






SEQ ID NO: 1293 (L)
SEQ ID NO: 1294 (L)



GTGGCGGAATCGGTG
CGCCATCATCCTCAT



GTAGAG
CATCATCATAG






SEQ ID NO: 1295 (R)
SEQ ID NO: 1296 (L)



AGATCATCACTGGTA
ACAGTCTCTTGCAAT



TGCCAGCCTC
CGGCTAAAAAAAAGA






SEQ ID NO: 1297 (L)
SEQ ID NO: 1298 (L)



CTATCAGAAGAAAAT
AGAAAACTCTTAAAG



CGGCACCTGAGA
AATGCAGCAGCTTGG






SEQ ID NO: 1299 (R)
SEQ ID NO: 1312 (R)



GACACTGGGGTTGGG
GGTCCTGTCGGGGAA



AAATCAAGC
CCCTCT






SEQ ID NO: 1300 (L)
SEQ ID NO: 1301 (L)



CCCAGCGCTACCTTG
CAGTTTGCTGTGTGT



TCATTCAG
TTGCTCAAACAG






SEQ ID NO: 1302 (L)
SEQ ID NO: 1303 (R)



TACTTGGACTAGTTT
GACATGAACAAGCTG



ATATGAAATTTGTGG
AGTGGAGGCGGCG






SEQ ID NO: 1304 (R)
SEQ ID NO: 1305 (R)



CTACATCTACATCCA
CCTTGCCTCCCCGAT



CCACTGGGACAAG
TGAAAG






SEQ ID NO: 1306 (L)
SEQ ID NO: 1307 (L)



GTGCCACGGTGTCCG
ATTTTAATGAAAACA



GATATG
CAGCAGCACCTAGAG






SEQ ID NO: 1308 (L)
SEQ ID NO: 1309 (L)



ATGAAGGAAATGCTA
TGCCATCTCCAGGCC



AAGCGATTCCAAG
TTGCAG






SEQ ID NO: 1310 (R)
SEQ ID NO: 1311 (R)



GCCCGGCTGTGCTGG
TCCCGGCCAGTGTGC



CTCCA
AGCTG









Description of sequences 1 to 102 and 866 to 1123 and 1209 to 1312 according to the invention












TABLE 2








Number of probes in the invention | Number of probes described in international patent application PCT/FR2014/052255
SEQ ID NO: 103 to 127 | SEQ ID NO: 1 to 25
SEQ ID NO: 128 | SEQ ID NO: 30
SEQ ID NO: 129 | SEQ ID NO: 31
SEQ ID NO: 130 to 137 | SEQ ID NO: 113 to 120
SEQ ID NO: 138 to 168 and SEQ ID NO: 825 | SEQ ID NO: 374 to 405
SEQ ID NO: 169 to 194 and SEQ ID NO: 826 to 835 | SEQ ID NO: 524 to 559
SEQ ID NO: 195 to 198 | SEQ ID NO: 26 to 29
SEQ ID NO: 199 to 245 | SEQ ID NO: 66 to 112
SEQ ID NO: 246 to 344 | SEQ ID NO: 121 to 219
SEQ ID NO: 345 to 403 | SEQ ID NO: 616 to 674
SEQ ID NO: 404 to 428 | SEQ ID NO: 750 to 774
SEQ ID NO: 429 to 436 | SEQ ID NO: 734 to 741
SEQ ID NO: 437 to 479 | SEQ ID NO: 438 to 480
SEQ ID NO: 480 to 504 | SEQ ID NO: 35 to 59
SEQ ID NO: 505 | SEQ ID NO: 64
SEQ ID NO: 506 | SEQ ID NO: 65
SEQ ID NO: 507 to 514 | SEQ ID NO: 267 to 274
SEQ ID NO: 515 to 546 | SEQ ID NO: 406 to 437
SEQ ID NO: 547 to 582 | SEQ ID NO: 560 to 595
SEQ ID NO: 583 to 586 | SEQ ID NO: 60 to 63
SEQ ID NO: 587 to 633 | SEQ ID NO: 220 to 266
SEQ ID NO: 634 to 732 | SEQ ID NO: 275 to 373
SEQ ID NO: 733 to 791 | SEQ ID NO: 675 to 733
SEQ ID NO: 792 to 816 | SEQ ID NO: 775 to 799
SEQ ID NO: 817 to 824 | SEQ ID NO: 742 to 749










Correspondence between sequences 103 to 835 and the sequences described in international application PCT/FR2014/052255. The L/R information for sequences 103 to 835 is indicated in FIGS. 4-5, 7 to 9 of international application PCT/FR2014/052255.





BRIEF DESCRIPTION OF THE FIGURES

Other features, details, and advantages of the invention will become apparent on reading the appended Figures.



FIG. 1



FIG. 1 shows the diagram of a chromosomal translocation leading to the expression of a fusion transcript detectable by the invention. FIG. 1A (top) shows how a fusion mRNA is obtained following a chromosomal translocation between gene A and gene B. FIG. 1B (bottom) shows the reverse transcription of this fusion mRNA into cDNA. The cDNA is then incubated with the probes, which hybridize with the complementary portions of the cDNA. Probe S1 consists of a sequence complementary to the last nucleotides of exon 2 of cDNA gene A, and probe S2 consists of a sequence complementary to the first nucleotides of exon 2 of cDNA gene B. Probe S1 is fused at 5′ with a barcode sequence SA′ as well as with a primer sequence SA. Probe S2 is fused at 3′ with a primer sequence SB. Because exon 2 of gene A and exon 2 of gene B are adjacent in the fusion transcript, probes S1 and S2 hybridize side by side. A DNA ligase then ligates the adjacent probes, so that S1 and S2 form a continuous sequence flanked by SA and SB. PCR is then performed: using suitable primers, the ligated probes are amplified. In the present case, the primers used are the sequence SA and the complementary sequence of SB (called B′). The results obtained are then analyzed by sequencing.



FIG. 2



FIG. 2 shows the diagram of an exon skipping leading to the expression of a transcript corresponding to an exon skipping detectable by the invention. FIG. 2A (top) shows the cDNA obtained after reverse transcription in the case of normal splicing, and FIG. 2A (bottom) shows the cDNA obtained after reverse transcription in the case of a splicing abnormality. FIG. 2B (top) shows that in the absence of mutation (normal case), after hybridization of the probes, the sequences obtained are as follows: S13L-S14R and S14L-S15R. FIG. 2B (bottom) shows that in the presence of a mutation (abnormal case of exon skipping), after hybridization of the probes, the sequence obtained is as follows: S13L-S15R.
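
The junction logic of FIG. 2 can be sketched in Python as follows. This is only an illustration: the function name `classify_junction`, the probe-name pattern `S<exon><L|R>` and the rule "right exon = left exon + 1 means normal splicing" are assumptions drawn from the S13L-S14R / S13L-S15R example, not part of the method as described.

```python
import re

# Hypothetical probe naming taken from the FIG. 2 example: "S13L" is the Left
# probe at the end of exon 13, "S15R" the Right probe at the start of exon 15.
PROBE_NAME = re.compile(r"^S(\d+)([LR])$")


def classify_junction(product: str) -> str:
    """Classify a ligation product such as 'S13L-S15R' (illustrative only)."""
    left_name, right_name = product.split("-")
    left = PROBE_NAME.match(left_name)
    right = PROBE_NAME.match(right_name)
    if not left or not right or left.group(2) != "L" or right.group(2) != "R":
        raise ValueError(f"unexpected product name: {product}")
    left_exon, right_exon = int(left.group(1)), int(right.group(1))
    if right_exon == left_exon + 1:
        return "normal splicing"
    if right_exon > left_exon + 1:
        skipped = list(range(left_exon + 1, right_exon))
        return f"exon skipping (exons skipped: {skipped})"
    return "unexpected junction"


for name in ("S13L-S14R", "S14L-S15R", "S13L-S15R"):
    print(name, "->", classify_junction(name))
```

In this sketch only the S13L-S15R product is reported as exon skipping, which corresponds to the abnormal case of FIG. 2B (bottom).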



FIG. 3



FIG. 3 shows an example of probe construction according to the invention. FIG. 3A shows the hybridization of the probes after formation of a fusion gene. The number 1 represents the first primer sequence; the number 2 represents the molecular barcode sequence; the number 3 represents the first probe which hybridizes to the left side of the fusion; the number 4 represents the second probe which hybridizes to the right side of the fusion; the number 5 represents the second primer sequence. Probes 3 and 4 represent an example of a pair of probes according to the invention. Each probe consists of a specific sequence capable of hybridizing at the end of an exon and has a primer sequence at its end. Here, a random 7-base molecular barcode is added between the primer sequence and the specific sequence of the left probe. FIG. 3B shows a fusion transcript before analysis with a next-generation sequencer of the Illumina® type. When a fusion transcript is detected, two probes hybridize side by side, enabling their ligation. The ligation product can then be amplified by PCR using primers corresponding to the primer sequences. In FIG. 3B, these primers themselves carry extensions (P5 and P7) which allow analysis of the PCR products on a next-generation sequencer of the Illumina type.
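
As a rough illustration of the construction in FIG. 3, the following Python sketch splits a ligation product (primer sequences already removed) into its 7-base barcode, left probe and right probe. The names `LigationProduct` and `split_product` are hypothetical, and the exact-match comparison is a simplifying assumption; the sequences are those reported for the EML4-ALK fusion in Table 3 of Example 1.

```python
from typing import NamedTuple, Optional

BARCODE_LENGTH = 7  # random 7-base molecular barcode described for FIG. 3


class LigationProduct(NamedTuple):
    barcode: str
    left_probe: str
    right_probe: str


def split_product(
    sequence: str, left_probe: str, right_probe: str
) -> Optional[LigationProduct]:
    """Split barcode + left probe + right probe; return None on any mismatch."""
    barcode = sequence[:BARCODE_LENGTH]
    rest = sequence[BARCODE_LENGTH:]
    if len(barcode) == BARCODE_LENGTH and rest == left_probe + right_probe:
        return LigationProduct(barcode, left_probe, right_probe)
    return None


# Sequences as reported in Table 3 (left probe SEQ ID NO: 31, right probe SEQ ID NO: 3).
LEFT = "CCCACACCTGGGAAAGGACCTAAAG"
RIGHT = "TGTACCGCCGGAAGCACCAGGAG"
READ = (
    "AAAAATACCCACACCTGGG"
    "AAAGGACCTAAAGTGTACC"
    "GCCGGAAGCACCAGGAG"
)

print(split_product(READ, LEFT, RIGHT))
```

A real analysis would presumably tolerate sequencing errors rather than require an exact match; that choice is left open here.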



FIG. 4



FIG. 4 shows translocations identified using the invention. The new rearrangements specifically revealed by the probes of the invention are indicated with dark lines. The already known rearrangements, in particular those described in international application PCT/FR2014/052255, are indicated with light lines. Each line represents an abnormal gene junction possibly present in a tumor, between the genes listed on the left of the figure and those listed on the right. The mix shown here makes it possible to simultaneously search for more than 50 different rearrangements that are recurrent in carcinomas. In addition, due to the use of several probes for certain genes targeting different exons, recombinations capable of leading to the expression of hundreds of different transcripts are detectable.



FIG. 5



FIG. 5 shows the number of fusion RNA molecules present in the starting sample tested according to Example 1. This graph shows that 729 fusion RNA molecules were present in the starting sample, and that this result was amplified by a factor of 135.8 during the PCR step. 98,993 sequences were thus obtained at the end of the PCR step.



FIG. 6



FIG. 6 represents one of the strategies which makes it possible to detect a skipping of exon 14 of the MET gene by means of the invention. In FIG. 6A, the selected probes hybridize to the ends of exons 13, 14 and 15 of this gene. In a normal situation, splicing of the transcripts of this gene produces junctions between exons 13 and 14, and between exons 14 and 15. In a pathological situation, for example if a mutation destroys the splicing donor site of exon 14, the tumor cells express an abnormal transcript, resulting from the junction of exons 13 and 15. The various amplification products obtained by means of the invention are visible in FIG. 6B on a capillary sequencer, after amplification using a pair of primers of which one is labeled with a fluorochrome. These products, which differ in their sequence, can also easily be revealed using a next-generation sequencer.



FIG. 7



FIG. 7 shows the construction of the sequences as analyzed by the software. The terms “Oligo 5′” and “Oligo 3′” represent a pair of probes according to the invention. The term “UMI” represents the molecular barcode sequence. The terms “11” and “12” represent the primer sequences. The term “index” represents the sequence index. The terms “P5” and “P7” correspond to extensions, useful for the use of a next-generation sequencer.



FIG. 8



FIG. 8 shows an example of a read in FASTQ format.
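
For readers unfamiliar with the format, a FASTQ record consists of four lines: an identifier line starting with `@`, the sequence, a separator line starting with `+`, and a quality string of the same length as the sequence. The Python sketch below reads such records; the name `read_fastq` is hypothetical and the example record is made up for illustration (it is not the read shown in FIG. 8).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of stream
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up record for illustration only.
SEQ = "AAAAATACCCACACCTGG"
EXAMPLE = f"@read_1\n{SEQ}\n+\n{'I' * len(SEQ)}\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
print(records[0].identifier, records[0].sequence)
```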



FIG. 9



FIG. 9 shows the diagram of a skipping of exons in the EGFR gene leading to expression of a transcript corresponding to an exon skipping detectable by the invention. FIG. 9A (top) shows the cDNA obtained after reverse transcription in the case of normal splicing, and FIG. 9A (bottom) shows the cDNA obtained after reverse transcription in the case of a splicing abnormality.



FIG. 9B (top) shows that in the absence of mutation (normal case), after hybridization of probes S1L, S2R, S7L and S8R, the sequences obtained are as follows: S1L-S2R and S7L-S8R. FIG. 9B (bottom) shows that in the presence of a mutation (abnormal case in the presence of exon skipping), after hybridization of the probes, the sequence obtained is as follows: S1L-S8R (deletion of exons 2 to 7 has taken place).



FIG. 10



FIG. 10 shows the number of fusion RNA molecules present in the starting sample tested according to Example 3. This graph shows that 587 fusion RNA molecules were present in the starting sample, and that this result was amplified by a factor of 259.3 during the PCR step. 152,227 sequences were thus obtained at the end of the PCR step.



FIG. 11



FIG. 11 shows the number of fusion RNA molecules present in the starting sample tested according to Example 4. This graph shows that 505 fusion RNA molecules were present in the starting sample, and that this result was amplified by a factor of 123.1 during the PCR step. 62,151 sequences were thus obtained at the end of the PCR step.



FIG. 12



FIG. 12 shows the number of fusion RNA molecules present in the starting sample tested according to Example 5. This graph shows that 965 fusion RNA molecules were present in the starting sample, and that this result was amplified by a factor of 123.5 during the PCR step. 119,161 sequences were thus obtained at the end of the PCR step.
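
The amplification factors quoted for FIGS. 5, 10, 11 and 12 appear to be the number of sequences obtained after PCR divided by the number of fusion RNA molecules in the starting sample. The short Python check below recomputes them from the figures stated above; the function name `amplification_factor` is hypothetical and this reading of the factor is an assumption.

```python
# (sequences obtained after PCR, fusion RNA molecules in the starting sample),
# as stated for FIGS. 5, 10, 11 and 12.
RESULTS = {
    "Example 1 (FIG. 5)": (98_993, 729),
    "Example 3 (FIG. 10)": (152_227, 587),
    "Example 4 (FIG. 11)": (62_151, 505),
    "Example 5 (FIG. 12)": (119_161, 965),
}


def amplification_factor(sequences_read: int, molecules: int) -> float:
    """Average number of sequences obtained per starting molecule."""
    return sequences_read / molecules


for label, (reads, molecules) in RESULTS.items():
    print(f"{label}: {amplification_factor(reads, molecules):.1f}")
```

Rounded to one decimal place, the four ratios are 135.8, 259.3, 123.1 and 123.5, in agreement with the factors given in the figure descriptions.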



FIG. 13



FIG. 13 shows the diagram of a 5′-3′ expression imbalance leading to the expression of a transcript corresponding to different alleles, detectable by the invention. Expression levels depend on the transcriptional regulatory regions of the rearranged alleles. For example, the expression of alleles I and III is (Sn_Sn+1)=(Sn+2_Sn+3), the expression of alleles I and II is (Sn+4_Sn+5)=(Sn+6_Sn+7). However, when the transcriptional regulatory regions of genes A and B are not equivalent, then the expression of the 5′ exons (Sn_Sn+1) and (Sn+2_Sn+3) is different from the expression of the 3′ exons (Sn+4_Sn+5) and (Sn+6_Sn+7). For example, in lung carcinomas carrying a fusion of the ALK gene (gene B), alleles I and III, whose expression is controlled by the regulatory regions of ALK, are very weakly expressed, while allele II, controlled by the regulatory regions of the partner gene A, is strongly expressed. This therefore results in a 5′-3′ imbalance, with: (Sn+4_Sn+5)=(Sn+6_Sn+7) ≫ (Sn_Sn+1)=(Sn+2_Sn+3).
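
One possible way to turn the comparison of FIG. 13 into a numerical test is sketched below in Python. Everything here is an assumption made for illustration: the function names, the pseudocount, the threshold of 10 and the read counts are invented, and the text does not specify how the imbalance is scored.

```python
from statistics import mean
from typing import Sequence


def imbalance_ratio(five_prime: Sequence[int], three_prime: Sequence[int]) -> float:
    """Ratio of mean 3' junction counts to mean 5' junction counts.

    A pseudocount of 1 avoids division by zero; it is an assumption of this
    sketch, not something specified in the text.
    """
    return (mean(three_prime) + 1) / (mean(five_prime) + 1)


def has_imbalance(
    five_prime: Sequence[int], three_prime: Sequence[int], threshold: float = 10.0
) -> bool:
    """Flag a 5'-3' imbalance; the threshold value is purely hypothetical."""
    return imbalance_ratio(five_prime, three_prime) >= threshold


# Invented counts: 5' junctions (Sn_Sn+1), (Sn+2_Sn+3) and
# 3' junctions (Sn+4_Sn+5), (Sn+6_Sn+7).
balanced = has_imbalance([40, 38], [42, 41])
rearranged = has_imbalance([3, 2], [410, 395])
print(balanced, rearranged)
```

With these invented counts, the first sample is not flagged and the second, where the 3′ junctions are far more abundant than the 5′ junctions, is flagged.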



FIG. 14



FIG. 14 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 15



FIG. 15 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 16



FIG. 16 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 17



FIG. 17 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 18



FIG. 18 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 19



FIG. 19 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 20



FIG. 20 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 21



FIG. 21 shows an example of the probes which can be used according to the invention, as well as the gene which this probe makes it possible to detect. L/R indicates whether the probe is “Left” or “Right”, as indicated above.



FIG. 22



FIG. 22 shows an example obtained during analysis of a splicing abnormality of the MET gene.



FIG. 23



FIG. 23 shows an example obtained during analysis of a splicing abnormality of the MET gene.



FIG. 24



FIG. 24 shows an example obtained during analysis of a splicing abnormality of the EGFR gene.



FIG. 25



FIG. 25 shows an example obtained during analysis of a splicing abnormality of the EGFR gene.



FIG. 26



FIG. 26 shows an example obtained during analysis of a 5′-3′ expression imbalance.



FIG. 27



FIG. 27 shows an example obtained during analysis of a 5′-3′ expression imbalance.



FIG. 28



FIG. 28 shows novel probes (SEQ ID NO: 1211 to 1312) and illustrates the cancers they detect. The so-called “full” sequences include the primer sequence, the molecular barcode sequence (for the so-called “Left” probes), and the specific sequence of the probe (called SEQ ID NO: 1313 to 1414).





EXAMPLES
Example 1: Diagnosing a Carcinoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above (more particularly at least probes SEQ ID NO: 1 to 13 and 14 to 91).


At the end of the PCR step, 98,993 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at 5′. Due to PCR amplification, each molecular barcode sequence is read several times (number of reads). Counting the distinct barcodes makes it possible to accurately determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 729, see FIG. 5).
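
The barcode counting described above can be sketched in Python as follows. The function name `summarize` is hypothetical, and the input is limited to the barcodes and read counts of the first nine rows of Table 3 (barcodes that wrap over two lines in the table have been reassembled), so the totals are those of this subset only, not of the whole sample.

```python
from collections import Counter
from typing import Iterable, Tuple

# (barcode, number of reads) for the first nine rows of Table 3.
ROWS = [
    ("AAAAATA", 156), ("AAAATGA", 72), ("AAAATGC", 74),
    ("AAACACT", 22), ("AAACGAG", 209), ("AAACTGC", 172),
    ("AAACTGT", 175), ("AAAGAGA", 25), ("AAAGATG", 155),
]


def summarize(rows: Iterable[Tuple[str, int]]) -> Tuple[int, int, float]:
    """Return (distinct barcodes, total reads, mean reads per barcode)."""
    reads_per_barcode: Counter = Counter()
    for barcode, reads in rows:
        reads_per_barcode[barcode] += reads
    molecules = len(reads_per_barcode)  # one starting molecule per distinct barcode
    total_reads = sum(reads_per_barcode.values())
    return molecules, total_reads, total_reads / molecules


molecules, total_reads, mean_reads = summarize(ROWS)
print(molecules, total_reads, round(mean_reads, 1))
```

Each distinct barcode is counted once regardless of how many times it was read, which is what allows the number of starting molecules to be separated from the effect of PCR amplification.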


Table 3 shows the results obtained.














TABLE 3






Complete sequence | Number of reads | Barcode | Left probe | Right probe | Sequences identified
AAAAATACCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 837) | 156 | AAAAATA (SEQ ID NO: 851) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAAATGACCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 838) | 72 | AAAATGA (SEQ ID NO: 852) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAAATGCCCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 839) | 74 | AAAATGC (SEQ ID NO: 853) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAACACTCCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 840) | 22 | AAACACT (SEQ ID NO: 854) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAACGAGCCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 841) | 209 | AAACGAG (SEQ ID NO: 855) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAACTGCCCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 842) | 172 | AAACTGC (SEQ ID NO: 856) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAACTGTCCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 843) | 175 | AAACTGT (SEQ ID NO: 857) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAAGAGACCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 844) | 25 | AAAGAGA (SEQ ID NO: 858) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL
AAAGATGCCCACACCTGGGAAAGGACCTAAAGTGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 845) | 155 | AAAGATG (SEQ ID NO: 859) | CCCACACCTGGGAAAGGACCTAAAG (SEQ ID NO: 31) | TGTACCGCCGGAAGCACCAGGAG (SEQ ID NO: 3) | EML4E13GTL-ALKE20DTL







AAAGGCTCCCACACCTGG
34
AAAGGC
CCCACACCTGG
TGTACCGCCGGAA
EML4E13GTL-


GAAAGGACCTAAAGTGTAC

T (SEQ ID
GAAAGGACCTAA
GCACCAGGAG
ALKE20DTL


CGCCGGAAGCACCAGGAG

NO: 860)
AG (SEQ ID NO:
(SEQ ID NO: 3)



(SEQ ID NO: 846)


31)







AAAGGTACCCACACCTGGG
68
AAAGGTA
CCCACACCTGG
TGTACCGCCGGAA
EML4E13GTL-


AAAGGACCTAAAGTGTACC

(SEQ ID
GAAAGGACCTAA
GCACCAGGAG
ALKE20DTL


GCCGGAAGCACCAGGAG

NO: 861)
AG (SEQ ID NO:
(SEQ ID NO: 3)



(SEQ ID NO: 847)


31)







AAAGTCACCCACACCTGGG
50
AAAGTCA
CCCACACCTGG
TGTACCGCCGGAA
EML4E13GTL-


AAAGGACCTAAAGTGTACC

(SEQ ID
GAAAGGACCTAA
GCACCAGGAG
ALKE20DTL


GCCGGAAGCACCAGGAG

NO: 862)
AG (SEQ ID NO:
(SEQ ID NO: 3)



(SEQ ID NO: 848)


31)







AAAGTGTCCCACACCTGGG
149
AAAGTGT
CCCACACCTGG
TGTACCGCCGGAA
EML4E13GTL-


AAAGGACCTAAAGTGTACC

(SEQ ID
GAAAGGACCTAA
GCACCAGGAG
ALKE20DTL


GCCGGAAGCACCAGGAG

NO: 863)
AG (SEQ ID NO:
(SEQ ID NO: 3)



(SEQ ID NO: 849)


31)







AAAGTTCCCCACACCTGGG
166
AAAGTTC
CCCACACCTGG
TGTACCGCCGGAA
EML4E13GTL-


AAAGGACCTAAAGTGTACC

(SEQ ID
GAAAGGACCTAA
GCACCAGGAG
ALKE20DTL


GCCGGAAGCACCAGGAG

NO: 864)
AG (SEQ ID
(SEQ ID



(SEQ ID NO: 850)


NO: 31)
NO: 3)






 . . .
 . . .
 . . .
 . . .

 . . .









Example of probes used and results obtained during a diagnosis of carcinoma


Analysis of the sequences of the PCR products makes it possible to identify the two partner genes involved in the chromosomal rearrangement, here the EML4 and ALK genes. The diagnosis of carcinoma was thus confirmed for the patient tested.


This rearrangement is recurrent in lung carcinomas, and makes the patient eligible for certain targeted therapies.
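
The decomposition of a read into its barcode and probe sequences, and the counting of distinct barcodes, can be sketched as follows. This is a minimal illustration and not the analysis software of the invention: the probe sequences and the barcode length are those given in Example 1 and Table 3, whereas the function names `split_read` and `count_molecules`, the exact-match criterion, and the toy read multiplicities are assumptions made for the example.

```python
from collections import Counter

BARCODE_LEN = 7  # barcode length stated in Example 1
LEFT_PROBE = "CCCACACCTGGGAAAGGACCTAAAG"  # SEQ ID NO: 31 (Table 3)
RIGHT_PROBE = "TGTACCGCCGGAAGCACCAGGAG"   # SEQ ID NO: 3 (Table 3)
FUSION_LABEL = "EML4E13GTL-ALKE20DTL"


def split_read(read):
    """Return the barcode if the read is barcode + left probe + right probe, else None."""
    barcode, body = read[:BARCODE_LEN], read[BARCODE_LEN:]
    if len(barcode) == BARCODE_LEN and body == LEFT_PROBE + RIGHT_PROBE:
        return barcode
    return None


def count_molecules(reads):
    """Count reads per barcode; each distinct barcode stands for one starting molecule."""
    per_barcode = Counter()
    for read in reads:
        barcode = split_read(read)
        if barcode is not None:
            per_barcode[barcode] += 1
    return per_barcode


# Toy input with hypothetical read multiplicities (not the real data set).
reads = (
    ["AAAAATA" + LEFT_PROBE + RIGHT_PROBE] * 3
    + ["AAAATGA" + LEFT_PROBE + RIGHT_PROBE] * 2
    + ["ACGTACGTACGT"]  # unrelated read, ignored
)
per_barcode = count_molecules(reads)
print(FUSION_LABEL, "molecules:", len(per_barcode), "reads:", sum(per_barcode.values()))
```

In this sketch the number of keys in `per_barcode` plays the role of the molecule count reported in Example 1, and the values play the role of the "number of reads" column of Table 3.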


Example 2: Determining a Skipping of Exon 14 of the MET Gene

The sample from a subject was analyzed to confirm or rule out the presence of a skipping of exon 14 of the MET gene. Said sample was subjected to an RT-MLPA step according to the invention, using the probes described above (more particularly at least probes SEQ ID NO: 96 to 99).


In a normal situation, splicing of the transcripts of this gene produces junctions between exons 13 and 14 and between exons 14 and 15. In a pathological situation, for example if a mutation destroys the splice donor site of exon 14, tumor cells express an abnormal transcript resulting from the junction of exons 13 and 15 (FIG. 6A).


The various amplification products obtained by virtue of the invention are visible in FIG. 6B on a capillary sequencer, after amplification using a pair of primers, one of which is labeled with a fluorochrome. These products, which differ in their sequence and in their size, can also easily be revealed using a next-generation sequencer.
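
The interpretation described in this example can be sketched as a small decision rule on the number of molecules observed for each exon junction. This is only an illustration under stated assumptions: the junction labels `"13-14"`, `"14-15"` and `"13-15"`, the function name `classify_met_splicing`, and the cut-off `MIN_MOLECULES` are hypothetical and do not come from the source.

```python
# Hypothetical cut-off: minimum number of exon 13-15 molecules needed to
# report exon 14 skipping. The source does not specify any threshold.
MIN_MOLECULES = 1


def classify_met_splicing(junction_counts, min_molecules=MIN_MOLECULES):
    """Classify a sample from per-junction molecule counts of the MET gene."""
    # Junctions expected in a normal situation: exons 13-14 and 14-15.
    normal = junction_counts.get("13-14", 0) + junction_counts.get("14-15", 0)
    # Junction expected when exon 14 is skipped: exons 13-15.
    skipped = junction_counts.get("13-15", 0)
    if skipped >= min_molecules:
        return "exon 14 skipping detected"
    if normal > 0:
        return "normal splicing only"
    return "no MET junction detected"


print(classify_met_splicing({"13-14": 40, "14-15": 38}))
print(classify_met_splicing({"13-14": 12, "14-15": 10, "13-15": 25}))
```

The counts passed in the two calls are invented values chosen only to exercise both branches.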


Example 3: Diagnosing a Carcinoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above (more particularly at least probes SEQ ID NO: 1 to 13 and 14 to 91).


At the end of the PCR step, 152,227 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at their 5′ end. Because of PCR amplification, each molecular barcode sequence is read several times (number of reads). Counting the distinct barcodes makes it possible to accurately determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 587, see FIG. 10).


Table 4 shows the results obtained.

TABLE 4

Complete sequence | Number of reads | Barcode | Left probe | Right probe | Sequences identified
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 1020 | GTATTGC (SEQ ID NO: 851) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 967 | GTGCTCA (SEQ ID NO: 1125) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 803 | CTAGGGC (SEQ ID NO: 1126) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 800 | ATGCTAT (SEQ ID NO: 1127) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 775 | CTTTGTA (SEQ ID NO: 1128) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 750 | TGACCAA (SEQ ID NO: 1129) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 740 | AGGTCTT (SEQ ID NO: 1130) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 731 | TCCATTT (SEQ ID NO: 1131) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 648 | TCGTTGA (SEQ ID NO: 1132) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 592 | GAAAATA (SEQ ID NO: 1133) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 590 | GCGAGTA (SEQ ID NO: 1134) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 576 | GGGGGTA (SEQ ID NO: 1135) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 572 | TCCAGCC (SEQ ID NO: 1136) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 566 | ACGCTTA (SEQ ID NO: 1137) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 554 | TCCTGCG (SEQ ID NO: 1138) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 553 | GTGGGCT (SEQ ID NO: 1139) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 552 | GGCCGGC (SEQ ID NO: 1140) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 548 | GGGTCAC (SEQ ID NO: 1141) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 521 | CGAGATT (SEQ ID NO: 1142) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 519 | ACCTGAT (SEQ ID NO: 1143) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 509 | GCGGCTA (SEQ ID NO: 1144) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 507 | GACGTCT (SEQ ID NO: 1145) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 504 | GTGTCTA (SEQ ID NO: 1146) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
ATTGCTGTGGGAAATAATGATGTAAAGGAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 1124) | 499 | CGTACTG (SEQ ID NO: 1147) | ATTGCTGTGGGAAATAATGATGTAAAG (SEQ ID NO: 52) | GAGGATCCAAAGTGGGAATTCCCT (SEQ ID NO: 8) | KIF5BE15GTL-RETE12DTL
 . . .

Example of probes used and results obtained during a diagnosis of carcinoma


Analysis of the sequences of the PCR products makes it possible to identify the two partner genes involved in the chromosomal rearrangement, here the KIF5B and RET genes. The diagnosis of carcinoma was thus confirmed for the patient tested.


This rearrangement is recurrent in lung carcinomas, and makes the patient eligible for certain targeted therapies.
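
The relationship between the number of sequences read and the number of starting molecules in Examples 1 and 3 can be made explicit with a short calculation. The sketch below simply divides the two figures reported in each example to obtain the mean number of reads per barcode; the function name `mean_reads_per_molecule` is an assumption made for the illustration, and the calculation presumes that all the sequences read belong to the fusion transcript concerned.

```python
def mean_reads_per_molecule(total_reads, molecules):
    """Average number of times each barcoded molecule was read after PCR."""
    return total_reads / molecules


# (sequences read, distinct barcodes) as reported in Examples 1 and 3.
reported = {
    "Example 1 (EML4-ALK)": (98_993, 729),
    "Example 3 (KIF5B-RET)": (152_227, 587),
}

for name, (total_reads, molecules) in reported.items():
    depth = mean_reads_per_molecule(total_reads, molecules)
    print(f"{name}: about {depth:.1f} reads per molecule")
```

These averages (about 135.8 and 259.3) are consistent with the per-barcode read counts listed in Tables 3 and 4.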


Example 4: Diagnosing a Sarcoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above (more particularly at least probes SEQ ID NO: 868 to 938 and probes SEQ ID NO: 940 to 1054).


At the end of the PCR step, 62,151 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at their 5′ end. Because of PCR amplification, each molecular barcode sequence is read several times (number of reads). Counting the distinct barcodes makes it possible to accurately determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 505, see FIG. 11).


Table 5 shows the results obtained.

TABLE 5

Complete sequence | Number of reads | Barcode | Left probe | Right probe | Sequences identified
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 472 | CATGAGG (SEQ ID NO: 1151) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 397 | TCGCGGC (SEQ ID NO: 1152) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 385 | TTTGTTT (SEQ ID NO: 1153) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 369 | CGTGTGG (SEQ ID NO: 1154) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 363 | CTTGGGG (SEQ ID NO: 1155) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 357 | TAGCGAT (SEQ ID NO: 1156) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 354 | CGTCCTT (SEQ ID NO: 1157) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 344 | GTGAGTC (SEQ ID NO: 1158) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 336 | CGGGGGG (SEQ ID NO: 1159) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 329 | GAGCCTG (SEQ ID NO: 1160) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 318 | GTTTTGG (SEQ ID NO: 1161) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 312 | GTCGGGA (SEQ ID NO: 1162) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 304 | TTGGTCC (SEQ ID NO: 1163) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 303 | ACGGAAG (SEQ ID NO: 1164) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 291 | AGTATTA (SEQ ID NO: 1165) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 289 | CATTCGC (SEQ ID NO: 1166) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 278 | TAGTAAG (SEQ ID NO: 1167) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 273 | TCCTACG (SEQ ID NO: 1168) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 267 | GGTATGG (SEQ ID NO: 1169) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 261 | CGGGGTA (SEQ ID NO: 1170) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 258 | CTGATAG (SEQ ID NO: 1171) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 257 | TAGGGTG (SEQ ID NO: 1172) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 251 | TGGGGAG (SEQ ID NO: 1173) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 251 | GCTGGTC (SEQ ID NO: 1174) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 242 | TATGGGC (SEQ ID NO: 1175) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 241 | ATACGTC (SEQ ID NO: 1176) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
AGCAGCAGCTACGGGCAGCAGAGTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1150) | 240 | AGACAAC (SEQ ID NO: 1177) | AGCAGCAGCTACGGGCAGCAGA (SEQ ID NO: 1148) | GTTCACTGCTGGCCTATACAACCTC (SEQ ID NO: 1149) | EWSR1E7-FLI1E5
 . . .

Example of probes used and results obtained during a diagnosis of sarcoma


Analysis of the sequences of the PCR products makes it possible to identify the two partner genes involved in the chromosomal rearrangement, here the EWSR1 and FLI1 genes. The diagnosis of sarcoma was thus confirmed for the patient tested.


This rearrangement is recurrent in Ewing sarcomas, which makes it possible to establish the diagnosis.
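
The labels in the "Sequences identified" column of Tables 3 to 5 can be split into their components with a small parser. The sketch below rests on an assumption that is not stated in the source: that each half of a label consists of a gene name, the letter E followed by an exon number, and an optional suffix (such as GTL or DTL) whose meaning is left uninterpreted here. The names `LABEL_PART` and `parse_fusion_label` are hypothetical.

```python
import re

# Assumed label grammar: <gene>E<exon number><optional uppercase suffix>.
LABEL_PART = re.compile(r"^(?P<gene>.+?)E(?P<exon>\d+)(?P<suffix>[A-Z]*)$")


def parse_fusion_label(label):
    """Split a label such as 'EWSR1E7-FLI1E5' into (gene, exon, suffix) per partner."""
    left, right = label.split("-")
    parsed = []
    for part in (left, right):
        match = LABEL_PART.match(part)
        if match is None:
            raise ValueError(f"unrecognized label part: {part!r}")
        parsed.append((match["gene"], int(match["exon"]), match["suffix"]))
    return tuple(parsed)


for label in ("EML4E13GTL-ALKE20DTL", "KIF5BE15GTL-RETE12DTL", "EWSR1E7-FLI1E5"):
    print(label, "->", parse_fusion_label(label))
```

The lazy match on the gene name is what allows gene symbols that themselves contain the letter E, such as EML4 or RET, to be parsed correctly.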


Example 5: Diagnosing a Sarcoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above (more particularly at least probes SEQ ID NO: 868 to 938 and probes SEQ ID NO: 940 to 1054).


At the end of the PCR step, 119,161 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at their 5′ end. Because of PCR amplification, each molecular barcode sequence is read several times (number of reads). Counting the distinct barcodes makes it possible to accurately determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 960, see FIG. 12).
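
As a side calculation on the capacity of a 7-base barcode, the sketch below computes the number of possible barcodes and the number of distinct barcodes expected for a given number of tagged molecules. The second quantity relies on an assumption that is not stated in the source, namely that barcodes are drawn uniformly at random from all possible 7-base sequences; the function name `expected_distinct_barcodes` is likewise hypothetical.

```python
BARCODE_LEN = 7                   # barcode length stated in the examples
NUM_BARCODES = 4 ** BARCODE_LEN   # four possible bases per position


def expected_distinct_barcodes(n_molecules, num_barcodes=NUM_BARCODES):
    """Expected number of distinct barcodes among n uniformly drawn barcodes."""
    # Standard occupancy formula: each barcode is missed with probability
    # (1 - 1/N) per draw, so it is seen at least once with 1 - (1 - 1/N)**n.
    return num_barcodes * (1.0 - (1.0 - 1.0 / num_barcodes) ** n_molecules)


print("possible barcodes:", NUM_BARCODES)
print("expected distinct barcodes for 960 molecules:",
      round(expected_distinct_barcodes(960), 1))
```

Under this assumption, a few molecules are expected to share a barcode by chance when the number of molecules approaches a thousand, so the count of distinct barcodes is best read as a close lower estimate of the number of molecules.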


Table 6 shows the results obtained.

TABLE 6

Complete sequence | Number of reads | Barcode | Left probe | Right probe | Sequences identified
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 610 | ATGTGTC (SEQ ID NO: 1181) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 604 | GGGGGCG (SEQ ID NO: 1182) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 601 | ATATTCG (SEQ ID NO: 1183) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 524 | CGCGTTT (SEQ ID NO: 1184) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 507 | GTGGTTA (SEQ ID NO: 1185) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 505 | CGGGTTT (SEQ ID NO: 1186) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 491 | GGGAGGC (SEQ ID NO: 1187) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 472 | GTATATG (SEQ ID NO: 1188) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 439 | ACCTTGT (SEQ ID NO: 1189) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 425 | TTGCAGA (SEQ ID NO: 1190) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 416 | GGGGCAA (SEQ ID NO: 1191) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 409 | GAGGCTT (SEQ ID NO: 1192) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 408 | TCATTTT (SEQ ID NO: 1193) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 400 | GGTGACT (SEQ ID NO: 1194) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 394 | TGTGCGT (SEQ ID NO: 1195) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 393 | GGGAGAG (SEQ ID NO: 1196) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 391 | GCCATTT (SEQ ID NO: 1197) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 380 | AAGCCAA (SEQ ID NO: 1198) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 370 | ATTAGGG (SEQ ID NO: 1199) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 365 | CCTGGTT (SEQ ID NO: 1200) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 364 | GATTTGT (SEQ ID NO: 1201) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 359 | TAGAGTT (SEQ ID NO: 1202) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 359 | TGCTTTG (SEQ ID NO: 1203) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 343 | TCCTAGC (SEQ ID NO: 1204) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 339 | GTAATCT (SEQ ID NO: 1205) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 338 | GAGCCTG (SEQ ID NO: 1206) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 335 | CCGCAGG (SEQ ID NO: 1207) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
AGCAGAGGCCTTATGGATATGACCAGATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1180) | 332 | GCCGGGA (SEQ ID NO: 1208) | AGCAGAGGCCTTATGGATATGACCAG (SEQ ID NO: 1178) | ATCATGCCCAAGAAGCCAGCAGA (SEQ ID NO: 1179) | SS18E10-SSXE6
 . . .

Example of probes used and results obtained during a diagnosis of sarcoma


Analysis of the sequences of the PCR products makes it possible to identify the two partner genes involved in the chromosomal rearrangement, here the SS18 and SSX genes. The diagnosis of sarcoma was thus confirmed for the patient tested.


This rearrangement is recurrent in synovial sarcomas, which makes it possible to establish the diagnosis.
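
Once the two partner genes have been identified, relating them to a pathology amounts to a table lookup. The sketch below illustrates this with four entries copied from Table 7; the dictionary name `FUSION_TO_PATHOLOGY` and the function name `pathology_for` are assumptions made for the example, and the genes are keyed in the order in which Table 7 lists them.

```python
# A few gene pairs and their associated pathologies, copied from Table 7.
FUSION_TO_PATHOLOGY = {
    ("EWSR1", "FLI1"): "Ewing Sarcoma/PNET",
    ("EWSR1", "WT1"): "Desmoplastic small round cell tumor",
    ("EML4", "ALK"): "inflammatory myofibroblastic tumours/Lung Cancer",
    ("KIF5B", "RET"): "inflammatory myofibroblastic tumours/Lung Cancer",
}


def pathology_for(first_gene, second_gene):
    """Return the pathology listed for a gene pair, or None if the pair is not listed."""
    return FUSION_TO_PATHOLOGY.get((first_gene, second_gene))


print(pathology_for("EWSR1", "FLI1"))
print(pathology_for("KIF5B", "RET"))
```

A pair that is absent from the dictionary returns `None`, which in this sketch only means that the pair was not copied into the illustration.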


Example 6: Examples of Fusions Associated with Pathologies

Table 7 shows some examples.











TABLE 7







EWSR1
SMAD3
Acral fibroblastic spindle cell neoplams


MYB
NFIB
Adenoid cystic carcinoma


MYBL1
NFIB
Adenoid cystic carcinoma/Breast adenoid carcinoma


CDH11
USP6
Aneurysmal bone cyst


COL1A1
USP6
Aneurysmal bone cyst


CTNNB1
USP6
Aneurysmal bone cyst


PAFAH1B1
USP6
Aneurysmal bone cyst


RUNX2
USP6
Aneurysmal bone cyst


PAX3_7
FKHR(FOXO1)
ARMS/Biphenotypic sinonasal sarcoma (BSNS)


PAX3_7
NCOA1
ARMS/Biphenotypic sinonasal sarcoma (BSNS)


BCOR
CCNB3
BCOR round cell sarcoma


RREB1
MKL2
Biphenotypic oropharyngeal sarcoma/Ectomesenchymal




chondromyxoid tumor


PAX3_7
MAML3
Biphenotypic sinonasal sarcoma (BSNS)


EWSR1
NFATC1
Bone hemangioma


FN1
EGF
Calcifying aponeurotic fibroma


EWSR1
CREB1
Clear cell sarcoma soft tissues and digestive




tract/Angiomatoid fibrous histiocytoma


EML4
NTRK3
Congenital fibrosarcoma


KHDRBS1
NTRK3
Congenital pediatric CD34+ skin tumor/dermohypodermal




spindle cell neoplasm


SRF
NCOA2
Congenital spindle cell RMS


TEAD1
NCOA2
Congenital spindle cell RMS


VGLL2
NCOA2
Congenital spindle cell RMS/Small round cell sarcomas


ARID1A
PRKD1
Cribriform adenocarcinoma of salivary gland origin


DDX3X
PRKD1
Cribriform adenocarcinoma of salivary gland origin


EWSR1
TRIM11
Cutaneous melanocytoma


COL1A1
PDGFB
Dermatofibrosarcoma protuberans


COL6A3
PDGFD
Dermatofibrosarcoma protuberans


EMILIN2
PDGFD
Dermatofibrosarcoma protuberans


EWSR1
WT1
Desmoplastic small round cell tumor


EPC1
BCOR
Endometrial stromal sarcoma (aggressive)


EPC1
SUZ12
Endometrial stromal sarcoma (aggressive)


WWTR1
CAMTA1
Epithelioid hemangioendothelioma


YAP1
TFE3
Epithelioid hemangioendothelioma


WWTR1
FOSB
Epithelioid Hemangioma


ZFP36
FOSB
Epithelioid hemangioma


EWSR1
TFCP2
Epithelioid rhabdomyosarcoma


EWSR1
E1AF
Ewing Sarcoma


FUS
ERG
Ewing Sarcoma/PNET


EWSR1
ETV1
Ewing Sarcoma/PNET


EWSR1
FEV
Ewing Sarcoma/PNET


FUS
FEV
Ewing Sarcoma/PNET


EWSR1
FLI1
Ewing Sarcoma/PNET


EWSR1
NFATC2
Ewing Sarcoma/PNET


EWSR1
SMARCA5
Ewing Sarcoma/PNET


EWSR1
ERG
Ewing Sarcoma/PNET/Desmoplastic small round cell tumor


EWSR1
NR4A3
Extraskeletal myxoid chondrosarcoma


TAF15_68
NR4A3
Extraskeletal myxoid chondrosarcoma


TCF12
NR4A3
Extraskeletal myxoid chondrosarcoma


TFG
NR4A3
Extraskeletal myxoid chondrosarcoma


HSPA8
NR4A3
Extraskeletal myxoid chondrosarcoma


ETV6
NTRK3
Head and Neck analog Mammary secretory




carcinoma/Mammary secretory carcinoma/




Papillary thyroid carcinoma


EWSR1
CREM
Hyalinizing renal cell carcinoma


TFG
MET
Infantile spindle cell sarcoma with neural features


CARS
ALK
inflammatory myofibroblastic tumor


CLTC
ALK
inflammatory myofibroblastic tumor


FN1
ALK
inflammatory myofibroblastic tumor


KIF5B
ALK
inflammatory myofibroblastic tumor


NPM
ALK
inflammatory myofibroblastic tumor


RANBP2
ALK
inflammatory myofibroblastic tumor


RNF213
ALK
inflammatory myofibroblastic tumor


SEC31A
ALK
inflammatory myofibroblastic tumor


TFG
ALK
inflammatory myofibroblastic tumor


TPM3
ALK
inflammatory myofibroblastic tumor


CCDC6
RET
inflammatory myofibroblastic tumor


CCDC6
ROS
inflammatory myofibroblastic tumor


CD74
ROS
inflammatory myofibroblastic tumor


EZR
ROS
inflammatory myofibroblastic tumor


LRIG3
ROS
inflammatory myofibroblastic tumor


SDC4
ROS
inflammatory myofibroblastic tumor


TPM3
ROS
inflammatory myofibroblastic tumor


THBS1
ALK
inflammatory myofibroblastic tumor + Uterine Inflammatory




Myofibroblastic Tumors


EML4
ALK
inflammatory myofibroblastic tumours/Lung Cancer


ATIC
ALK
inflammatory myofibroblastic tumours/Lung Cancer


SLC34A2
ROS
inflammatory myofibroblastic tumours/Lung Cancer


A2M
ALK
inflammatory myofibroblastic tumours/Lung Cancer


BIRC6
ALK
inflammatory myofibroblastic tumours/Lung Cancer


CLIP1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


DCTN1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


EEF1G
ALK
inflammatory myofibroblastic tumours/Lung Cancer


GCC2
ALK
inflammatory myofibroblastic tumours/Lung Cancer


HIP1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


KLC1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


LMO7
ALK
inflammatory myofibroblastic tumours/Lung Cancer


MSN
ALK
inflammatory myofibroblastic tumours/Lung Cancer


PPFIBP1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


SQSTM1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


TPR
ALK
inflammatory myofibroblastic tumours/Lung Cancer


TRAF1
ALK
inflammatory myofibroblastic tumours/Lung Cancer


KIF5B
MET
inflammatory myofibroblastic tumours/Lung Cancer


STARD3NL
MET
inflammatory myofibroblastic tumours/Lung Cancer


CLIP1
RET
inflammatory myofibroblastic tumours/Lung Cancer


ERC1
RET
inflammatory myofibroblastic tumours/Lung Cancer


TRIM33
RET
inflammatory myofibroblastic tumours/Lung Cancer


CLIP1
ROS
inflammatory myofibroblastic tumours/Lung Cancer


CLTC
ROS
inflammatory myofibroblastic tumours/Lung Cancer


ERC1
ROS
inflammatory myofibroblastic tumours/Lung Cancer


GOPC
ROS
inflammatory myofibroblastic tumours/Lung Cancer


KDELR2
ROS
inflammatory myofibroblastic tumours/Lung Cancer


LIMA1
ROS
inflammatory myofibroblastic tumours/Lung Cancer


MSN
ROS
inflammatory myofibroblastic tumours/Lung Cancer


PPFIBP1
ROS
inflammatory myofibroblastic tumours/Lung Cancer


TFG
ROS
inflammatory myofibroblastic tumours/Lung Cancer


TMEM106B
ROS
inflammatory myofibroblastic tumours/Lung Cancer


KIF5B
RET
inflammatory myofibroblastic tumours/Lung Cancer


NCOA4
RET
Intraductal carcinomas of salivary gland


TRIM27
RET
Intraductal carcinomas of salivary gland


COL1A2
PLAG1
Lipoblastoma


COL3A1
PLAG1
Lipoblastoma


HAS2
PLAG1
Lipoblastoma


TPR
NTRK1
Locally aggressive lipofibromatosis-like neural tumor/Uterine sarcoma with features of fibrosarcoma


LMNA
NTRK1
Locally aggressive lipofibromatosis-like neural tumor/Uterine sarcoma with features of fibrosarcoma/Pediatric haemangiopericytoma-like sarcoma


BRD8
PHF1
Low grade endometrial stromal sarcoma


EPC2
PHF1
Low grade endometrial stromal sarcoma


JAZF1
PHF1
Low grade endometrial stromal sarcoma


JAZF1
SUZ12
Low grade endometrial stromal sarcoma


EPC1
PHF1
Low grade endometrial stromal sarcoma/Ossifying fibromyxoid tumor


EWSR1
CREB3L1
Low grade fibromyxoid sarcoma/Sclerosing epithelioid fibrosarcoma


FUS
CREB3L1
Low grade fibromyxoid sarcoma/Sclerosing epithelioid fibrosarcoma


EWSR1
CREB3L2
Low grade fibromyxoid sarcoma/Sclerosing epithelioid fibrosarcoma


FUS
CREB3L2
Low grade fibromyxoid sarcoma/Sclerosing epithelioid fibrosarcoma


ETV6
RET
Mammary analog secretory carcinoma


IRF2BP2
CDX1
Mesenchymal chondrosarcoma


HEY1
NCOA2
Mesenchymal chondrosarcoma


EWSR1
YY1
Mesothelioma


FUS
ATF1
Mesothelioma/Angiomatoid fibrous histiocytoma


CRTC1
MAML2
Mucoepidermoid carcinoma


CRTC3
MAML2
Mucoepidermoid carcinoma


FUS
KLF17
Myoepithelial carcinoma/myoepithelioma soft tissue


EWSR1
PBX1
Myoepithelial carcinoma/myoepithelioma soft tissue


EWSR1
PBX3
Myoepithelial carcinoma/myoepithelioma soft tissue


LIFR
PLAG1
Myoepithelial carcinoma/myoepithelioma soft tissue


EWSR1
ZNF444
Myoepithelial carcinoma/myoepithelioma soft tissue


EWSR1
ATF1
Myoepithelial carcinoma/myoepithelioma soft tissue/mesothelioma/Clear cell sarcoma soft tissues and digestive tract/Angiomatoid fibrous histiocytoma


EWSR1
POU5F1
Myoepithelial carcinoma/myoepithelioma soft tissue/Undifferentiated round cell sarcoma/Ewing Sarcoma/PNET


SRF
RELA
Myofibroma/myopericytoma


CCBL1
ARL1
Myxofibrosarcoma


KIAA2026
NUDT11
Myxofibrosarcoma


AFF3
PHF1
Myxofibrosarcoma


EWSR1
DDIT3(CHOP)
Myxoid/round cell liposarcoma


FUS
DDIT3(CHOP)
Myxoid/round cell liposarcoma


MYH9
USP6
Nodular fasciitis/Cellular fibroma of tendon sheath


BRD3
NUTM1
NUT carcinoma


BRD4
NUTM1
NUT carcinoma


ZNF592
NUTM1
NUT carcinoma


FUS
TFCP2
Osseous RMS/epithelioid rhabdomyosarcoma


CREBBP
BCORL1
Ossifying fibromyxoid tumor


EP400
PHF1
Ossifying fibromyxoid tumor


MEAF6
PHF1
Ossifying fibromyxoid tumor


ZC3H7B
BCOR
Ossifying fibromyxoid tumor/High grade endometrial stromal sarcoma


STRN
ALK
Papillary thyroid carcinoma


RAD51B
OPHN1
PEComa


DVL2
TFE3
PEComa/Xp11 renal cell carcinoma


ACTB
GLI1
Pericytoma/Pericytoma AND Malignant Epithelioid Neoplasm


FN1
FGF1
Phosphaturic mesenchymal tumor


FN1
FGFR
Phosphaturic mesenchymal tumor


MXD4
NUTM1
Primary ovarian undifferentiated small round cell sarcoma


YWHAE
NUTM2A_B
Primitive myxoid mesenchymal tumor of infancy (PMMTI)/Soft Tissue Undifferentiated Round Cell Sarcoma of Infancy/Clear cell sarcoma of the kidney/High grade endometrial stromal sarcoma


MEIS1
NCOA2
Primitive spindle cell sarcoma of the kidney


TMPRSS2
ERG
Prostate Tumor


TMPRSS2
ETV1
Prostate Tumor


ACTB
FOSB
Pseudomyogenic hemangioendothelioma


ETV4
NCOA2
Soft tissue angiofibroma


NAB2
STAT6
Solitary fibrous tumor


EWSR1
PATZ1
Spindle round cell sarcomas/Ewing Sarcoma/PNET


SS18
SSX
Synovial sarcoma


SS18L1
SSX
Synovial sarcoma


CRTC1
SS18
Undifferentiated round cell sarcoma


EWSR1
SP3
Undifferentiated round cell sarcoma/Ewing Sarcoma/PNET


CITED2
PRDM10
Undifferentiated round cell sarcoma/Undifferentiated pleomorphic sarcoma


RAD51B
HMGA2
Uterine leiomyoma


RBPMS
NTRK3
Uterine sarcoma with features of fibrosarcoma


GREB1
NCOA2
Uterine Tumors Resembling Ovarian Sex Cord Tumors


NONO
TFE3
Xp11 renal cell carcinoma


PRCC
TFE3
Xp11 renal cell carcinoma


RBM10
TFE3
Xp11 renal cell carcinoma


SFPQ
TFE3
Xp11 renal cell carcinoma


ASPSCR1
TFE3
Xp11 renal cell carcinoma/Alveolar soft part sarcoma


FXR1
BRAF
ganglioglioma


C11orf95
RELA
ependymoma


ETV6
NTRK3
xanthoastrocytoma


FGFR1
TACC1
pilocytic astrocytoma


FGFR3
TACC3
glioblastoma


GOPC
ROS
glioblastoma


KIAA1549
BRAF
glioblastoma, pilocytic astrocytoma, ganglioglioma


MYB
QKI
angiocentric glioma


PTEN
COL17A1
glioblastoma


PTPRZ1
MET
glioblastoma


RNF213
SLC26A11
glioblastoma


SLC44A1
PRKCA
papillary glioneuronal tumor


NACC2
NTRK2
pilocytic astrocytoma


MKRN1
BRAF
Papillary Thyroid Carcinoma


BCAN
NTRK1
Glioma


PTEN
COL17A1
glioblastoma multiforme


X
NTRK1
Various


X
NTRK2
Various


X
NTRK3
Various









Example 7: Diagnosing a Lung Carcinoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above.


At the end of the PCR step, 70,571 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at the 5′ end. Due to PCR amplification, these molecular barcode sequences are read several times (number of reads). Counting the distinct barcodes makes it possible to precisely determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 71 junctions between exons 13 and 14, 119 between exons 13 and 15, and 92 between exons 14 and 15 of the MET gene). These results, and in particular the detection of 13-15 transcripts, in which exon 14 is skipped, indicate the presence of a splicing abnormality of the MET gene, making this patient eligible for targeted therapy (see FIG. 22).
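The barcode counting described above can be illustrated with a minimal Python sketch. It assumes that each sequencing read has already been reduced to a (junction, barcode) pair; the function name `count_molecules`, the junction labels, and the barcode strings are hypothetical and are not taken from the analysis software of the invention. Reads sharing the same junction and the same 7-base barcode are treated as PCR copies of a single ligation product and are counted once.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count reads and distinct molecular barcodes per junction.

    `reads` is an iterable of (junction, barcode) pairs, one per read.
    Reads with the same junction and barcode are assumed to be PCR
    duplicates of one ligated probe pair, so they count as one molecule.
    Returns {junction: (n_reads, n_distinct_barcodes)}.
    """
    n_reads = defaultdict(int)
    barcodes = defaultdict(set)
    for junction, barcode in reads:
        n_reads[junction] += 1
        barcodes[junction].add(barcode)
    return {j: (n_reads[j], len(barcodes[j])) for j in n_reads}


# Hypothetical toy input: five reads arising from three distinct molecules.
toy_reads = [
    ("MET_e13-e15", "ACGTACG"),
    ("MET_e13-e15", "ACGTACG"),  # PCR duplicate of the previous read
    ("MET_e13-e15", "TTGCAAC"),
    ("MET_e13-e14", "GGATCCA"),
    ("MET_e13-e14", "GGATCCA"),  # PCR duplicate of the previous read
]
counts = count_molecules(toy_reads)
```

In this toy input the 13-15 junction is supported by three reads but only two distinct barcodes, so it would be reported as two molecules.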



FIG. 23 shows the results obtained, which allow the diagnosis to be made.


Example 8: Diagnosing a Lung Carcinoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above.


At the end of the PCR step, 116,165 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at the 5′ end. Due to PCR amplification, these molecular barcode sequences are read several times (number of reads). Counting the distinct barcodes makes it possible to precisely determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 455 junctions between exons 1 and 2, 332 between exons 1 and 8, and 349 between exons 7 and 8 of the EGFR gene). These results, and in particular the detection of 1-8 transcripts, which lack exons 2 to 7, indicate the presence of an internal deletion of the EGFR gene, making this patient eligible for targeted therapy (see FIG. 24).
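The interpretation above relies on spotting junctions that join non-consecutive exons. The following Python sketch illustrates that check using the EGFR barcode counts reported in this example; the function name `skipped_exon_junctions` and the dictionary layout are assumptions made for illustration and do not describe the analysis software of the invention.

```python
def skipped_exon_junctions(junction_counts):
    """Return junctions joining non-consecutive exons, with the skipped exons.

    `junction_counts` maps (upstream_exon, downstream_exon) to the number of
    distinct barcodes counted for that junction. A junction between exons
    that are not consecutive implies that the intervening exons are absent
    from the transcript.
    """
    abnormal = {}
    for (upstream, downstream), n in junction_counts.items():
        if n > 0 and downstream - upstream > 1:
            abnormal[(upstream, downstream)] = {
                "count": n,
                "skipped": list(range(upstream + 1, downstream)),
            }
    return abnormal


# Barcode counts reported above for the EGFR gene.
egfr_counts = {(1, 2): 455, (1, 8): 332, (7, 8): 349}
abnormal = skipped_exon_junctions(egfr_counts)
# Only the 1-8 junction is flagged, with exons 2 to 7 listed as skipped.
```

Applied to the MET counts of Example 7, the same function would flag the 13-15 junction, with exon 14 listed as skipped.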



FIG. 25 shows the results obtained, which allow the diagnosis to be made.


Example 9: Diagnosing a Lung Carcinoma

The sample from a subject was subjected to an RT-MLPA step according to the invention, using the probes described above.


At the end of the PCR step, 59,214 sequences corresponding to unique PCR products (fusion transcripts) were read by next-generation sequencing. These sequences all carry a 7 base-pair molecular barcode sequence at the 5′ end. Due to PCR amplification, these molecular barcode sequences are read several times (number of reads). Counting the distinct barcodes makes it possible to precisely determine the number of fusion RNA molecules present in the starting sample (in the case tested here: 157 junctions between exons 21 and 22, 75 between exons 22 and 23, 52 between exons 25 and 26, and 50 between exons 27 and 28 of the ALK gene). These results, and in particular the demonstration of an expression imbalance between the 5′ and 3′ portions of the ALK gene, indicate that this gene is rearranged, making this patient eligible for targeted therapy (see FIG. 26).
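A 5′-3′ expression imbalance of this kind can be summarised as a ratio between the barcode counts measured on either side of the presumed breakpoint. The Python sketch below is only an illustration: the function name `imbalance_ratio`, the pseudocount, and the grouping of junctions into 5′ and 3′ sets are assumptions, and the toy counts are hypothetical rather than the counts of the patient described here.

```python
def imbalance_ratio(five_prime_counts, three_prime_counts, pseudocount=1.0):
    """Ratio of mean 3' to mean 5' barcode counts for one gene.

    Each argument is a non-empty list of distinct-barcode counts for the
    junctions located 5' or 3' of the presumed breakpoint. The pseudocount
    (an assumption of this sketch) avoids division by zero when the 5'
    portion is not expressed at all.
    """
    if not five_prime_counts or not three_prime_counts:
        raise ValueError("both count lists must be non-empty")
    mean_5p = sum(five_prime_counts) / len(five_prime_counts)
    mean_3p = sum(three_prime_counts) / len(three_prime_counts)
    return (mean_3p + pseudocount) / (mean_5p + pseudocount)


# Hypothetical counts: weak 5' expression, strong 3' expression.
ratio = imbalance_ratio([4, 6], [100, 90])
```

A ratio well above 1 would suggest that the 3′ portion of the gene is expressed under the control of a partner gene; the threshold for calling an imbalance is not specified here and would have to be set on control samples.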



FIG. 27 shows the results obtained, which allow the diagnosis to be made.

Claims
  • 1. Method for diagnosing cancer in a subject, comprising an RT-MLPA step on a biological sample obtained from said subject, wherein: the RT-MLPA step is carried out using at least one pair of probes comprising at least one probe selected from: the probes SEQ ID NO: 1 to 13, and/or 866 to 938, and/or SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, and/or the probes SEQ ID NO: 96 to 99, and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and/or the probes SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence.
  • 2. Method according to claim 1, wherein the probes SEQ ID NO: 14 to 91 are also used for the RT-MLPA step, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes preferably comprising a molecular barcode sequence.
  • 3. Method according to any one of claims 1 to 2, wherein the cancer is associated with formation of a fusion gene and/or an exon skipping and/or a 5′-3′ imbalance.
  • 4. Method according to any one of claims 1 to 3, wherein the cancer involves at least one gene selected from RET, MET, ALK, EGFR and/or ROS.
  • 5. Method according to any one of claims 1 to 3, wherein the cancer is associated with the formation of an exon skipping of the MET or EGFR gene.
  • 6. Method according to any one of claims 1 to 3, wherein the cancer is a carcinoma, in particular a lung carcinoma, and more particularly a bronchopulmonary carcinoma.
  • 7. Method according to any one of claims 1 to 2, wherein the cancer is a sarcoma, a brain tumor, a gynecological tumor, or a tumor of the head and neck.
  • 8. Method according to any one of claims 1 to 4, wherein the primer sequence is selected from the sequences: SEQ ID NO: 92 and SEQ ID NO: 93, or SEQ ID NO: 94 and SEQ ID NO: 95.
  • 9. Method according to any one of claims 1 to 5, wherein the molecular barcode sequence is represented by SEQ ID NO: 100.
  • 10. Method according to any one of claims 1 to 6, wherein the cancer associated with the formation of a fusion gene is diagnosed using at least one pair of probes comprising at least one probe selected among probes SEQ ID NO: 1 to 13, SEQ ID NO: 866 to 938 and/or SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, optionally the probes SEQ ID NO: 14 to 91, and wherein each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 92 and SEQ ID NO: 93, and wherein at least one of the probes comprises a molecular barcode sequence.
  • 11. Method according to any one of claims 1 to 6, wherein the cancer associated with an exon skipping is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 96 to 99 and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and wherein each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 94 and SEQ ID NO: 95 and wherein at least one of the probes comprises a molecular barcode sequence.
  • 12. Method according to any one of claims 1 to 6, wherein the cancer associated with a 5′-3′ imbalance is diagnosed using at least one pair of probes comprising at least one probe selected from probes SEQ ID NO: 1108 to 1123, and wherein each of the probes is fused, at at least one end, with a primer sequence, preferably selected from the sequences SEQ ID NO: 94 and SEQ ID NO: 95, and wherein at least one of the probes comprises a molecular barcode sequence.
  • 13. Method according to any one of claims 1 to 12, wherein said biological sample is selected among blood and a biopsy from said subject.
  • 14. Method according to any one of claims 1 to 13, wherein said RT-MLPA step comprises at least the following steps: a) extraction of RNA from the biological sample from the subject, b) conversion of the RNA extracted in a) into cDNA by reverse transcription, c) incubation of the cDNA obtained in b) with a pair of probes comprising at least one probe selected from: probes SEQ ID NO: 1 to 13, and/or SEQ ID NO: 866 to 938 and/or SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, and/or probes SEQ ID NO: 96 to 99, and/or SEQ ID NO: 1105 to 1107 and/or SEQ ID NO: 939, and/or probes SEQ ID NO: 1108 to 1123, each of the probes being fused, at at least one end, with a primer sequence, and at least one of the probes of said pair comprising a molecular barcode sequence, d) addition of a DNA ligase to the mixture obtained in c), in order to establish a covalent bond between two adjacent probes, e) PCR amplification of the adjacent covalently bound probes obtained in d), in order to obtain amplicons.
  • 15. Method according to claim 10, wherein it comprises a step f) of analyzing the results of the PCR of step e), preferably by sequencing.
  • 16. Method according to claim 11, wherein the sequencing step is a step of capillary sequencing or next-generation sequencing.
  • 17. Method according to claim 15 or 16, wherein it comprises a step g) of determining the level of expression of the amplicons that are obtained at the end of the PCR step, implemented by computer.
  • 18. Kit comprising at least probes SEQ ID NO: 1 to 13, and/or probes SEQ ID NO: 96 to 99, and/or probes SEQ ID NO: 866 to 938 and/or probes SEQ ID NO: 940 to 1104, and/or SEQ ID NO: 1211 to 1312, and/or probes SEQ ID NO: 1105 to 1107 and/or probe SEQ ID NO: 939, and/or probes SEQ ID NO: 1108 to 1123, preferably further comprising probes SEQ ID NO: 14 to 91, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes preferably comprising a molecular barcode sequence.
  • 19. Kit comprising at least the following probes: SEQ ID NO: 1 to 13, SEQ ID NO: 14 to 91, SEQ ID NO: 96 to 99, SEQ ID NO: 103 to 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130 to 137, SEQ ID NO: 138 to 168, SEQ ID NO: 169 to 194, SEQ ID NO: 195 to 198, SEQ ID NO: 199 to 245, SEQ ID NO: 246 to 344, SEQ ID NO: 345 to 403, SEQ ID NO: 404 to 428, SEQ ID NO: 429 to 436, SEQ ID NO: 437 to 479, SEQ ID NO: 480 to 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507 to 514, SEQ ID NO: 515 to 546, SEQ ID NO: 547 to 582, SEQ ID NO: 583 to 586, SEQ ID NO: 587 to 633, SEQ ID NO: 634 to 732, SEQ ID NO: 733 to 791, SEQ ID NO: 792 to 816, SEQ ID NO: 817 to 824, SEQ ID NO: 825, SEQ ID NO: 826 to 835, SEQ ID NO: 866 to 938, SEQ ID NO: 940 to 1104, SEQ ID NO: 1105 to 1107, SEQ ID NO: 939, and SEQ ID NO: 1108 to 1123, and SEQ ID NO: 1211 to 1312, each of the probes preferably being fused, at at least one end, with a primer sequence, and at least one of the probes preferably comprising a molecular barcode sequence.
  • 20. Method for determining the level of expression of amplicons that are obtained at the end of a PCR step, said method being implemented by computer, and comprising: (1) a step of demultiplexing the results of amplicons obtained at the end of a PCR step, (2) a step of searching for pairs of probes used during the PCR step, (3) a step of counting the results and molecular barcode sequences, and optionally (4) a step of evaluating the quality of sequencing of the sample.
Priority Claims (2)
Number Date Country Kind
18 60174 Nov 2018 FR national
19 08905 Aug 2019 FR national
PCT Information
Filing Document Filing Date Country Kind
PCT/FR2019/052617 11/5/2019 WO