The present application relates to methods and compositions which can be used to detect cancer in mammals, humans in particular. It notably describes serum markers for cancers and their uses in diagnostic methods. It also concerns tools and/or kits which can be used to implement these methods (reagents, probes, primers, antibodies, chips, cells, etc.), their preparation and their uses. The invention can be used to detect the presence or the progression of a cancer in mammals, particularly breast cancer, including the early phase thereof.
In women, breast cancer is the leading cause of cancer deaths in industrialized countries. Age is the greatest risk factor. Risk increases by 0.5% per year of age in Western countries. Other risk factors are known such as the number of pregnancies and the age of the first pregnancy, breast feeding, onset of puberty and menopause, estrogen treatments after the menopause, stress and nutrition.
The diagnosis of breast cancer is generally made by mammography. It is estimated, however, that the minimum tumor size which can be detected by mammography is 1 cm, which represents a past progression of 8 years on average at the time of diagnosis. Small tumors are much less malignant than can be extrapolated from their size: the aggressiveness of large tumors is not only due to their size but also to their “inherent aggressiveness”, which increases with the age of a tumor (Bucchi et al., Br J Cancer 2005, p. 156-161; Norden T, Eur J Cancer 1997, p. 624-628).
The benefit of mammography has been demonstrated in hundreds of thousands of patients over the last 30 years. However, these tests are subject to biases such as the age dependence of sensitivity, hormone therapy, the number of mammograms performed, the experience of the medical practitioner and others (see Fletcher and Elmore, NEJM 2003, p. 1672f or Baines C J, Breast J 2005, S7-10).
Analysis of the expression of a panel of target genes is also relevant in the fight against breast cancer, and particular mention may be made of the analysis of a panel of 176 genes which are expressed differentially between patients expressing the ER receptor and patients not expressing the ER receptor (Bertucci et al, Human Molecular Genetics, 2000; 9: 2981-2991). The analysis of a panel of 37 genes which can be used for early diagnosis of breast cancer may also be cited (Sharma et al, Breast Cancer Research 2005, 7: R634-R644). However, the patients included in this study were all suspected of having breast disease (“suspect initial mammogram”), and this panel of genes could be ill-adapted for routine tests to diagnose breast cancer prior to any mammography.
There is therefore a major need for diagnostic tools and tests able to detect cancer reliably, simply and at an early stage.
The present application provides a group of biological markers which can be used alone or in combination(s), preferably in combination(s), to detect, characterize or follow up, in reliable manner, the presence or progression of a cancer in mammals, preferably breast cancer. The invention is particularly advantageous in that it can be implemented using whole blood, without requiring tissue biopsy or separation steps.
More particularly, the present application results from the identification of serum genetic markers characteristic of human patients having breast cancer. These markers correspond for example to variations in splicing or in gene expression levels and, either alone or advantageously in combination, can allow the detection in patients of the presence or stage of progression of a cancer. They advantageously enable the detection of the presence of a breast cancer right from its early stages (namely stages I and II, i.e. in particular at a stage when mammography is ineffective), in a manner that is reliable and simple using a sample of whole blood. They can also be used to detect early stage breast cancers which cannot be detected by mammography since they lie below its detection threshold.
One subject-matter of the present application concerns a method (in vitro or ex vivo) to detect the presence or risk of developing a cancer in a mammal, comprising the determination in a biological sample of the mammal, of the presence (or absence or (relative) quantity) of one or preferably several target molecules chosen from among:
In one particular variant of embodiment, the method comprises the combined determination of the presence, or absence, or (relative) quantity of at least 5, 10, 15, 20, 30, 40, 50, 60, 70 or more target molecules such as defined above. “Combined” determination designates the fact that a hybridization profile (or signature) involving several markers is determined. Combined determination is typically performed simultaneously i.e. by global measurement of an expression profile. Nonetheless, combined determination may also be performed by parallel or sequential measurements of several markers, leading to identification of a profile. With the invention, it is effectively possible to establish and determine a hybridization profile (or signature) on a group of markers in order to assess the presence or risk of developing a cancer in a mammal. The hybridization profile is typically performed using a combination of several markers chosen from among the above-indicated targets, for example containing all these targets.
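By way of illustration only, the combined determination of a hybridization profile can be sketched as a comparison of a sample profile against reference profiles. The application does not specify a classification rule; the nearest-centroid rule with Pearson correlation used below, the function names and all intensity values are assumptions chosen for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / (sx * sy)

def centroid(profiles):
    """Marker-by-marker mean of several profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def classify(profile, reference_groups):
    """Return the label whose centroid correlates best with the profile."""
    centroids = {label: centroid(ps) for label, ps in reference_groups.items()}
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))

# Hypothetical log-intensities for a 5-marker panel (not data from the application).
reference_groups = {
    "control": [[8.0, 6.1, 7.2, 5.0, 9.1], [7.8, 6.3, 7.0, 5.2, 9.0]],
    "cancer": [[6.0, 8.2, 7.1, 7.4, 6.9], [6.2, 8.0, 7.3, 7.1, 7.0]],
}
sample = [6.1, 8.1, 7.0, 7.3, 7.1]
print(classify(sample, reference_groups))
```

The point of the sketch is that the decision rests on the profile as a whole rather than on any single marker; any other multivariate rule could be substituted.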
In one particular embodiment, the method of the invention comprises the determination of the presence (or absence or (relative) quantity) in a mammalian biological sample of at least 5 separate target molecules chosen from among those defined above, preferably at least 10.
In preferred embodiments, the method of the invention comprises the combined determination of the presence (or absence or (relative) quantity) in a mammalian biological sample of particular sub-groups of target molecules chosen from among those indicated above. Said sub-groups, described in the examples, are particularly adapted for the detection, notably at an early stage, of the presence of a breast cancer in patients on the basis of a sample of whole blood.
Therefore, in one particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of the nucleic acids of an entire panel of targets (or signatures) comprising markers such as defined under items a) to d) above, preferably all the molecules of one of panels 1 to 11 defined in the present application.
Therefore, in one particular embodiment, the method of the invention comprises the combined determination in a mammalian biological sample of the presence (or absence or (relative) quantity) of all the nucleic acids of Panel 1, comprising the sequences shown in Table 4, column 1, or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers can be used for the predictive detection of the presence, the risk of developing, or the stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more of the other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination in a mammalian biological sample of the presence (or absence or (relative) quantity) of nucleic acids containing the sequences indicated in SEQ ID NOs: 18, 19, 23, 26, 51, 52, 53, 54, 55, 69, 80, 125, 145, 148, 225, 228, 240 and 312 (PANEL 2) or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more other target molecules such as those defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of nucleic acids containing the sequences indicated in SEQ ID NOs: 18, 19, 23, 26, 27, 51, 52, 53, 54, 55, 69, 80, 125, 145, 148, 161, 188, 225, 228, 240, 280 and 312 (PANEL 3) or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by the nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more other target molecules, such as those defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of nucleic acids containing the sequences indicated in SEQ ID NOs: 13, 16-19, 23, 26-28, 47, 51-55, 58, 69, 80, 81, 89, 116, 121, 125, 145, 148, 158, 160, 161, 164, 189, 190, 225, 229, 240, 248, 280, 281, 284, 299, 300, 310 and 312 (PANEL 4) or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more of the other target molecules such as defined previously.
In another particular embodiment the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of nucleic acids containing the sequences indicated in SEQ ID NOs: 7, 13, 14, 16-19, 23-28, 47, 51-55, 58, 69, 80, 81, 89, 116, 121, 125, 137, 139, 145, 148, 158, 160, 161, 164, 189, 190, 225, 228, 229, 240, 245, 248, 252, 280, 281, 284, 290, 298-300, 310 and 312 (PANEL 5) or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment the method also comprises the detection of one or more other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of the nucleic acids containing the sequences indicated in SEQ ID NOs: 5, 7, 13, 14, 16-20, 23-28, 47, 51-55, 58, 64, 69, 80, 81, 88-90, 116, 121, 125, 137, 139, 145, 148, 158, 160, 161, 164, 188-191, 208, 222, 225, 228, 229, 236, 240, 242, 245, 248, 252, 280, 281, 284, 290, 298-300 and 309-312 (PANEL 6) or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In another particular embodiment, the method also comprises the detection of one or more other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of all the nucleic acids of Panel 7, comprising the sequences indicated in Table 4, or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular mode, the method also comprises the detection of one or more other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of all the nucleic acids in Panel 8, comprising the sequences indicated in Table 4, or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more of the other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of all the nucleic acids in Panel 9, comprising the sequences indicated in Table 4, or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more of the other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of all the nucleic acids in Panel 10, comprising the sequences indicated in Table 4, or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more of the other target molecules such as defined previously.
In another particular embodiment, the method of the invention comprises the combined determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of the nucleic acids comprising the sequences indicated in SEQ ID No: 23, 52, 53, 148 and 225 (PANEL 11), or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or having a sequence complementary thereto and/or functional analogs thereof derived from other species, and/or polypeptides encoded by these nucleic acids. The examples given in the present application effectively show that this panel of markers enables the predictive detection of the presence, risk of developing, or stage of progression of a cancer. In one particular embodiment, the method also comprises the detection of one or more of the other target molecules such as defined previously.
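For bookkeeping purposes, the panels listed by SEQ ID number can be handled as sets. The following is a minimal sketch, assuming the SEQ ID lists of Panels 2, 3, 4 and 11 as transcribed from the text above; the helper names `parse_seq_ids` and `missing_markers` and the example list of detected markers are hypothetical.

```python
def parse_seq_ids(spec):
    """Expand a list such as '13, 16-19, 23' into a set of SEQ ID numbers."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(part))
    return ids

# SEQ ID lists transcribed from the panel definitions in the text.
PANELS = {
    2: parse_seq_ids("18, 19, 23, 26, 51, 52, 53, 54, 55, 69, 80, 125, 145, "
                     "148, 225, 228, 240, 312"),
    3: parse_seq_ids("18, 19, 23, 26, 27, 51, 52, 53, 54, 55, 69, 80, 125, "
                     "145, 148, 161, 188, 225, 228, 240, 280, 312"),
    4: parse_seq_ids("13, 16-19, 23, 26-28, 47, 51-55, 58, 69, 80, 81, 89, "
                     "116, 121, 125, 145, 148, 158, 160, 161, 164, 189, 190, "
                     "225, 229, 240, 248, 280, 281, 284, 299, 300, 310, 312"),
    11: parse_seq_ids("23, 52, 53, 148, 225"),
}

def missing_markers(panel_number, measured_ids):
    """SEQ IDs of a panel for which no measurement is available."""
    return sorted(PANELS[panel_number] - set(measured_ids))

# Hypothetical set of markers measured in one experiment.
measured = [23, 52, 148, 225]
print(missing_markers(11, measured))
print(PANELS[11] <= PANELS[2] <= PANELS[3])
```

Written this way, it is apparent that Panel 11 is contained in Panel 2, which is itself contained in Panel 3, so that a measurement covering Panel 3 also covers the two smaller panels.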
In one specific embodiment, the method of the invention comprises the determination, in a mammalian biological sample, of the presence (or absence or (relative) quantity) of the nucleic acids respectively comprising the sequences indicated in SEQ ID NOs: 1-437 or a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, or nucleic acids having a complementary sequence thereof.
A further particular subject of the invention lies in a method of detecting the presence or risk of developing a cancer in a mammal, which comprises contacting, under conditions allowing hybridization between complementary sequences, the nucleic acids derived from a blood sample of the mammal with a set of probes specific to the following target molecules:
Also, analysis of the different genes identified in the invention, whose expression is altered in patients, shows that they belong to families of genes involved in cell signaling pathways or in common regulating mechanisms. In particular, this analysis reveals numerous genes involved in signaling cascades used for the transduction of messages initiated by TLR stimulation, in the secretion of cytokines or in the activation of T lymphocytes. The invention therefore shows that alterations in these signaling cascades occur in patients suffering from cancer, and that any gene or RNA taking part in these cascades, or any deregulation of these genes or RNA, may form a marker of the presence of or the predisposition to cancer. Said alterations may occur during oxidative stress imposed by tumor cells on monocytes, macrophages and dendritic cells. This stress is one of the consequences of the different cell interactions occurring between the immune system and the cancer cells at the tumor. The detection, in blood circulation, of molecular events revealing alterations in the signaling cascades involved in immune responses therefore represents a new tool and a novel approach allowing a tumor in the body to be revealed on the basis of a blood sample.
Therefore one particular subject-matter of the invention concerns a method (in vitro or ex vivo) to detect the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in a gene or RNA taking part in a signaling pathway involved in immune response (innate or acquired), the presence of said alteration being indicative of the presence or risk of developing a cancer in this mammal. The signaling pathway involved in immune response is advantageously chosen from among TLR stimulation, cytokine secretion or T-lymphocyte activation.
Initial stimulation of the dendritic cells represents the innate response and occurs via toll-like receptors (TLRs). Multiple TLRs react to different ligands which may be carried by pathogens or tumor cells and induce the production of different pro-inflammatory cytokines by the dendritic cells. This phenomenon is accompanied by the presentation of antigens to naïve T-cells, thereby initiating a specific, acquired immune response. Among the molecular actors of these signaling cascades, those which may be preferably cited are receptors, adaptors, enzymes, factors involved in the regulation of gene expression, chemokines, cytokines and interleukins.
One particular subject of the invention is a method for the in vitro or ex vivo detection of the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in a gene or RNA involved in regulating the signaling pathway which controls the phenomenon of innate immunity. More particularly, this phenomenon mobilizes the cascade initiated by the TLRs and regulates the activity of macrophages and dendritic cells.
Another particular subject of the invention is a method for the in vitro or ex vivo detection of the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in a gene or RNA involved in regulating the signaling pathway which controls the phenomenon of acquired or adaptive immunity. More particularly, this phenomenon involves T-lymphocyte receptors (TCRs).
A further particular subject of the invention is a method for the in vitro or ex vivo detection of the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in a gene or RNA involved in the transition and coordination between innate and acquired immunities. More particularly, these genes are involved in the biosynthesis of lipid molecules from precursors such as arachidonic acid.
One particular subject of the invention therefore lies in a method (in vitro or ex vivo) to detect the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in a gene or RNA involved in the stimulation of TLRs, in the secretion of cytokines, or in the activation of T-lymphocytes, said gene or RNA advantageously being chosen from among receptors, adaptors, enzymes, factors involved in the regulation of gene expression, chemokines, cytokines and interleukins, the presence of said alteration being indicative of the presence or risk of developing a cancer in this mammal.
Alteration of a gene or RNA in the meaning of the invention is any altered expression, namely deregulation of splicing in particular, leading to the onset of particular spliced forms or to a change in the (relative) quantity or ratio between the different splice forms.
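As a purely illustrative sketch, a change in the ratio between two splice forms can be expressed as a log2 ratio change between a control sample and a patient sample. The splice-form names, the signal values and both function names are hypothetical, and the application does not prescribe this particular measure.

```python
from math import log2

def isoform_fractions(signals):
    """Relative quantity of each splice form as a fraction of the total signal."""
    total = sum(signals.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {name: value / total for name, value in signals.items()}

def log2_ratio_change(control, patient, form_a, form_b):
    """log2 of (form_a/form_b in patient) over (form_a/form_b in control).

    0 means the ratio between the two splice forms is unchanged; negative
    values mean form_a has lost ground to form_b in the patient sample.
    """
    ratio_control = control[form_a] / control[form_b]
    ratio_patient = patient[form_a] / patient[form_b]
    return log2(ratio_patient / ratio_control)

# Hypothetical signals for a long and a short splice form of one gene.
control = {"long": 300.0, "short": 100.0}
patient = {"long": 150.0, "short": 200.0}

print(isoform_fractions(control))
print(log2_ratio_change(control, patient, "long", "short"))
```

Using a ratio between splice forms of the same gene, rather than the absolute signal of one form, makes the measure insensitive to a uniform change in the expression level of the gene.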
As is described in the remainder of the text, the present application describes the identification of splicing deregulations in the actors of signaling cascades involved in innate and acquired immunities, found in the blood of cancer patients (Table 5). Oligonucleotides, derived from RNA sequences whose expressions are affected by splicing alterations, can be deposited or synthesized on any solid carrier and hybridized with nucleic probes derived from control blood samples and blood samples from cancer patients allowing the selection of the most discriminating oligonucleotides. More broadly, the oligonucleotides can be chosen to represent any mRNA encoding any protein involved in innate and acquired immunities. More particularly, these oligonucleotides may derive from the genes indicated in Table 6. Alterations may also be detected at the structure or expression levels of polypeptides encoded by these genes or RNA, for example using specific antibodies as is described in detail in the remainder of the text.
One particular subject of the invention therefore lies in a method (in vitro or ex vivo) to detect the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in at least one, preferably at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 genes or corresponding RNAs indicated in Table 5, in particular altered splicing of said gene or RNA, the presence of said alteration being an indication of the presence or risk of developing a cancer in this mammal.
Another particular subject of the invention therefore lies in a method (in vitro or ex vivo) to detect the presence or risk of developing a cancer in a mammal, comprising the determination of the presence (or absence) in a mammalian biological sample, preferably a blood (derived) sample, of an alteration in at least one, preferably at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 genes or corresponding RNAs indicated in Table 6, in particular altered splicing of said gene or RNA, the presence of said alteration being an indication of the presence or risk of developing a cancer in this mammal.
The present invention is based on the evidencing and characterizing of serum biological events characteristic of the presence of a breast cancer in a human patient. These events form (bio)markers whose detection in a patient, preferably in combination, allows the determination, even at an early stage, of the risk of developing said cancer or the presence of said cancer. In the meaning of the invention, the terms markers and transcripts are used interchangeably, except when the context gives them a specific meaning.
The identified biological events typically correspond to changes in the regulation of gene expression. This may concern partial or full inhibition of the expression of genes or RNA, or of some forms of genes or RNA, an increase in the expression of genes or of some forms of genes or RNA, the onset or disappearance of gene splicing forms, etc.
The invention is therefore based on the detection, in a sample, of one or more target molecules representing biological events thus identified. As indicated above, these target molecules may be chosen from among:
The target molecule can be the complete sequence of the gene or RNA or protein corresponding to sequences SEQ ID NOs: 1-437, or a distinctive fragment thereof i.e. a fragment whose sequence is specific to said gene or RNA, or to said protein, and/or comprises a variability domain (splicing, deletion, polymorphism, etc.) representing the biological event to be detected. The complete list of markers and the corresponding genes are indicated in Table 1.
The term “functional analog” designates an analog derived from another mammalian species. Sequences SEQ ID NOs: 1-437 were identified from humans, and these sequences form efficient markers adapted for the detection of cancer in human patients. Nonetheless, for application of the methods of the invention to other species of mammals, it is generally preferable to use functional analogs of these sequences, characterized in the species under consideration. These analogs can be identified using any technique known to persons skilled in the art, notably having regard to the sequences provided in the application and the names of the corresponding genes.
In one particular embodiment, the method comprises the determination of the presence of at least one nucleic acid according to a) to c).
In one very particular embodiment, the method is used to detect a cancer in a human individual and comprises the determination of the presence of at least one nucleic acid according to a) or b). Further preferably, the method comprises the combined detection of the presence (or absence) or (relative or absolute) quantity of a panel of target markers, such as defined in the present application (Panels 1 to 11 or Tables 5 and 6).
One particular embodiment of the invention lies in a method to detect the presence or risk of developing a breast cancer in a mammal, comprising the combined determination of the presence or (relative or absolute) quantity, in a mammalian blood sample, of a group of molecules comprising at least the following target molecules:
One particular embodiment of the invention lies in a method to detect the presence or risk of developing a breast cancer in a mammal, comprising the combined determination of the presence or quantity, in a blood sample of the mammal, of the following target molecules:
One particular embodiment of the invention lies in a method to detect the presence or risk of developing a breast cancer in a mammal, comprising the combined determination of the presence or quantity, in a blood sample of the mammal, of the following target molecules:
A further specific subject of the invention lies in a method to detect the presence or risk of developing a breast cancer in a mammal, comprising the detection, in a blood sample of the mammal, of the following target molecules:
The invention also enables the defining of additional panels, comprising at least some markers such as defined previously, which may optionally be combined with other markers. Said panels may be obtained by testing the presence or absence of these markers in patient samples, to define other predictive combinations, possibly specific to particular pathologies.
Different techniques allowing the detection of a species of nucleic acid in a sample can be used in the present invention, such as Northern Blot for example, or selective hybridization, the use of carriers coated with probe oligonucleotides, nucleic acid amplification such as RT-PCR, quantitative PCR or ligation-PCR, etc. These methods may comprise the use of a nucleic probe (e.g. an oligonucleotide) capable of detecting, either selectively or specifically, the target nucleic acid in the sample. Amplification can be conducted according to different methods known per se to persons skilled in the art, such as PCR, LCR, transcription mediated amplification (TMA), strand displacement amplification (SDA), NASBA, the use of allele-specific oligonucleotides (ASO), allele specific amplification, Southern Blot, single strand conformation analysis (SSCA), in situ hybridization (e.g., FISH), gel migration, analysis of heteroduplexes, etc.
According to one preferred embodiment, the method comprises the detection of the presence or absence or (relative) quantity of a nucleic acid according to a) to c) by selective hybridization or selective amplification.
Selective hybridization is typically performed using nucleic probes, preferably immobilized on a carrier such as a solid or semi-solid support having at least one surface, whether planar or not, allowing the immobilization of nucleic probes. Said supports are for example a slide, bead, membrane, filter, column, plate etc. They may be made in any compatible material in particular glass, silica, plastic, fiber, metal, polymer, etc. The nucleic probes may be any nucleic acid (DNA, RNA, PNA, etc.), preferably single strand, comprising a sequence specific to a target molecule such as defined under a) to c) above. The probes typically comprise 5 to 400 bases, preferably 8 to 200, more preferably less than 100, and further preferably less than 75, 60, 50, 40 or even 30 bases. The probes may be synthetic oligonucleotides produced on the basis of the sequences of target molecules of the invention using conventional synthesis techniques. Said oligonucleotides typically comprise 10 to 50 bases, preferably 20 to 40, for example around 25 bases. In one particularly advantageous embodiment, several different oligonucleotides (or probes) are used to detect the same target molecule. These may be oligonucleotides specific to different regions of the same target molecule, or aligned differently on one same region. It is also possible to use pairs of probes, of which one member is fully matched with the target molecule, and another is mismatched thereby enabling background noise to be estimated. In the following examples, 6 to 11 pairs of oligonucleotides with 25 bases were used for each target molecule.
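The use of matched and mismatched probe pairs to estimate background noise can be sketched as follows. This is an illustration under stated assumptions: the application does not say how the mismatched probe is designed or how the pairs are combined, so the central-base substitution in `make_mismatch` and the mean of the differences in `average_difference` are assumptions, and the probe sequence and intensities are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def make_mismatch(probe):
    """Return the probe with its central base replaced by the complementary base.

    Assumption: a single central substitution is used to build the mismatched
    member of the pair; the source does not specify the mismatch design.
    """
    middle = len(probe) // 2
    return probe[:middle] + COMPLEMENT[probe[middle]] + probe[middle + 1:]

def average_difference(pairs):
    """Mean of (matched - mismatched) intensities over the pairs of one target."""
    if not pairs:
        raise ValueError("at least one probe pair is required")
    return sum(matched - mismatched for matched, mismatched in pairs) / len(pairs)

# Hypothetical 25-base probe and hypothetical intensities for 6 probe pairs.
perfect_match = "ACGTACGTACGTACGTACGTACGTA"
mismatch = make_mismatch(perfect_match)

pairs = [(1200, 300), (900, 250), (1500, 400),
         (1100, 350), (1000, 200), (1300, 300)]

print(mismatch)
print(average_difference(pairs))
```

Subtracting the mismatched signal pair by pair removes the part of the matched signal attributable to non-specific hybridization, and averaging over several pairs reduces the influence of any single poorly performing probe.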
The probes may be previously synthesized then deposited on the carrier, or synthesized directly in situ on the carrier, using methods known per se to those skilled in the art. The probes may also be made using genetic techniques e.g. by amplification, recombination, ligation, etc.
The probes so defined form another subject of the present application, as well as their uses (essentially in vitro) for cancer detection. In particularly preferred manner, a set of nucleic probes is used comprising all or a fragment of at least 15 consecutive bases, preferably at least 17, 19, 20, 22 or 25 consecutive bases of each of sequences SEQ ID NO: 1-437, or a complementary strand thereof, advantageously immobilized on a carrier.
Hybridization can be performed under conventional conditions, known to persons skilled in the art and which may be adjusted by such persons (Sambrook, Fritsch, Maniatis (1989) Molecular Cloning, Cold Spring Harbor Laboratory Press). In particular, hybridization can be conducted under conditions of strict, medium or low stringency depending on the desired level of sensitivity, the quantity of material available etc. For example, appropriate conditions for hybridization include a temperature of between 55 and 63° C. for 2 to 18 hours on low density carriers. Other hybridization conditions may be necessary for high density carriers, such as a hybridization temperature of between 45 and 55° C. After hybridization, different washings may be performed to remove non-hybridized molecules, typically in SSC buffers containing SDS such as a buffer containing 0.1 to 10×SSC and 0.5-0.01% SDS. Other washing buffers containing SSPE, MES, NaCl or EDTA may also be used.
In one typical embodiment, the nucleic acids (or chips or carriers) are pre-hybridized in a hybridization buffer (Rapid Hybrid Buffer, Amersham) typically containing 100 μg/ml salmon sperm DNA at 65° C. for 30 min. The nucleic acids of the sample are then contacted with the probes (typically applied to the carrier or chip) at 65° C. for 2 to 18 hours. Preferably, the nucleic acids of the sample are previously labeled using any known labeling (radioactive, enzymatic, fluorescent, luminescent, etc.). The carriers are then washed in a 5×SSC, 0.1% SDS buffer at 65° C. for 30 min, then in a 0.2×SSC, 0.1% SDS buffer. The hybridization profile is analyzed using conventional techniques e.g. by measuring the labeling on the carrier using an appropriate instrument (e.g. InstantImager, Packard Instruments). Hybridization conditions can evidently be adjusted by those skilled in the art, for example by modifying the hybridization temperature and/or saline concentration of the buffer, and through the addition of auxiliary substances such as formamide or single-strand DNA.
One particular subject of the invention lies in a method to detect the presence or risk of developing a breast cancer in a mammal, comprising the contacting, under conditions enabling hybridization between complementary sequences, of nucleic acids derived from a blood sample of the mammal with a set of probes specific to at least the following target molecules:
One particular subject of the invention therefore lies in a method to detect the presence or risk of developing a breast cancer in a mammal, comprising the contacting, under conditions enabling hybridization between complementary sequences, of nucleic acids derived from a blood sample of the mammal with a set of probes specific to the following target molecules:
In particular embodiments, the methods of the invention also use other target molecules and/or other probes, in particular the sub-groups of target molecules mentioned in the present application.
Therefore, one other particular subject of the invention lies in a method to detect the presence or risk of developing a cancer in a mammal, comprising the contacting under conditions enabling hybridization between complementary sequences, of nucleic acids derived from a blood sample of the mammal with a set of probes specific to at least two separate molecules chosen from among the following targets:
The hybridization profile can be compared with one or more reference profiles, in particular a reference profile characteristic of healthy individuals and/or individuals suffering from cancer, the comparison allowing determination of the probability or risk that the tested patient has a cancer. Typically, the comparison is made using computer programs known per se to those skilled in the art.
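By way of non-limiting illustration, the comparison of a hybridization profile with reference profiles can be sketched as follows. The sketch assumes that each profile is a list of signal values measured on the same ordered set of probes, and it assigns the tested profile to the reference with which it has the highest Pearson correlation. The function names, the labels and the numerical values are hypothetical, and this simple rule merely stands in for the computer programs mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def closest_reference(profile, references):
    """Return the label of the most correlated reference, plus all scores."""
    scores = {label: pearson(profile, ref) for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical signals on four probes (illustrative values only).
references = {
    "healthy": [1.0, 2.0, 3.0, 4.0],
    "cancer": [4.0, 3.0, 2.0, 1.0],
}
label, scores = closest_reference([1.1, 2.0, 2.9, 4.2], references)
```

In practice, a supervised classifier trained on many reference samples would generally be preferred to a single-profile correlation of this kind.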
Selective amplification is preferably performed using a primer or pair of primers allowing amplification of all or part of one of the target nucleic acids in the sample, if any. The primer may be specific to a target sequence according to SEQ ID NO: 1-437, or to a region flanking the target sequence in a nucleic acid of the sample. The primer typically comprises a single-strand nucleic acid, whose length is advantageously between 5 and 50 bases, preferably between 5 and 30. Said primer forms another subject of the present application, and its use (essentially in vitro) for the detection of a cancer in an individual.
In this respect, a further subject of the invention lies in the use of a nucleotide primer or a set of nucleotide primers allowing amplification of all or part of one or preferably several genes or RNAs containing a target sequence according to SEQ ID NO: 1-437, for the detection of a cancer in a mammal, preferably breast cancer, particularly in a human being.
Another particular subject of the invention lies in a method to detect the presence or risk of developing a cancer in a mammal, comprising the contacting, under conditions enabling amplification, of the nucleic acids derived from a blood sample of the mammal with a set of primers specific to at least two separate molecules chosen from among the following targets:
Under another embodiment, the method comprises the determination of the presence of a polypeptide according to d). The evidencing of a polypeptide in a sample can be performed using any technique known per se, in particular using a specific ligand e.g. an antibody or an antibody fragment or derivative. Preferably, the ligand is an antibody specific to the polypeptide, or a fragment of said antibody (e.g. a Fab, Fab′, CDR, etc.), or a derivative of said antibody (e.g. a single-chain antibody, ScFv). The ligand is typically immobilized on a carrier, such as a slide, bead, column, plate, etc. The presence of the target polypeptide in the sample can be detected by evidencing a complex between the target and the ligand, for example using a labeled ligand, using a second labeled developing ligand, etc. Immunology techniques which can be used and are well known are ELISA, RIA techniques, etc.
Antibodies specific to the target polypeptides may be produced using conventional techniques, in particular by immunizing a non-human animal with an immunogen containing the polypeptide (or an immunogenic fragment thereof) and collecting the (polyclonal) antibodies or producer cells (to produce monoclonals). Techniques for the production of poly- or monoclonal antibodies, of ScFV fragments, of human or humanized antibodies are described for example in Harlow et al., Antibodies: A Laboratory Manual, CSH Press, 1988; Ward et al., Nature 341 (1989) 544; Bird et al., Science 242 (1988) 423; WO94/02602; U.S. Pat. No. 5,223,409; U.S. Pat. No. 5,877,293; WO93/01288. The immunogen may be produced by synthesis, or by the expression, in a suitable host, of a target nucleic acid such as defined above. Said monoclonal or polyclonal antibody, and its derivatives, having the same antigenic specificity, also form a subject of the present application, as well as their use for cancer detection.
The method of the invention can be applied to any biological sample of the tested mammal, in particular any sample containing nucleic acids or polypeptides. A sample of blood, plasma, platelet, saliva, urine, stools, etc. may advantageously be cited and more generally any tissue, organ or advantageously any biological fluid containing nucleic acids or polypeptides.
In one preferred, particularly advantageous embodiment, the sample is a blood or plasma sample. The invention follows from the identification of blood markers of cancer, and therefore allows detection of these pathologies without any tissue biopsy, solely using blood samples.
The sample can be obtained using any technique known per se, for example by taking a sample, using non-invasive techniques, from collections or banks of samples, etc. The sample may also be pre-treated to facilitate accessibility of the target molecules, for example by lysis (mechanical, chemical, enzymatic, etc.), purification, centrifugation, separation, etc. Advantageously the PaxGene system is used (Feezor et al., Physiol Genomics 2004: pp. 247-254). The sample may also be labeled to facilitate determination of the presence of target molecules (fluorescent, radioactive, luminescent, chemical, enzymatic labeling, etc.).
In one preferred embodiment, the biological sample is a sample of whole blood i.e. which has not undergone any separation step, and may optionally be diluted.
The invention can be applied to any mammal, preferably humans. The method of the invention is particularly useful for the detection of breast cancer, in particular detection of the presence, risk of developing, or stage of progression of a breast cancer in a human being. Thus, the data given in the examples show that the invention allows the detection of the presence of a breast cancer with a sensitivity of over 92% and a specificity of more than 86%. It is particularly adapted for screening breast cancer at early stages i.e. stages I or II (such as defined under the “TNM Classification of Malignant Tumors” developed and maintained by the UICC, “International Union Against Cancer”). The TNM classification is also used by AJCC (“American Joint Committee on Cancer”) and by FIGO (“International Federation of Gynecology and Obstetrics”).
One particular subject-matter of the present application concerns a method to detect the presence, progression, or risk of developing a breast cancer in a human individual, comprising the combined determination, in a biological sample from a human individual, of the presence (or absence or (relative) quantity) of target molecules chosen from among:
Preferably, the method comprises the combined determination of the presence, absence or quantity of 5, 10, 20, 30, 40, 50 or 60 target molecules such as defined above.
A further particular subject-matter of the present application concerns a method to detect the presence, progression, or risk of developing a breast cancer in a human individual, comprising the contacting of a biological sample of the individual containing nucleic acids with a product comprising a carrier on which nucleic acids are immobilized containing a sequence complementary to and/or specific to one or preferably several target molecules chosen from among (i) the nucleic acids containing a sequence chosen from among SEQ ID NO: 1-437 or a fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases and (ii) nucleic acids having a sequence complementary to a sequence according to (i), and determination of the hybridization profile, the profile indicating the presence, stage, or risk of developing a breast cancer in said human individual. Preferably, the product contains separate nucleic acids containing a sequence complementary to and/or specific to at least 5, 10, 20, 30, 40, 50, 60 or more different target molecules such as mentioned above.
A further subject of the present application concerns a product comprising a carrier on which nucleic acids are immobilized containing a sequence complementary to and/or specific to one or preferably several target molecules chosen from among (i) the nucleic acids containing a sequence chosen from among SEQ ID NO: 1-437 or a fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases and (ii) the nucleic acids having a sequence complementary to a sequence according to (i). Preferably, the product comprises separate nucleic acids containing a sequence complementary to and/or specific to at least 5, 10, 20, 30, 40, 50, 60 or more different target molecules such as mentioned above. Preferably, it comprises separate nucleic acids containing a sequence complementary to and/or specific to one of the panels of markers such as defined in the present application.
A further subject of the present application concerns a product containing a carrier on which at least one, preferably several, nucleic acids are immobilized containing a sequence chosen from among SEQ ID NO: 1-437, or a functional analog thereof. Preferably the product comprises at least 5, 10, 20, 30, 40, 50, 60 or more different nucleic acids chosen from among the nucleic acids mentioned above. In one particular embodiment, the product comprises each of the nucleic acids of sequences SEQ ID NO: 1-376.
A further subject of the present application concerns a product comprising a carrier on which at least one ligand is immobilized of a polypeptide encoded by a target nucleic acid such as defined above i.e. a nucleic acid containing a sequence chosen from among SEQ ID NO: 1-437, a distinctive fragment thereof having at least 15, preferably at least 16, 17, 18, 19, 20, 25 or 30 consecutive bases, a nucleic acid having a sequence complementary thereto or a functional analog thereof. Preferably, the product contains at least 5, 10, 20, 30, 40, 50, 60 or more ligands of different polypeptides chosen from among the polypeptides mentioned above.
The carrier may be any solid or semi-solid carrier having at least one surface, whether planar or not, allowing the immobilization of nucleic acids or polypeptides. Said carriers are for example a slide, bead, membrane, filter, column, plate etc. They may be made in any compatible material such as glass, silica, plastic, fiber, metal, polymer, polystyrene, Teflon etc. The reagents can be immobilized on the surface of the carrier using known techniques or, for nucleic acids, synthesized directly in situ on the carrier. Immobilization techniques include passive adsorption (Inouye et al., J. Clin. Microbiol. 28 (1990) 1469) and covalent binding. Some techniques are described for example in WO90/03382, WO99/46403. The immobilized reagents on the carrier may be placed in a pre-established order to facilitate the detection and identification of the formed complexes, and according to a variable, adaptable density.
In one embodiment, the product of the invention comprises a plurality of synthetic oligonucleotides between 5 and 100 bases in length, specific to one or more target nucleic acids defined under a) to c).
The products of the invention typically comprise control molecules, used to standardize and/or normalize results.
A further subject of the present application concerns a kit comprising a compartment or container containing at least one, preferably several nucleic acids, comprising a sequence complementary to and/or specific to a target nucleic acid such as defined above and/or one, preferably several ligands of a target polypeptide such as defined previously. Preferably, the product contains at least 5, 10, 20, 30, 40, 50, 60 or more different nucleic acids and/or ligands chosen from the above-mentioned nucleic acids and ligands. In one particular embodiment, the product comprises each of the nucleic acids of sequence SEQ ID NO: 1-437 or a ligand for each of the target polypeptides such as defined above. The kit may also contain reagents for a hybridization or immunological reaction and, optionally, controls and/or instructions.
A further subject of the invention concerns the use of a product or kit such as defined above for the detection of a cancer in a mammal, preferably a human individual, in particular to detect breast cancer.
A further subject of the invention concerns a nucleic acid having a sequence chosen from among SEQ ID NO: 1-437, or a distinctive fragment thereof comprising at least 15 consecutive bases, preferably at least 16, 17, 18, 19, 20, 25 or 30, or a nucleic acid having a sequence complementary thereto or a functional analog thereof. The invention also concerns a cloning or expression vector containing these nucleic acids, and any recombinant cell containing said vector or nucleic acid.
A further subject of the invention concerns the use of a nucleic acid comprising a sequence chosen from among SEQ ID NO: 1-437, or a distinctive fragment thereof containing at least 15 consecutive bases, preferably at least 16, 17, 18, 19, 20, 25 or 30, a nucleic acid having a sequence complementary thereto, or a functional analog thereof, for the detection (essentially in vitro detection) of a cancer in a mammal.
In one particular example of embodiment of the invention, a blood sample is taken from a mammal to be tested. The blood sample may optionally be treated to make the nucleic acids more accessible, and they are labeled. The nucleic acids are then applied to a product such as above-defined and the hybridization profile is determined, permitting the diagnosis of whether or not cancer is present in the tested mammal. The method of the invention is simple, performed ex vivo and allows early detection of a cancer from a blood sample.
Evidently any equivalent technique may be used within the scope of the present application to determine the presence of a target molecule.
Other aspects and advantages of the invention will become apparent on reading the following examples which are to be construed as illustrative and non-limiting.
The examples given below were initially made from 92 blood samples (5 ml of whole blood, taken in two PaxGene tubes). These samples grouped together 37 blood samples from healthy control patients (H, obtained from Etablissement Francais du Sang) and 55 samples from patients suffering from stage I/II breast cancers (CI/II).
The blood samples were directly collected in PAXGene™ Blood RNA tubes (PreAnalytix, Hombrechtikon, CH). After the blood sampling stage and to obtain total cell lysis, the tubes were left at room temperature for 4 h then stored at −20° C. until extraction of the biological material. More precisely, in this protocol, total RNA was extracted using PAXGene Blood RNA® kits (PreAnalytix) following the manufacturer's instructions. In short, the tubes were centrifuged (15 min, 3000 g) to obtain a residue of nucleic acids. This residue was washed and dissolved in a buffer containing proteinase K required for digestion of the proteins (10 min at 55° C.). Further centrifuging (5 min, 19 000 g) was performed to remove cell debris and ethanol was added to optimize the fixing conditions of the nucleic acids. Total RNA was specifically fixed onto PAXgene RNA spin columns and, before elution thereof, digestion of contaminant DNA was conducted using the RNAse free DNAse set (Qiagen, Hilden, Germany). The quality of total RNA was analyzed on an AGILENT 2100 bioanalyzer (Agilent Technologies, Waldbronn, Germany).
Three series of <<DATAS >> profiling were conducted between the following groups:
DATAS n°1: Early stage breast cancers (Stage I and II) versus Control group (PBMN study)
DATAS n°2: Late stage breast cancers (Stage III and IV) versus Control group (PBMO study)
DATAS n°3: Breast cancers (Stages I, II, III and IV) versus Control group (PMNP study)
DATAS is a profiling technology of gene expression between two samples, which is able to characterize qualitative differences at messenger RNA level, such as those generated by alternative splicing. This patented technology is described in U.S. Pat. No. 6,251,590.
Total RNA corresponding to the two situations, one normal (mN) the other pathological (mP), is isolated from blood samples using the above-described PAXgene system (PreAnalytix). These RNAs (50 μg per group) are converted into complementary DNA (cN) and (cP) using reverse transcriptase (RT) (Invitrogen) and a biotinylated oligonucleotide oligodT25 (Invitrogen). The samples which contributed to the 50 μg per group are indicated in tables A to F.
Hybrids mN/cP and cN/mP are then produced in liquid phase. After ethanol precipitation of mN and cP and of cN and mP, the precipitates are dissolved in 30 μl of hybridization solution containing 80% formamide and 0.1% SDS for overnight incubation at 40° C. The heteroduplexes are then captured using magnetic Streptavidin beads (Dynal). A magnet is used to hold the beads in the tube during rinsing operations. The beads/heteroduplexes are then dissolved in 50 μl of RNAseH buffer and incubated with RNaseH (Invitrogen) for 30 minutes at 37° C. The supernatant is collected after further magnet application. The residual DNA is removed by action of DNAseI (Ambion). After inactivation of the enzyme, the RNA fragments are precipitated with ethanol and dissolved in water treated with DEPC supplemented with RNAse out (Ambion). The RNA fragments are then reverse transcribed (Reverse transcription TaqMan kit, Applied Biosystem) using random hexamer oligonucleotides. The complementary DNAs obtained are then PCR amplified using semi-degenerate primers. 10 pairs of primers are generally used. The amplicons obtained can be visualized by electrophoresis on agarose gel (
These amplified populations are cloned in a TOPO TA (Invitrogen) cloning vector for transformation in a strain of competent E. coli bacteria (Invitrogen). The colonies are transferred to 96-well plates and cultivated overnight in an ampicillin-supplemented 2×TY medium. A glycerolated stock (50%) is then taken. Generally a 96-well plate is processed per pair of primers in one of the two directions. This gives 96×10×2=1920 colonies to be characterized.
1728 clones were sequenced for DATAS n°1 profiling, 1920 were cloned and sequenced for DATAS n° 2 profiling, and 1920 were cloned and sequenced for DATAS n° 3 profiling.
Table G summarizes the number of clones characterized in the three profiling banks. A clone designated as a <<singleton >> means that the sequence of this clone was only identified once in the bank and that no other clone overlaps this sequence. A <<cluster >> designates a group of clones whose sequences overlap.
The three DATAS profilings subsequently generated 1741 non-redundant clones.
Bio-computerized analysis was performed on the 1741 clones to identify the genes with which they are associated and the known splicing events in public banks of corresponding nucleic acids.
The 1741 DATAS clones were able to be associated with 1170 different genes, some non-overlapping DATAS fragments being associated with different regions of the same gene.
The detection and quantification of the expression of splicing variants by microarray requires the use of a particular probe configuration. Any control messenger RNA/splicing variant pair can be modeled as a long isoform/short isoform (
The set of probes needed to measure the expression of splicing variants is also indicated in
For the design of all these probes, the splicing events corresponding to the DATAS fragments must be identified, followed by identification of the <<target >> regions from which the probes will be designed. The target sequences corresponding to the junction probes C, D and E are defined by a length of 30 nucleotides, 15 nucleotides either side of the junction. It is therefore possible to <<cover >> any junction by probes of 25 nucleotides of the type: 10/15, 11/14, 12/13, 13/12, 14/11 and 15/10 (the sign / representing the junction area).
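By way of illustration only, the tiling of a junction by 25-nucleotide probes can be sketched as follows. The sketch takes the sequences lying on either side of a junction, builds the 30-nucleotide target region (15 nucleotides on each side), and slides a 25-nucleotide window across it. The function name and the example sequences are hypothetical, and the strings returned are windows of the target sequence; whether the physical probe carries this sequence or its complement depends on the strand of the labeled material.

```python
def junction_probes(upstream, downstream, probe_len=25, flank=15):
    """Tile probe windows across a junction.

    upstream and downstream are the 5'->3' sequences on each side of the
    junction; only the last/first `flank` bases of each are used.
    Returns a dict mapping "left/right" base counts to the window sequence.
    """
    left = upstream[-flank:]
    right = downstream[:flank]
    if len(left) < flank or len(right) < flank:
        raise ValueError("need at least `flank` bases on each side")
    target = left + right  # 30-nucleotide target region with default values
    probes = {}
    for start in range(len(target) - probe_len + 1):
        n_left = flank - start         # bases taken upstream of the junction
        n_right = probe_len - n_left   # bases taken downstream of the junction
        probes[f"{n_left}/{n_right}"] = target[start:start + probe_len]
    return probes


# Hypothetical flanking sequences, chosen so the junction is easy to see.
probes = junction_probes("A" * 20, "C" * 20)
for name, seq in probes.items():
    print(name, seq)
```

With the default values, the six windows obtained are those listed above, from 15/10 to 10/15.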
Constraints are less severe on the <<exonic >> probes A and B. The additional sequence, specific to the long form (sequence 2 in
Therefore the 1170 genes corresponding to the 1741 DATAS clones were used to identify, for each thereof, the cDNAs and ESTs held in public databanks of sequences, potentially having qualitative differences in sequences, a source of splicing events. The events chosen are located at less than 100 nucleotides from the 5′ or 3′ ends of the DATAS fragments.
2108 events could therefore be chosen and listed in an Excel file of which one part describing the extracted data is given in
So as to be able to measure the expression of the 2108 events previously described, a customized DNA chip was designed. On this chip each event was characterized by its five target sequences A-E. For each sequence A and B, 11 pairs of probes of 25 nucleotides were designed, whereas the target sequences of type C, D, or E were detected with 6 pairs of probes with 25 nucleotides.
By pair of probes is meant a first probe which hybridizes perfectly (called PM probe for perfect match) with one of the cDNAs derived from a target transcript, and a second probe, identical to the first probe except for a mismatch (i.e. MM probe for mismatched) in the center of the probe. Each MM probe was used to estimate background noise corresponding to hybridization between two nucleotide fragments of non-complementary sequence (Affymetrix technical note “Statistical Algorithms Reference Guide”; Lipshutz, et al (1999) Nat. Genet. 1 Suppl., 20-24). If the design of at least 6 probes for sequences A and B or of at least 4 probes for sequences C, D, and E was impossible, these sequences were not included on the customized chip. Said situation may result from sequences of low complexity containing repeat structures or <<hairpin >> structures whether consecutive or non-consecutive. A sequence size for A-E of less than 30 nucleotides also led to the exclusion of these probes. Solely sequences of good quality, oriented in direction 5′->3′, were used for the design of the probes in accordance with Affymetrix recommendations.
To analyze the expression of target transcripts according to the invention, the complementary DNA (cDNA) of the mRNA contained in total RNA, such as purified above, was obtained from 400 ng of total RNAs through the use of a Klenow 3′-5′-exonuclease enzyme, 100 units of SuperScript II reverse transcription enzyme (Invitrogen), 10 units of the RNAse inhibitor H Superase-IN (Ambion, Huntingdon, UK) and 200 pmol of <<random >> primer containing the T7 promoter (RP-T7-primer, Eurogentec, Seraing, Belgium).
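The notion of a PM/MM pair can be illustrated by the following sketch, which derives an MM probe from a PM probe by altering the central base. It is assumed here, for illustration, that the central base is replaced by its complement; the text above only specifies a mismatch in the center of the probe. The simple subtraction used to estimate a background-corrected signal is likewise a simplification and not the statistical algorithm of the cited technical note. All names and values are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm):
    """Return the MM partner of a PM probe (central base complemented)."""
    mid = len(pm) // 2  # index 12, i.e. the 13th base of a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def background_corrected(pm_signal, mm_signal):
    """Naive correction: PM signal minus MM signal, floored at zero."""
    return max(pm_signal - mm_signal, 0.0)


# Hypothetical 25-base PM probe and hypothetical signal values.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
signal = background_corrected(pm_signal=850.0, mm_signal=120.0)
```

The MM probe differs from the PM probe at a single position, so that signal on the MM probe reflects mainly non-specific hybridization.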
All the cDNA thus obtained then underwent in-vitro transcription, conducted using a MEGAscript T7 kit (Ambion) for 16 h at 37° C. The resulting cRNA was subsequently purified on a column with an RNeasy Mini kit (Invitrogen), and the quality of the cRNA obtained was analyzed using the AGILENT 2100 bioanalyzer. The purified cRNA was then quantified by spectrophotometry, and the solution of cRNA was adjusted to a concentration of 1.24 μg/μl cRNA. Twenty-six micrograms of cRNA were then distributed between two Eppendorf tubes, and 3 μg <<random >> primers were added to each tube. Reverse transcription was performed with 800 units of SuperScriptII (Invitrogen) and 10 units of an inhibitor of RNAse H, in the presence of Klenow enzyme, for 1 h at 37° C. The double-strand cDNA resulting from this approach was then purified using the QIAquick PCR Purification Kit (Qiagen) and quantified by spectrophotometry. Sixteen micrograms of cDNA distributed over three Eppendorf tubes were then fragmented with 0.6 units of DNAse I per tube for 10 minutes at 37° C. The efficacy of fragmentation was verified using the 2100 bioanalyzer (Agilent). The fragmented cDNA was then labeled with biotin using 330 units of terminal transferase (Roche Molecular Biochemicals, Meylan, France) and 1 μl of DNA Labeling Reagent (DLR-1a, 5 mM) [Affymetrix] per microgram of cDNA for 60 min at 37° C.
The entirety of the fragmented and labeled cDNA was finally hybridized on the custom DNA chip (called <<A520138F>>, cf example 6) following a standard hybridization protocol adapted for 11 μm chips.
8.1. Evidencing an Expression Profile of Transcripts Enabling Discrimination Between Control Patients (S) and Patients Suffering from a Stage I/II Cancer
The expression of around 2000 variants of RNA, representing around 800 genes, was analyzed and compared between S patients and C I/II patients. For this purpose, 16 μg of fragmented cDNA derived from each sample were added to a hybridization buffer (Affymetrix) and 200 μl of this solution was contacted for 16 h at 50° C. on expression chips. To record the best hybridization and washing performance levels, biotinylated RNAs qualified as <<controls >> (bioB, bioC, bioD and cre) and oligonucleotides (oligo B2) were also included in the hybridization buffer. After the hybridization step, the biotinylated cDNAs hybridized on the chip were developed through the use of a streptavidin-phycoerythrin solution and the signal was amplified using anti-streptavidin antibodies. Hybridization was conducted in a <<GeneChip Hybridisation oven >> (Affymetrix), and the Affymetrix protocol followed was the Euk GE-WS2 protocol. The washing and development steps were conducted on a <<Fluidics Station 450>> (Affymetrix). Each chip was then analyzed under an Affymetrix G3000 GeneArray Scanner at a resolution of 1.5 microns to identify the hybridized areas on the chip. This scanner enables detection of the signal emitted by fluorescent molecules after excitation by argon laser using the epifluorescence microscope technique. Therefore, for each position, a signal is obtained that is proportional to the quantity of fixed cDNA. The signal was then analyzed using GeneChip Operating Software (GCOS 1.2, Affymetrix).
To provide against the variations obtained through the use of different chips, a normalization approach was followed using the <<Bioconductor >> tool which harmonizes the mean distribution of raw data obtained for each chip. The results obtained on one chip can then be compared with the results obtained on another chip. With the GCOS 1.2 software it was also possible to include a statistical algorithm to determine whether a transcript was or was not expressed.
From the 6,242 groups of probes of the chip, representing around 2,000 transcripts, the inventors selected the relevant transcripts which were correlated with the development of a breast cancer. Those transcripts whose level of expression on the majority of chips was too low and those transcripts which did not show any substantial variation between the different chips were excluded (Li et al, 2001, Bioinformatics, 17: 1131-1142). The search for a panel of transcripts discriminating between groups of EFS and CI/II patients was conducted using a Data Mining technique called <<random forest algorithm >> (http://ligarto.org/rdiaz/Papers/jornadas.bioinfo.randomForest.pdf). In addition to data analysis using the random forest algorithm, which is an analysis of multivariate type, a so-called <<univariate >> analysis was also used to identify those transcripts differentially expressed between EFS and CI/II patients. This analysis called SAM (<<Significance Analysis of Microarrays >>) is chiefly based on a modified version of Student's t test providing against the bias introduced by genes with low variability. With this approach, it is possible to control the percentage of false positive genes in a univariate analysis.
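The principle of harmonizing the distribution of raw data between chips can be illustrated by the following sketch of quantile normalization. The text does not state which normalization routine was used, so quantile normalization is assumed here purely as a stand-in, and the function name and values are hypothetical. Each chip is represented as a list of raw intensities measured on the same ordered set of probes.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of equal-length lists of raw intensities, one per chip.
    The k-th smallest value of each chip is replaced by the mean of the
    k-th smallest values across all chips. Ties are not averaged here.
    """
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must carry the same number of probes")
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [sum(column) / len(chips) for column in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical chips with three probes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 9.0]])
```

After this step, the ranking of the probes within each chip is unchanged, but all chips share the same set of values, which makes the signals comparable from one chip to another.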
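The idea behind the modified t test can be illustrated by the following sketch, in which a positive constant s0 is added to the denominator of the statistic so that transcripts with very low variability do not yield artificially large scores. In the published SAM method, s0 is estimated from the whole data set; here it is simply passed as a parameter, which is an assumption made for brevity. The permutation step used to estimate the rate of false positives is not shown, and the names and values are hypothetical.

```python
import math


def sam_statistic(group_a, group_b, s0=0.0):
    """Modified t statistic d = (mean_a - mean_b) / (s + s0).

    s is the pooled standard error of the difference of the two means;
    s0 is a small positive constant damping low-variance transcripts.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a + n_b <= 2:
        raise ValueError("need more than two observations in total")
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    sum_sq = (sum((x - mean_a) ** 2 for x in group_a)
              + sum((x - mean_b) ** 2 for x in group_b))
    s = math.sqrt((1 / n_a + 1 / n_b) * sum_sq / (n_a + n_b - 2))
    if s + s0 == 0:
        raise ValueError("zero variance and s0 = 0: statistic undefined")
    return (mean_a - mean_b) / (s + s0)


# Hypothetical expression values for one transcript in two groups.
controls = [1.0, 2.0, 3.0]
patients = [4.0, 5.0, 6.0]
d_plain = sam_statistic(controls, patients)            # ordinary t statistic
d_damped = sam_statistic(controls, patients, s0=1.0)   # damped by s0
```

With s0 set to zero the function reduces to the ordinary two-sample t statistic with pooled variance.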
With all the above-mentioned analyses, it was possible to evidence a first panel of transcripts comprising 318 relevant transcripts according to the invention (cf. Table 1, SEQ ID Nos: 1-318). The increase or decrease in the expression of each of the transcripts observed in healthy patients (S) compared with C I/II patients is given in Table 1.
The inventors have examined the simultaneous expression of 318 transcripts to obtain an expression profile. Using the random forest method, 90% of patients were correctly classified. More precisely, 32 of the 37 controls and 51 of the 55 patients were correctly classified, which corresponds to a sensitivity of 92.7% and a specificity of 86.5%. In addition to the analysis on the 92 initial samples, an additional analysis was conducted to validate the relevance of the above-identified signature: an independent cohort of five healthy controls and 16 stage I/II breast cancer patients underwent a blind study. The analysis of an independent cohort is one of the best ways to test the predictive value of a signature of genes or transcripts (cf. The SMRS working group, Nat Biotech 2005, 7: p. 833-838). Based on sequences SEQ ID Nos: 1-318, the random forest algorithm correctly classified five controls out of five and 13 patients out of sixteen (86% classification).
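For the reader's convenience, the sensitivity, the specificity and the overall rate of correct classification can be recomputed from the counts given above, as in the following sketch. The function name is hypothetical; the counts are those stated in the text (patients counted as positives, controls as negatives).

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, overall accuracy) as fractions."""
    sensitivity = tp / (tp + fn)   # patients correctly classified
    specificity = tn / (tn + fp)   # controls correctly classified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Initial set: 51 of 55 patients and 32 of 37 controls correctly classified.
sens, spec, acc = classification_metrics(tp=51, fn=4, tn=32, fp=5)

# Independent cohort: 13 of 16 patients and 5 of 5 controls.
sens_v, spec_v, acc_v = classification_metrics(tp=13, fn=3, tn=5, fp=0)

for label, value in [("sensitivity", sens), ("specificity", spec),
                     ("overall", acc), ("overall, independent cohort", acc_v)]:
    print(f"{label}: {100 * value:.1f}%")
```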
Through additional hybridization and analysis experiments, in 188 patients, 119 additional targets could be identified corresponding to sequences SEQ ID NO: 319-437 (cf. Table 1).
The inventors also studied the simultaneous expression of 100 transcripts whose nucleotide sequences were chosen from among the sequences given in Table 2 to obtain an expression profile. Using the random forest method, 89% of samples were correctly classified. More precisely, 31 of the 37 controls and 51 of the 55 patients were correctly classified, which corresponds to a sensitivity of 92.7% and a specificity of 83.7%. These results were confirmed by another analysis technique, hierarchical clustering. In this unsupervised analysis, one control is positioned among the patients (on the left of the red dotted line, cf.
In addition to the analysis on the 92 initial samples, an additional analysis was conducted to validate the relevance of the above-identified signature: a blind study was conducted in an independent cohort of five controls and 16 stage I/II breast cancer patients. Based on the top 100, the random forest algorithm correctly classified five controls out of five and 14 patients out of sixteen (90% classification).
Within this combination of 100 marker genes, the inventors showed that smaller panels also make it possible to discriminate between control patients and breast cancer patients, as described in the following examples.
The inventors have identified a combination of 66 markers, based on sequences SEQ ID Nos: 5, 7, 13, 14, 16-20, 23-28, 47, 51-55, 58, 64, 69, 80, 81, 88-90, 116, 121, 125, 137, 139, 145, 148, 158, 160, 161, 164, 188-191, 208, 222, 225, 228, 229, 236, 240, 242, 245, 248, 252, 280, 281, 284, 290, 298-300 and 309-312 (see Table 3). With this combination it is possible to classify correctly more than 80% of samples.
The inventors have also identified a combination of 53 markers, based on sequences SEQ ID Nos: 7, 13, 14, 16-19, 23-28, 47, 51-55, 58, 69, 80, 81, 89, 116, 121, 125, 137, 139, 145, 148, 158, 160, 161, 164, 189, 190, 225, 228, 229, 240, 245, 248, 252, 280, 281, 284, 290, 298-300, 310 and 312. With this combination it is also possible to classify correctly more than 80% of samples.
The inventors have also identified a combination of 42 markers, based on sequences SEQ ID Nos: 13, 16-19, 23, 26-28, 47, 51-55, 58, 69, 80, 81, 89, 116, 121, 125, 145, 148, 158, 160, 161, 164, 189, 190, 225, 229, 240, 248, 280, 281, 284, 299, 300, 310 and 312. With this combination it is also possible to classify correctly more than 80% of samples.
The inventors have also identified a combination of 22 markers, based on sequences SEQ ID Nos: 18, 19, 23, 26, 27, 51, 52, 53, 54, 55, 69, 80, 125, 145, 148, 161, 188, 225, 228, 240, 280 and 312. With this combination it is possible to classify correctly 76% of samples.
The inventors have also identified a combination of 18 markers, based on target sequences SEQ ID Nos: 18, 19, 23, 26, 51, 52, 53, 54, 55, 69, 80, 125, 145, 148, 225, 228, 240 and 312 given in Table 3. With this combination it is also possible to classify correctly 76% of samples.
This confirms that analysis of the expression of these 18 markers is a good tool to discriminate between patients carrying a high risk of relapse and those with a low risk of relapse. The use of a restricted panel of genes is particularly suitable for obtaining a detection and prognosis tool. Analysis of the expression of a dozen markers does not require the custom fabrication of a DNA chip and can be implemented directly using PCR or NASBA techniques, or a low density chip, which offers a considerable economic advantage and simpler implementation.
The inventors have identified combinations of 100, 104 and 110 markers, based on the target sequences SEQ ID Nos: 1-437, allowing detection of the presence of a breast cancer in individuals. These panels are described in Table 4. Panel 1 comprises all the sequences common to Panels 7-9.
Through additional hybridization and analysis experiments, conducted in 188 patients, the inventors have identified a combination of 90 markers, based on sequences SEQ ID Nos: 11; 12; 18; 23; 51 to 53; 59; 60; 105; 148; 150; 191; 195; 206; 225 to 227; 229; 280; 281; 308 to 310; 312; 327; 343; 360; 367; 377 to 437 (see Table 4).
With this combination, it is possible to classify correctly 86.1% of samples of early stage breast cancers. The genes in this panel are specific to early stage breast cancers. In additional tests, 19 blood samples taken from women suffering from colon cancer and 20 blood samples from women suffering from metastatic breast cancer were analyzed on custom chips such as described above. For each of these two diseases, only 3 patients showed an expression of the 90 markers included in Panel 10 similar to that of early stage breast cancer. Therefore, 84.2% (16 out of 19) of colon cancers and 85% (17 out of 20) of metastatic breast cancers were not confused with early stage breast cancer.
Panel 11 comprises all the sequences common to Panels 1 to 10, i.e. the nucleic acids comprising the sequences indicated in SEQ ID Nos: 23, 52, 53, 148 and 225 (see Table 4).
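The way a common core such as Panel 11 is derived can be sketched as a set intersection. The example below is an illustration under stated assumptions: it uses only the six combinations whose SEQ ID Nos are spelled out in the text (the 66-, 53-, 42-, 22-, 18- and 90-marker combinations); the panels given only in Table 4 are not included, so the computed intersection can only be a superset of Panel 11. The helper `parse_seq_ids` and the dictionary keys are hypothetical names.

```python
import re


def parse_seq_ids(text):
    """Expand a list such as '16-19, 23, 298-300' into a set of integers."""
    ids = set()
    for token in re.split(r"[,;]", text):
        token = token.strip()
        if not token:
            continue
        # A token is either a single number or a range 'a-b' / 'a to b'.
        bounds = [int(part) for part in re.split(r"\s*(?:-|to)\s*", token)]
        ids.update(range(bounds[0], bounds[-1] + 1))
    return ids


# SEQ ID Nos transcribed from the text, keyed by the stated panel size.
panels = {
    66: "5, 7, 13, 14, 16-20, 23-28, 47, 51-55, 58, 64, 69, 80, 81, 88-90, "
        "116, 121, 125, 137, 139, 145, 148, 158, 160, 161, 164, 188-191, "
        "208, 222, 225, 228, 229, 236, 240, 242, 245, 248, 252, 280, 281, "
        "284, 290, 298-300, 309-312",
    53: "7, 13, 14, 16-19, 23-28, 47, 51-55, 58, 69, 80, 81, 89, 116, 121, "
        "125, 137, 139, 145, 148, 158, 160, 161, 164, 189, 190, 225, 228, "
        "229, 240, 245, 248, 252, 280, 281, 284, 290, 298-300, 310, 312",
    42: "13, 16-19, 23, 26-28, 47, 51-55, 58, 69, 80, 81, 89, 116, 121, 125, "
        "145, 148, 158, 160, 161, 164, 189, 190, 225, 229, 240, 248, 280, "
        "281, 284, 299, 300, 310, 312",
    22: "18, 19, 23, 26, 27, 51, 52, 53, 54, 55, 69, 80, 125, 145, 148, 161, "
        "188, 225, 228, 240, 280, 312",
    18: "18, 19, 23, 26, 51, 52, 53, 54, 55, 69, 80, 125, 145, 148, 225, 228, "
        "240, 312",
    90: "11; 12; 18; 23; 51 to 53; 59; 60; 105; 148; 150; 191; 195; 206; "
        "225 to 227; 229; 280; 281; 308 to 310; 312; 327; 343; 360; 367; "
        "377 to 437",
}

parsed = {size: parse_seq_ids(text) for size, text in panels.items()}

# Sanity check: each expanded list has the number of markers stated in the text.
sizes_match = all(len(ids) == size for size, ids in parsed.items())

# Markers shared by every combination listed above.
common = set.intersection(*parsed.values())
panel_11 = {23, 52, 53, 148, 225}

print("sizes match:", sizes_match)
print("common to the six listed combinations:", sorted(common))
print("Panel 11 contained in that intersection:", panel_11 <= common)
```

Expanding the ranges programmatically also gives a quick consistency check on the marker counts stated for each combination.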
Analysis of the different genes identified in the invention, having altered expression in patients, shows that they belong to families of genes involved in cell signaling pathways or in common regulating mechanisms. In particular, this analysis shows numerous genes involved in the signaling cascades used in immune response (innate or acquired), and notably in the transduction of messages initiated by stimulation of TLRs, in the secretion of cytokines or in T-lymphocyte activation.
The applicants have therefore defined panels of genes related to or taking part in signaling pathways of immune response, which form targets of interest for cancer detection. These panels are given in Tables 5 and 6.
Homo sapiens vesicle
Homo sapiens
Homo sapiens vesicle
Homo sapiens Bruton agammaglobulinemia tyrosine
Homo sapiens phospholipase C, gamma 2
Homo sapiens spleen tyrosine kinase (SYK), mRNA.
Homo sapiens CD79A antigen (immunoglobulin-
Homo sapiens protein tyrosine phosphatase, non-
Homo sapiens mRNA for Fc fragment of IgG, low
Homo sapiens leucocyte immunoglobulin-like
Homo sapiens NKG2E gene. April 2005
Homo sapiens lysozyme (renal amyloidosis) (LYZ),
Homo sapiens platelet/endothelial cell
Homo sapiens Fc fragment of IgG, high affinity Ia,
Homo sapiens Fc fragment of IgG, low affinity IIIb,
Homo sapiens CD4 antigen (p55) (CD4), mRNA.
Homo sapiens interferon (alpha, beta and omega)
Homo sapiens tumor necrosis factor receptor
Homo sapiens major histocompatibility complex, class
Homo sapiens 2′-5′-oligoadenylate synthetase 2,
Homo sapiens TAP binding protein (tapasin)
Homo sapiens TYRO protein tyrosine kinase binding
Homo sapiens major histocompatibility complex, class
Homo sapiens killer cell lectin-like receptor subfamily
Homo sapiens leukocyte immunoglobulin-like
Homo sapiens granulysin (GNLY), transcript variant
Homo sapiens Fc fragment of IgG, low affinity IIa,
Homo sapiens major histocompatibility complex, class
Homo sapiens sialic acid binding Ig-like lectin 10
Homo sapiens immunoglobulin lambda-like
Homo sapiens interleukin 28 receptor, alpha
Homo sapiens CD300 antigen like family member B
Homo sapiens leukocyte immunoglobulin-like
Homo sapiens Notch homolog 2 (Drosophila) N-
H. sapiens FALL-39 gene. September 2004
Homo sapiens colony stimulating factor 2 receptor,
Homo sapiens integrin, beta 2 (antigen CD18 (p95),
Homo sapiens CD53 antigen (CD53), mRNA.
Homo sapiens chemokine (C-C motif) receptor 7
Homo sapiens chemokine (C-C motif) ligand 3
Homo sapiens C-type lectin domain family 4,
Homo sapiens ST6 beta-galactosamide alpha-2,6-
Homo sapiens mRNA for Neutrophil cytosol factor 2
Homo sapiens mRNA for colony stimulating factor 3
Homo sapiens BAC clone RP11-332L16 from 7,
Homo sapiens cDNA FLJ46012 fis, clone
Homo sapiens tumor necrosis factor receptor
Homo sapiens interleukin 2 receptor, gamma (severe
Homo sapiens mRNA for H-2K binding factor-2,
Homo sapiens class II, major histocompatibility
Homo sapiens serpin peptidase inhibitor, clade A
Homo sapiens transporter 2, ATP-binding cassette,
Homo sapiens CD14 antigen (CD14), mRNA.
Homo sapiens transporter 1, ATP-binding cassette,
Homo sapiens CD59 antigen p18-20 (antigen
Homo sapiens interleukin 8 receptor, alpha (IL8RA),
Homo sapiens transforming growth factor, beta 1
Homo sapiens interferon-induced protein with
Homo sapiens chemokine (C-X-C motif) receptor 4
Homo sapiens lymphocyte-specific protein 1 (LSP1),
Homo sapiens nuclear receptor subfamily 3, group C,
Homo sapiens annexin A11 (ANXA11), transcript
Homo sapiens Rho GDP dissociation inhibitor (GDI)
Homo sapiens E74-like factor 4 (ets domain
Homo sapiens FYN binding protein (FYB-120/130)
Homo sapiens interleukin 8 receptor, beta (IL8RB),
Homo sapiens arachidonate 5-lipoxygenase-
Homo sapiens acyloxyacyl hydrolase (neutrophil)
Homo sapiens cyclin D3 (CCND3), mRNA. October 2005
Homo sapiens CD97 antigen (CD97), transcript
Homo sapiens ferritin, heavy polypeptide 1 (FTH1),
Homo sapiens inositol 1,4,5-trisphosphate 3-kinase
Homo sapiens myeloid differentiation primary
Homo sapiens mitogen-activated protein kinase
Homo sapiens chemokine (C-C motif) ligand 5
Homo sapiens Sp2 transcription factor (SP2), mRNA.
Homo sapiens nuclear factor (erythroid-derived 2)-
Homo sapiens zinc finger protein 36, C3H type,
Homo sapiens tumor necrosis factor receptor
Homo sapiens interleukin 18 receptor accessory
Homo sapiens guanylate binding protein 2,
Homo sapiens CD81 antigen (target of
Homo sapiens N-myc (and STAT) interactor (NMI),
Homo sapiens S100 calcium binding protein A12
Homo sapiens B-cell receptor-associated protein 31
Homo sapiens immunoglobulin superfamily, member
Homo sapiens SMAD, mothers against DPP homolog
Homo sapiens proteasome (prosome, macropain)
Homo sapiens heparanase (HPSE), mRNA. November 2005
Homo sapiens complement component 1, q
Homo sapiens px19-like protein (PX19), mRNA.
Homo sapiens linker for activation of T cells family,
Homo sapiens aquaporin 9 (AQP9), mRNA. October 2005
Homo sapiens toll-like receptor adaptor molecule 2
Homo sapiens zinc finger protein 3 (A8-51) (ZNF3),
Homo sapiens guanylate binding protein 5 (GBP5),
Homo sapiens titin (TTN), transcript variant novex-2,
Homo sapiens apolipoprotein L, 3 (APOL3), transcript
Homo sapiens proteasome (prosome, macropain)
Homo sapiens E74-like factor 1 (ets domain
Homo sapiens myxovirus (influenza virus) resistance
Homo sapiens toll-like receptor 2 (TLR2), mRNA.
Homo sapiens dedicator of cytokinesis 2 (DOCK2),
Homo sapiens WNK lysine deficient protein kinase 1,
Homo sapiens CD3E antigen, epsilon polypeptide
Homo sapiens inhibitor of kappa light polypeptide
Homo sapiens calmodulin 2 (phosphorylase kinase,
Homo sapiens major histocompatibility complex, class
Homo sapiens p21/Cdc42/Rac1-activated kinase 1
Homo sapiens phosphoinositide-3-kinase, class 3
Homo sapiens mitogen-activated protein kinase 1
Homo sapiens ras-related C3 botulinum toxin
Homo sapiens nuclear factor of kappa light
Homo sapiens beta-2-microglobulin (B2M), mRNA.
Homo sapiens c-src tyrosine kinase (CSK), mRNA.
Homo sapiens phosphoinositide-3-kinase, catalytic,
Homo sapiens v-fos FBJ murine osteosarcoma viral
Homo sapiens vav 1 oncogene (VAV1), mRNA.
Homo sapiens lymphocyte cytosolic protein 2 (SH2
Homo sapiens CD28 antigen (Tp44) (CD28), mRNA.
Homo sapiens protein kinase C, theta (PRKCQ),
Homo sapiens major histocompatibility complex, class
Homo sapiens nuclear factor of kappa light
Homo sapiens major histocompatibility complex, class
Homo sapiens phospholipase C, gamma 1 (PLCG1),
Homo sapiens mRNA for HLA class I
Homo sapiens TNF-receptor associated factor-3
Homo sapiens T cell receptor alpha variable 20,
Homo sapiens Janus kinase 3 (a protein tyrosine
Homo sapiens T-cell receptor active beta-chain V-
Homo sapiens T-cell receptor alpha chain (TCRA)
Homo sapiens interferon gamma receptor 1
Homo sapiens neutrophil cytosolic factor 2 (65 kDa,
Homo sapiens zyxin (ZYX), transcript variant 2,
Homo sapiens interleukin 32 (IL32), transcript variant
Homo sapiens interleukin 10 receptor, alpha
Homo sapiens Janus kinase 1 (a protein tyrosine
Homo sapiens lymphocyte cytosolic protein 1 (L-
Homo sapiens regulator of G-protein signalling 2,
Homo sapiens suppressor of cytokine signaling 2
Homo sapiens FK506 binding protein 5 (FKBP5),
Homo sapiens G protein pathway suppressor 2
Homo sapiens TNFAIP3 interacting protein 1 (TNIP1),
Homo sapiens CDC42 effector protein (Rho GTPase
Homo sapiens LPS-responsive vesicle trafficking,
Homo sapiens CDC42 effector protein (Rho GTPase
Homo sapiens jumonji domain containing 2A
Homo sapiens jumonji domain containing 2B
Homo sapiens FK506 binding protein 1A, 12 kDa
Homo sapiens colony stimulating factor 3 receptor
Homo sapiens high density lipoprotein binding protein
Number | Date | Country | Kind |
---|---|---|---|
0511080 | Oct 2005 | FR | national |
0602824 | Mar 2006 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR2006/051108 | 10/26/2006 | WO | 00 | 5/7/2009 |