Novel genetic markers for leukemias

The present invention is related to methods for detecting leukemia cells by determining the expression profile of a group of markers. In particular, the type or subtype of leukemia cells in a sample is determined. Further, uses of the group of markers and compositions comprising these markers are disclosed.

In the present specification, a number of documents are cited. The disclosure content of these documents, including manufacturers' manuals, is herewith incorporated by reference. This holds particularly true for documents such as the gene accession numbers cited in Tables 43a, b, 44 and 45, which provide the complete nucleotide sequence of marker genes/cDNAs. In other terms, by reciting these documents, applicant intends to incorporate the complete nucleotide/amino acid sequence of those markers where only a partial sequence has been identified in the appended Tables. It is also intended to include the (poly)peptide sequences translated from these nucleotide sequences within the disclosure content of the present specification.

Today leukemias are classified into four different groups or types: acute myeloid (AML), acute lymphatic (ALL), chronic myeloid (CML) and chronic lymphatic leukemia (CLL). Within these groups, several subcategories can be identified further using a panel of standard techniques as described below. The incidence of leukemias increases with age and is 5/100,000/year in AML, 1/100,000/year in ALL, 1/100,000/year in CML and 6/100,000/year in CLL. Several methods for classification have to be applied at diagnosis and before treatment starts: cytomorphology and cytochemistry, multiparameter-immunophenotyping, cytogenetics including fluorescence in situ hybridization, and molecular techniques such as polymerase chain reaction (PCR). So far only a combination of these techniques allows a precise diagnosis, which is necessary to apply state-of-the-art treatment. An exact diagnosis is mandatory: in CML, for example, the detection of a specific cytogenetic abnormality, the translocation t(9;22), or of its molecular counterpart, the BCR/ABL rearrangement, is required to establish the diagnosis. While all patients with CML show a BCR-ABL rearrangement and are therefore homogeneous with regard to the primary genetic abnormality, in AML and ALL at least 10-15 different subgroups have been identified on the morphological, genetic or molecular level. Also in CLL several subgroups can be clearly separated. These different subcategories in leukemias are associated with varying clinical outcome and therefore are the basis for different treatment strategies. The importance of highly specific classification is illustrated in more detail below for AML, a very heterogeneous group of diseases.

Data from clinical trials showed that the outcome of patients with AML varies widely. Several parameters influencing prognosis have been identified. These can be assigned to different categories: patients' characteristics (e.g. age, comorbidity), therapy, and biology of the AML. Therefore, a lot of effort was invested to identify biological entities and to distinguish subgroups of AML which are associated with a favorable, intermediate or unfavorable prognosis, respectively. In order to allow a comparison between different studies a classification of AML was mandatory. In 1976 the French-American-British co-operative group proposed the FAB classification, which was based on cytomorphology and cytochemistry in order to separate AML subgroups according to the morphological appearance of blasts in the blood and bone marrow. In addition, it was recognized that genetic abnormalities occurring in the leukemic blast had a major impact on the morphological picture and even more on the prognosis. So far, the karyotype of the leukemic blasts is the most important independent prognostic factor regarding response to therapy as well as survival. For clinical purposes karyotype analysis allows discrimination between three major prognostic groups. A favorable outcome under currently used treatment regimens with cure rates from 50% up to 85% was observed in several studies in patients with a) t(8;21)(q22;q22) occurring in AML M2, b) inv(16)(p13q22) occurring in AML M4eo and c) t(15;17)(q22;q11-12) occurring in AML M3/M3v. In contrast, chromosome aberrations with an unfavorable clinical course are −5/del(5q), −7/del(7q), inv(3)/t(3;3) and complex aberrant karyotypes with cure rates of only 10%. The remainder of AML patients are assigned to a prognostically intermediate group. This latter group is very heterogeneous because it includes patients with a normal karyotype as well as those with rare chromosome aberrations with yet unknown prognostic impact.

The sub-classification of leukemias becomes increasingly important to guide therapy. Thus, the development of new, specific treatment approaches requires the identification of specific subtypes that may benefit from a distinct therapeutic protocol. It has already been shown in two entities that the development of specific drugs can improve outcome of distinct subsets of leukemia. One important example is the development of a new therapeutic drug (STI571) for the treatment of chronic myeloid leukemia (CML): this designed molecule inhibits the CML-specific chimeric tyrosine kinase BCR-ABL generated from the genetic defect observed in CML, the BCR-ABL rearrangement due to the translocation between chromosomes 9 and 22 (t(9;22)(q34;q11)). First data show that therapy response is dramatically higher in patients treated with this new drug as compared to all other drugs that had been used so far. Another example is the subtype of acute myeloid leukemia AML M3 and its variant M3v, both with karyotype t(15;17)(q22;q11-12). The introduction of a new drug (all-trans retinoic acid, ATRA) has improved the outcome in this subgroup of patients from about 50% to 85% long-term survivors. As it is mandatory for patients suffering from these specific leukemia subtypes to be identified as fast as possible so that the best therapy can be applied, diagnostics today must accomplish sub-classification with maximal precision. Not only for these subtypes but also for several other leukemia subtypes different treatment approaches could improve outcome. Therefore, rapid and precise identification of distinct leukemia subtypes is the future goal for diagnostics.

So far a combination of methods is necessary to obtain the most important information in leukemia diagnostics: analysis of the morphology and cytochemistry of bone marrow blasts and peripheral blood cells is necessary to establish the diagnosis. In some cases the addition of immunophenotyping is mandatory to separate very undifferentiated AML from acute lymphoblastic leukemia and CLL. The leukemia subtypes investigated can be diagnosed by cytomorphology alone only if an expert reviews the smears. However, a genetic analysis based on chromosome analysis, fluorescence in situ hybridization or RT-PCR and immunophenotyping is required in order to assign all cases to the right category. The aim of these techniques besides diagnosis is mainly to determine the prognosis of the leukemia. A major disadvantage of these methods, however, is that viable cells are necessary, as the cells for genetic analysis have to divide in vitro in order to obtain metaphases for the analysis. Another problem is the long time of 72 hours between receipt of the material in the laboratory and the result. Furthermore, great experience in preparation of chromosomes and even more in analyzing the karyotypes is required to obtain the correct result in at least 90% of cases. Experts in their field are necessary for all other techniques mentioned above as well. Accordingly, standard diagnosis of leukemia uses a combination of complementary methods, is expensive and time-consuming, and requires experienced experts in the field. Methods that have to be combined are cytomorphology or histomorphology, multiparameter-immunophenotyping, cytogenetics, fluorescence in situ hybridization, and molecular genetics such as polymerase chain reaction based assays.

Using these techniques in combination, hematological malignancies in a first approach are separated into chronic myeloid leukemia (CML), chronic lymphoid (CLL), acute lymphoblastic (ALL), and acute myeloid leukemia (AML). Within the latter three disease entities several prognostically relevant subtypes have been established. As a second approach this further subclassification is based mainly on genetic abnormalities of the leukemic blasts and clearly is associated with different prognoses. Therefore, this subclassification is increasingly important to guide therapy. Furthermore, the development of new, specific treatment approaches requires precise identification of leukemia subtypes.

In a first study Golub et al. (Science 1999) showed that gene expression profiles can be used for class prediction and discriminated AML from ALL samples. However, for their analysis of acute leukemias the selection of the two different subgroups was performed using exclusively morphologic-phenotypical criteria. This was only descriptive and does not provide deeper insights into the pathogenesis or the underlying biology of the leukemia. The approach reproduces only very basic knowledge of cytomorphology and intends to differentiate classes. The data are not sufficient to predict prognostically relevant cytogenetic aberrations.

Thus, the technical problem underlying the present invention was to provide means for leukemia diagnostics which overcome the disadvantages of the prior art diagnostic methods.

The solution to said technical problem is achieved by providing the embodiments characterized in the claims. Accordingly, the present invention relates to a method of determining whether a patient sample contains leukemia cells or other cells comprising the steps of a) determining the expression profile of a group of markers in a patient sample and b) concluding from the expression profile whether the patient sample contains leukemia cells or other cells characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 3 to 6, tables 15 to 20, tables 29, 30, 41, or 42 and whereby the number of markers in the group is between one and the total number of markers listed in the tables 3 to 6, tables 15 to 20, and tables 29, 30, 41, or 42. In a particular embodiment thereof, the present invention pertains to a method wherein leukemia type and subtype are simultaneously determined whereby a microarray for the detection of the expression level of a marker or a group of markers is used.

It is important to note that in accordance with the invention in all pertaining embodiments any possible combination of markers, said markers being disclosed in the respective table or tables is encompassed within the scope of the invention.

As used herein, the term “expression” refers to the process by which mRNA or a polypeptide is produced based on the nucleic acid sequence of a gene. The process includes both transcription and translation, i.e. “expression” shall also include the formation of mRNA upon transcription.

In accordance with the present invention, the term “determining the expression profile” preferably refers to the determination of the level of expression, namely of said group of markers.

As used herein, the term “marker” refers to a DNA, in particular cDNA, or RNA or a fragment thereof or a protein or a fragment thereof which are in the case of RNA (or cDNA) formed upon transcription of a nucleotide sequence which is capable of expression. The nucleic acid molecule fragments refer to fragments preferably of at least 8 such as ten, twelve, fifteen or eighteen nucleotides in length representing a consecutive stretch of nucleotides of a gene, cDNA or mRNA such as of 20 or nucleotides that are, for example, further specified in the appended Tables or a complementary sequence thereto. In other terms, markers include any fragment (or complementary sequence thereto) of the sequences depicted in the appended tables as long as these fragments unambiguously identify the marker. Typical fragment lengths are provided above. The determination of the expression profile of markers may be effected at the transcriptional or translational level. In other terms, the method of the invention envisages the determination at the level of mRNA or at the protein level. Protein fragments such as peptides advantageously comprise at least 6 consecutive amino acids representative of the corresponding full length protein. 6 amino acids are generally recognized as the lowest peptidic stretch giving rise to a linear epitope recognized by an antibody, fragment or derivative thereof. Alternatively, the proteins or fragments thereof may be analysed using nucleic acid molecules specifically binding to three-dimensional structures (aptamers). In principle, the investigator may determine, in accordance with the method of the invention, whether a gene is expressed at all in a leukemic or other cell. Alternatively, an investigator may determine the difference in the expression level, for example, between a leukemic and a non-leukemic cell or between two or more different types or subtypes of leukemia. If the sample comprises only other, i.e. non-leukemia cells, then it may safely be ruled out that the patient suffers from a leukaemia. In this respect, the above main embodiment is to be understood to mean that if the presence of other cells is determined then this determination includes an assessment to the effect that only other cells but no leukemic cells are comprised in the sample. On the other hand, the determination of leukemic cells may include the further characterization of such cells including the differentiation status of the cells as well as the distinction from other types of cancer cells or other subtypes of leukaemia cells. Particular embodiments in this regard are further outlined herein below.

In accordance with the above, the present invention also contemplates methods where simply the assessment of leukaemia cells but not necessarily of other cells is effected. This holds true for all embodiments where the determination of other cells is mentioned. It is to be understood that with the exception of the possible determination of other cells, the steps of the various methods of the invention remain unchanged. Thus, the invention also relates to a method of determining whether a patient sample contains leukemia cells comprising the steps of a) determining the expression profile of a group of markers in a patient sample and b) concluding from expression profile whether the patient sample contains leukemia cells characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 3 to 6, tables 15 to 20, tables 29, 30, 41, or 42 and whereby the number of markers in the group is between one and the total number of markers listed in the tables 3 to 6, tables 15 to 20, and tables 29, 30, 41, or 42. 
Thus, the invention further relates to a method of determining whether a patient sample contains leukemia cells and at the same time or subsequently determining the type and subtype of leukemia cells, if leukemia cells are present, comprising the steps of a) determining the expression profile of a group of markers in a patient sample and b) concluding from the expression profile whether the patient sample contains leukemia cells and at the same time or subsequently determining the type and subtype of leukemia cells, if leukemia cells are present, characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 16 to 20 or table 29 or 30 and whereby the number of markers in the group is between one and the total number of markers listed in the tables 16 to 20 or table 29 or 30, to name two important embodiments of the invention.

Determination of the expression profile/levels may be effected by a variety of methods, depending on the nature of the marker. Thus, if the marker is mRNA, cDNA may be prepared into which a detectable label, such as a fluorescent, chemiluminescent, bioluminescent or radioactive (such as ³H or ³²P) label, is incorporated. Said detectably labelled cDNA, in single-stranded form, may then be hybridised, preferably under stringent or highly stringent conditions, to a panel of single-stranded oligonucleotides representing different genes and affixed to a solid support such as a chip. Upon applying appropriate washing steps, those cDNAs will be detected or quantitatively detected that have a counterpart in the oligonucleotide panel. Various advantageous embodiments of this general method are feasible. For example, the mRNA or the cDNA may be amplified wherein it is, for quantitative assessments, preferable that the number of amplified copies corresponds, relative to further amplified mRNAs or cDNAs, to the number of mRNAs originally present in the cell. Also, the cDNAs may be transcribed into cRNAs wherein only in the transcription step a label is incorporated into the nucleic acid and wherein the cRNA is employed for hybridisation. Alternatively, the label may be attached subsequent to the transcription step. Similarly, proteins from a cell or tissue under investigation may be contacted with a panel of aptamers or of antibodies or fragments or derivatives thereof. The antibodies etc. may be affixed to a solid support such as a chip. Binding of proteins indicative of a leukemia or a subtype of leukaemia may be verified by binding to a detectably labelled secondary antibody or aptamer. For the labelling of antibodies, reference is made to Harlow and Lane, “Antibodies, a laboratory manual”, CSH Press, 1988, Cold Spring Harbor.
As regards further test assays and formats, reference is made to further embodiments of the invention as specified herein below as well as to the appended examples. In addition, a number of applicable assay formats are available in the art that can be applied to the method of the invention without further ado. Specifically, a minimum set of proteins necessary for diagnosis of all leukemia types may be selected for creation of a protein array system to make a diagnosis directly on a protein lysate of a diagnostic bone marrow sample. Protein array systems for the detection of specific protein expression profiles are already available (for example: Bio-Plex, BIORAD, München, Germany). For this application, preferably antibodies against the proteins have to be produced and immobilized on a platform, e.g. glass slides or microtiter plates. The immobilized antibodies can be labeled with a reactant specific for the certain target proteins as discussed above. The reactants can include enzyme substrates, DNA, receptors, antigens or antibodies to create, for example, a capture sandwich immunoassay.

The level of the expression of the “marker” is indicative of a leukemic condition of a cell or an organism. The level of expression of a marker or group of markers is measured and is compared with the level of expression of the same marker or the same group of markers from other cells or samples. The comparison may be effected in an actual experiment or in silico. When the expression level, also referred to as expression pattern or expression signature (expression profile), is measurably different, there is according to the invention a meaningful difference in the level of expression. Preferably the difference is at least 5%, 10% or 20%, more preferred at least 50%, or may even be as high as 75% or 100%. More preferred the difference in the level of expression is at least 200%, i.e. two fold, at least 500%, i.e. five fold, or at least 1000%, i.e. 10 fold.

The present invention allows the diagnosis of a wide variety of, and at least 14 different, clinically relevant leukemia subtypes. Therefore, with the invention of a combination of marker genes and their specific expression levels it is possible to substitute all other mandatory diagnostic approaches, including the approach of Golub and colleagues (cytomorphology or histomorphology, multiparameter-immunophenotyping, cytogenetics, fluorescence in situ hybridization, and molecular genetics), in one single step with a specificity and sensitivity that had never been achieved with any of the other techniques used so far.

In more detail, based on biomathematical analysis of gene expression profiles a new method could be provided which forms the basis for designing and developing a novel diagnostic approach preferably based on microarray technology. Further, subsets of markers, preferably genes, could be introduced which allow the determination of leukemia type and subtype. The method according to the invention abolishes today's standard procedures in diagnosis of leukemia. These standard diagnostic procedures require more and more centralized core facilities with both expert personnel in the fields of cytomorphology, cytogenetics and molecular genetics and expensive lab equipment, which causes increasing costs for adequate diagnosis. The present invention provides novel cost-effective methods and diagnostic tools, which are less time consuming and easy to operate but nevertheless as accurate and safe as all standard methods combined today. The genes or sets of genes allow clinical samples to be assigned as either healthy or malignant simply based on their gene expression profiles. The genes, representative fragments thereof or transcription or translation products thereof form the basis for the methods of the invention or diagnostic tools corresponding thereto. Furthermore, these genes etc. allow the diagnoses to be predicted based on the genetic abnormality of the expression pattern and allow discrimination between different prognostically relevant entities. When comparing two groups of microarray experiments, Golub's method (Science 286 (1999), 531-537) sorts the genes with respect to the signal-to-noise ratio of gene x: S_x = (μ₁ − μ₂)/(σ₁ + σ₂), where μ_k and σ_k denote the mean expression and standard deviation of gene x in group k.

According to a specified number of “informative” genes, the 20 best discriminating genes are selected. For each informative gene a decision limit is calculated as b_x = (μ₁ + μ₂)/2. To classify a new sample of an independent test set, the gene expression levels of informative genes are taken and for each gene x and sample y a so-called vote is calculated as V_x = S_x(g_x^y − b_x), where g_x^y denotes the expression level of gene x in sample y. The votes of all informative genes are summed up (“weighted voting”) and, depending upon the sign of this sum, the new sample is classified as group 1 or group 2. The confidence in the prediction is calculated as |ΣV_x|/Σ|V_x|.
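The weighted voting scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary-based data layout and the toy expression values are assumptions made for the example, and the use of the sample standard deviation (`statistics.stdev`) is likewise an assumption, since the text does not state which estimator is meant.

```python
from statistics import mean, stdev


def signal_to_noise(group1, group2):
    """S_x = (mu1 - mu2) / (sigma1 + sigma2) for one gene."""
    return (mean(group1) - mean(group2)) / (stdev(group1) + stdev(group2))


def decision_limit(group1, group2):
    """b_x = (mu1 + mu2) / 2 for one gene."""
    return (mean(group1) + mean(group2)) / 2


def weighted_vote(train1, train2, sample):
    """Classify one sample by weighted voting.

    train1, train2: dict mapping gene name -> list of training expression
    values for group 1 and group 2 (hypothetical data layout).
    sample: dict mapping gene name -> expression value of the new sample.
    Returns (predicted group, confidence).
    """
    votes = []
    for gene, value in sample.items():
        s = signal_to_noise(train1[gene], train2[gene])
        b = decision_limit(train1[gene], train2[gene])
        votes.append(s * (value - b))  # V_x = S_x * (g_x^y - b_x)
    total = sum(votes)
    denom = sum(abs(v) for v in votes)
    confidence = abs(total) / denom if denom else 0.0
    # A positive sum favours group 1, otherwise group 2 (a zero sum goes to group 2 here).
    return (1 if total > 0 else 2), confidence


# Toy training data, invented for illustration only.
train1 = {"geneA": [10.0, 12.0, 11.0], "geneB": [2.0, 3.0, 2.5]}
train2 = {"geneA": [4.0, 5.0, 6.0], "geneB": [8.0, 9.0, 7.0]}
group, confidence = weighted_vote(train1, train2, {"geneA": 11.5, "geneB": 2.2})
print(group, confidence)
```

In this sketch a gene whose two groups both have zero standard deviation would cause a division by zero; a real implementation would need to handle that case.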

To assess the significance of each gene, a permutation test is performed, which determines signal-to-noise ratios when class labels are permuted randomly.
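A permutation test of this kind might look as follows. The sketch assumes a two-sided comparison of absolute signal-to-noise ratios, a fixed number of random permutations and a seeded random generator; none of these details are specified in the text, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Signal-to-noise ratio of one gene given per-sample class labels (1 or 2)."""
    g1 = [v for v, l in zip(values, labels) if l == 1]
    g2 = [v for v, l in zip(values, labels) if l == 2]
    return (mean(g1) - mean(g2)) / (stdev(g1) + stdev(g2))


def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Estimate how often a random relabelling gives an |S_x| at least as large."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    observed = abs(signal_to_noise(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permute class labels, keep expression values fixed
        if abs(signal_to_noise(values, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return (hits + 1) / (n_permutations + 1)


# Toy data, invented for illustration: one well-separated and one uninformative gene.
labels = [1, 1, 1, 1, 2, 2, 2, 2]
p_informative = permutation_p_value([10, 11, 12, 13, 1, 2, 3, 4], labels)
p_uninformative = permutation_p_value([1, 3, 5, 7, 2, 4, 6, 8], labels)
print(p_informative, p_uninformative)
```

A small estimated p-value suggests that the observed signal-to-noise ratio is unlikely to arise from a random assignment of class labels.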

To assess the robustness of the classifier, a leave-one-out crossvalidation is performed. Accuracy is the rate of correctly classified test samples.
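As an illustration, leave-one-out crossvalidation can be sketched as below: each sample is held out once, the classifier is rebuilt from the remaining samples, and the held-out sample is predicted. The weighted-voting predictor, the row-per-sample data layout and the toy values are assumptions made for this example.

```python
from statistics import mean, stdev


def predict_weighted_vote(train_rows, train_labels, sample):
    """Predict group 1 or 2 for `sample` (list of expression values, one per gene)."""
    total = 0.0
    for j in range(len(sample)):
        g1 = [r[j] for r, l in zip(train_rows, train_labels) if l == 1]
        g2 = [r[j] for r, l in zip(train_rows, train_labels) if l == 2]
        s = (mean(g1) - mean(g2)) / (stdev(g1) + stdev(g2))  # signal-to-noise ratio
        b = (mean(g1) + mean(g2)) / 2                        # decision limit
        total += s * (sample[j] - b)                         # vote of gene j
    return 1 if total > 0 else 2


def leave_one_out_accuracy(rows, labels, predict=predict_weighted_vote):
    """Rate of correctly classified held-out samples."""
    correct = 0
    for i in range(len(rows)):
        # Train on everything except sample i, then test on sample i.
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict(train_rows, train_labels, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


# Toy data, invented for illustration: 8 samples, 2 genes each.
rows = [[10, 2], [11, 3], [12, 2.5], [13, 3.5],
        [4, 8], [5, 9], [6, 7], [7, 8.5]]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
accuracy = leave_one_out_accuracy(rows, labels)
print(accuracy)
```

Because the held-out sample never contributes to the means and standard deviations used to classify it, the resulting accuracy is a less optimistic estimate than accuracy measured on the training set itself.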

The decision limit proposed by Golub does not provide optimal classification accuracy in all situations. When the standard deviations of expression levels within the two groups are very different, the decision limit is biased towards the group with the higher standard deviation.

A decision limit for a particular gene can be considered optimal, if it achieves maximum classification accuracy for a given dataset. By determining systematically classification accuracies for a set of possible decision limits, an optimal decision limit can be calculated. The underlying statistics as described in Example 3 select an optimal decision limit from the following set of decision limits L_x:

L_x = {(g_x^y + g_x^{y−1})/2 | 1 < y ≤ n}

where g_x^y denotes the expression level of gene x in sample y, and n denotes the total number of samples in the training set.
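The search over candidate decision limits can be sketched as follows. The sketch assumes that the samples are ordered by expression level before the midpoints are formed, so that each candidate limit lies between two neighbouring expression values, and it assumes that either group may lie above the limit; both points are interpretations, and the function names and toy data are hypothetical.

```python
def candidate_limits(values):
    """Midpoints between neighbouring expression values (assumes sorting by level)."""
    ordered = sorted(values)
    # 1 < y <= n in the 1-based formula corresponds to indices 1 .. n-1 here.
    return [(ordered[y] + ordered[y - 1]) / 2 for y in range(1, len(ordered))]


def accuracy_at(limit, values, labels):
    """Fraction of samples on the correct side of `limit`, for the better orientation."""
    above_is_group1 = sum((v > limit) == (l == 1) for v, l in zip(values, labels))
    return max(above_is_group1, len(values) - above_is_group1) / len(values)


def optimal_decision_limit(values, labels):
    """Return (limit, accuracy) for the candidate limit with maximum accuracy."""
    best = max(candidate_limits(values), key=lambda c: accuracy_at(c, values, labels))
    return best, accuracy_at(best, values, labels)


# Toy data, invented for illustration: group 1 is tight, group 2 is widely spread.
values = [1.0, 1.2, 1.1, 0.9, 2.0, 6.0, 10.0, 14.0]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

midpoint = (1.05 + 8.0) / 2  # limit from the group means, b_x = (mu1 + mu2) / 2
limit, accuracy = optimal_decision_limit(values, labels)
print(accuracy_at(midpoint, values, labels), limit, accuracy)
```

In this toy case the limit derived from the group means is pulled towards the widely spread group and misclassifies its lowest sample, whereas the searched limit separates the two groups completely.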

Golub's method selects an arbitrary number of “informative” genes to discriminate between two classes of samples according to their signal-to-noise ratio, typically in the range of 10 to 50 genes.

Choosing too many genes, as in Golub's method, carries the risk of overfitting, which leads to poor generalization of the model.

Therefore the present invention applies a heuristic approach to select a minimal set of discriminative genes which provides maximum classification accuracy in leave-one-out crossvalidation. That is, for a given set of genes, weighted voting as described by Golub is applied and the classification accuracy is calculated by crossvalidation.

The method for achieving this used in accordance with the present invention and representing a further embodiment in accordance with this invention consists of the following steps:

- (a) calculating the top 20 discriminating genes according to the signal-to-noise ratio (top 20 SNRs);
- (b) calculating classification accuracy and confidence based on optimal decision limits for each of the top 20 genes;
- (c) selecting the gene which provides the best classification accuracy and confidence out of step (b); and
- (d) testing, for each of the remaining 19 genes, whether adding this gene to the model improves accuracy and confidence.

If the gene improves accuracy and confidence, it is added to the weighted voting model, otherwise it is discarded.
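Steps (a) to (d) amount to a greedy forward selection, which can be sketched as below. Several simplifications are assumptions of this sketch and not part of the described method: the decision limit is the midpoint of the group means instead of the optimal decision limit, confidence is averaged over the held-out samples, and "improves accuracy and confidence" is interpreted as a lexicographic comparison (accuracy first, confidence as tie-breaker). With a single gene the confidence formula always yields 1, so in this sketch a further gene is in practice only added when accuracy rises. All names and the toy data are hypothetical.

```python
from statistics import mean, stdev


def split(col, labels):
    """Split one gene's expression values into group 1 and group 2."""
    g1 = [v for v, l in zip(col, labels) if l == 1]
    g2 = [v for v, l in zip(col, labels) if l == 2]
    return g1, g2


def snr(col, labels):
    """Signal-to-noise ratio of one gene."""
    g1, g2 = split(col, labels)
    return (mean(g1) - mean(g2)) / (stdev(g1) + stdev(g2))


def loocv(data, labels, genes):
    """Leave-one-out (accuracy, mean confidence) of weighted voting on `genes`."""
    correct, confidences = 0, []
    for i in range(len(labels)):
        train_labels = labels[:i] + labels[i + 1:]
        votes = []
        for g in genes:
            col = data[g][:i] + data[g][i + 1:]
            g1, g2 = split(col, train_labels)
            b = (mean(g1) + mean(g2)) / 2  # midpoint limit (simplification)
            votes.append(snr(col, train_labels) * (data[g][i] - b))
        total = sum(votes)
        denom = sum(abs(v) for v in votes)
        confidences.append(abs(total) / denom if denom else 0.0)
        correct += (1 if total > 0 else 2) == labels[i]
    return correct / len(labels), mean(confidences)


def select_genes(data, labels, top_n=20):
    # (a) rank genes by absolute signal-to-noise ratio and keep the top_n
    ranked = sorted(data, key=lambda g: abs(snr(data[g], labels)), reverse=True)[:top_n]
    # (b), (c) score each gene on its own and start the model with the best one
    best = max(ranked, key=lambda g: loocv(data, labels, [g]))
    model, score = [best], loocv(data, labels, [best])
    # (d) try each remaining gene; keep it only if the score improves
    for g in ranked:
        if g == best:
            continue
        candidate = loocv(data, labels, model + [g])
        if candidate > score:  # tuple comparison: accuracy first, then confidence
            model, score = model + [g], candidate
    return model, score


# Toy data, invented for illustration: gene "A" separates the classes, gene "B" is noise.
data = {"A": [10, 11, 12, 13, 4, 5, 6, 7],
        "B": [5, 9, 6, 8, 7, 5.5, 8.5, 6.5]}
labels = [1, 1, 1, 1, 2, 2, 2, 2]
model, score = select_genes(data, labels)
print(model, score)
```

Here the noisy gene is discarded because adding it does not improve the crossvalidated score of the single-gene model.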

Preferably, the decision limit is set according to the formula recited above.

In a pilot study consisting of 103 Affymetrix Genechip microarrays with 12625 genes each as shown in the appended examples we compared the results achieved with Golub's method and with our extended method.

Table A presents an analysis of 18 samples of class A versus 85 samples of class non-A. Based on 20 informative genes, Golub's method results in a crossvalidation accuracy of 0.87 (confidence 0.77); our extended method achieves, with three genes out of the top 20 set, a crossvalidation accuracy of 0.96 (confidence 0.88).

The same analysis was performed for one versus all (OVA) and all pairs (AP) comparisons in this dataset consisting of 5 different classes. FIG. 13b presents accuracy and confidence obtained by both methods: the method of the invention outperforms Golub's method clearly both in terms of accuracy and confidence of classifications.

The development of a leukemia diagnostic tool, preferably microarray based, provides for all patients, which are preferably humans, and for all specimens a reproducible, highly specific and rapid method to obtain important information for treatment strategies in leukemia. This technique can be established in every laboratory using basic methods of molecular biology, preferably makes use of hybridization and amplification techniques such as PCR or LCR based techniques, and does not require hematologists or cytogeneticists with several years of experience in leukemia diagnostics. Material for the analysis can be sent over large distances as it is not necessary that cells arrive viable in the laboratory. Therefore, a centralization of leukemia diagnostics with very high quality is possible.

Moreover, the accumulation of an immense knowledge about gene expression profiles in leukemia types and subtypes which are not characterized by specific genetic abnormalities leads to a more precise classification compared to all other methods used so far. In addition, the data compiled in accordance with the invention are helpful for the understanding of the pathogenesis of leukemia and will allow the identification of genes which are specifically dysregulated. These genes may be considered as potential targets for therapeutic interventions specifically designed for the different leukemia subtypes.

Preferably the method according to the invention is characterized in that the group of markers consists of between two, such as three, four, five, six, seven, eight, nine or ten and the total number of markers listed in one or more of the tables 3 to 6, tables 15 to 20, and tables 29, 30, 41, or 42. Most preferred, the group consists of all markers listed in one or more tables, whereby the tables are selected from the tables 3 to 6, tables 15 to 20, and tables 29, 30, 41, or 42. The invention also contemplates that all markers in all tables are analysed. This holds true for the presently discussed as well as for embodiments discussed further below.

Another embodiment of the invention relates to a method of determining whether a patient sample contains leukemia cells or other cells and at the same time or subsequently determining the type and subtype of leukemia cells, if leukemia cells are present, comprising the steps of determining the expression profile, preferably the level of expression of a group of markers in a patient sample and concluding from the (altered) expression profile i.e. the difference in the level of expression, whether the patient sample contains leukemia cells or other cells and at the same time determining the type and subtype of leukemia cells, if leukemia cells are present, characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 16 to 20 or table 29 or 30 and whereby the number of markers in the group is between one, preferably two such as three, four, five, six, seven, eight, nine or ten and the total number of markers listed in one or more of the tables 16 to 20 or table 29 or 30. It is preferred that the group of markers consists of all markers listed in one or more tables, whereby the tables are selected from the tables 16 to 20 or table 29 or 30. In a preferred embodiment it is differentiated between four types of leukemia cells and the other cells in the patient sample. The other cells are preferably normal cells.

The “other cells” may be, for example, cells affected by a disease which is not a leukaemia. It is preferred, in accordance with the present invention that said other cells are normal cells, i.e. cells not affected by any disease.

This embodiment of the present invention allows for the differentiation between four different types of leukemias, i.e. AML, CLL, CML and ALL. As has been surprisingly demonstrated in accordance with the present invention, the qualitative and/or quantitative determination of an expression profile of a number of genes allows the unambiguous classing with any of the above and currently established types of leukemias. In principle and more preferred, the relation of the gene profile to the leukaemia type may take place at the same time at which determination of the leukaemia cells in the sample takes place. Alternatively, the classification may be effected at a later time point. It was surprising that the distinction between the large number of leukemia types and subtypes, including cytogenetically and immunophenotypically defined types, as well as types characterized by complex chromosomal aberrations, could be accomplished, preferably by the use of a microarray for the detection of the expression level of a marker or a group of markers, with such ease and accuracy. In particular, certain preferred subsets of genes are provided which can either be used to determine the leukemia type and subtype, or only determine the subtypes of a certain leukemia type, or differentiate certain types or subtypes, respectively, from one another.

In another embodiment a method is disclosed which allows differentiating between two types of leukemia cells or one type of leukemia cells and normal cells or non-leukemia cells in a patient sample comprising the steps of determining the expression profile preferably the level of expression, of a group of markers in the patient sample and concluding from the (altered) expression profile, i.e. the difference in the level of expression, which type of leukemia cells the patient sample contains or whether it contains (only) normal cells or non-leukemia cells characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 3 to 6 or tables 7 to 12 and whereby the number of markers in the group is between one, preferably two such as three, four, five, six, seven, eight, nine or ten and the total number of markers listed in one or more of the tables 3 to 6 or tables 7 to 12. In a preferred embodiment the group of markers consists of all markers listed in one or more of the tables 3 to 6 or tables 7 to 12.

In another embodiment of the invention a method is disclosed allowing the differentiation between the subtypes of AML cells or between the subtypes of AML cells and normal cells in a patient sample comprising the steps of determining the expression profile, preferably the level of expression of a group of markers in the patient sample and concluding from the (altered) expression profile, i.e. the difference in the level of expression, which subtypes of AML cells the patient sample contains or whether it contains normal cells characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1, 2, 13, 14, 17, 25, 27, 35 and 36 and whereby the number of markers in the group is between one, preferably two such as three, four, five, six, seven, eight, nine or ten and the total number of markers listed in one or more of the tables 1, 2, 13, 14, 17, 25, 27, 35 and 36. In a preferred embodiment the group of markers consists of all markers listed in one or more of the tables 1, 2, 13, 14, 17, 25, 27, 35 and 36. It is preferred that three, four or more subtypes of AML cells are determined.

In another embodiment of the invention a method is disclosed allowing the differentiation between and thus the determination of the subtypes of ALL cells in a patient sample comprising the steps of (a) determining the level of expression of a group of markers in the patient sample and (b) concluding from the differences in the level of expression which subtypes of ALL cells the patient sample contains whereby the group of markers consists of markers selected independently from the markers listed in one or more of the tables 18, 32 or 33 and whereby the number of markers in the group is between one, preferably two such as three, four, five, six, seven, eight, nine or ten and the total number of markers listed in one or more of the tables 18, 32 or 33. It is preferred that the group of markers consists of all markers listed in one or more of the tables 18, 32 or 33.

In another embodiment of the invention a method is disclosed allowing the differentiation between and thus the determination of the subtypes of CLL cells in a patient sample comprising the steps of determining the level of expression of a group of markers in the patient sample and concluding from the differences in the level of expression which subtypes of CLL cells the patient sample contains whereby the group of markers consists of markers selected independently from the markers listed in one or more of the tables 38 or 39 and whereby the number of markers in the group is between one, preferably two such as three, four, five, six, seven, eight, nine or ten and the total number of markers listed in one or more of the tables 38 or 39. It is preferred that the group of markers consists of all markers listed in one or more of the tables 38 or 39.

In another embodiment of the invention, a method is disclosed of assessing the efficacy of a test compound for inhibiting leukemia, the method comprising comparing the expression profile of a group of markers in a first sample obtained from the patient and maintained in the presence of the test compound and the expression profile of a group of markers in a second sample obtained from the patient and maintained in the absence of the test compound, wherein a significantly altered expression profile of the group of markers in the first sample, relative to the second sample, is an indication that the test compound is efficacious for inhibiting leukemia in the patient characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two such as 3, 4, 5, 6, 7, 8, 9 or 10 and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42.

In accordance with this embodiment of the present invention, it is again preferred that, in the comparison of expression profiles, expression levels and differences in expression levels are determined and compared. It is further preferred that the alteration in the expression profile or expression level determined in accordance with the method of the invention is in the direction of the expression profile of normal cells or at least of diseased but non-leukemic cells. More preferably, the alteration is in the direction of normal blood cells, even more preferably of blood cells of a certain type. Accordingly, it is also preferred that the comparison includes an internal standard of expression levels of analysed markers wherein the internal standard represents the expression profile of non-leukemic and preferably normal cells. The comparison may be effected by relying on actual experimental data or on in silico obtained reference data.
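The requirement that an alteration be "in the direction of" a reference profile can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that each profile is a mapping from marker name to expression level and that closeness is measured by Euclidean distance over the shared markers; neither the distance measure nor the numerical values are taken from the specification, and the function names are hypothetical.

```python
import math

def distance(profile, reference):
    """Euclidean distance over the markers present in both profiles."""
    shared = sorted(set(profile) & set(reference))
    if not shared:
        raise ValueError("profiles share no markers")
    return math.sqrt(sum((profile[m] - reference[m]) ** 2 for m in shared))

def shifts_toward_reference(untreated, treated, reference):
    """True if the treated sample lies closer to the reference than the untreated one."""
    return distance(treated, reference) < distance(untreated, reference)

# Hypothetical expression levels (arbitrary units); only the marker names
# appear in the specification, the values are invented for illustration.
reference = {"MYH11": 1.0, "CD24": 2.0, "HOXA9": 1.5}   # non-leukemic standard
untreated = {"MYH11": 6.0, "CD24": 0.5, "HOXA9": 5.0}   # sample without test compound
treated = {"MYH11": 2.0, "CD24": 1.5, "HOXA9": 2.0}     # sample with test compound

print(shifts_toward_reference(untreated, treated, reference))
```

In this sketch the internal standard plays the role of `reference`; whether a shift is "significant" would have to be decided by a statistical criterion that is not shown here.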

In another embodiment of the invention a method is disclosed of assessing the efficacy of a therapy for inhibiting leukemia in a patient, the method comprising comparing the expression profile, preferably the level of expression of a group of markers in the first sample obtained from the patient prior to providing at least a portion of the therapy to the patient and the expression profile, preferably the level of expression of a group of markers in a second sample obtained from the patient following provision of the portion of the therapy, wherein a significantly (altered) expression profile, i.e. a significantly (altered) difference in the level of expression of the group of markers in the second sample, relative to the first sample, is an indication that the therapy is efficacious for inhibiting leukemia in the patient characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two such as 3, 4, 5, 6, 7, 8, 9 or 10 and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, or 42.

As with the previous embodiment, the alteration determined in accordance with the method of the invention in the expression profile or expression level must be in the direction of the expression profile of normal cells or at least diseased but non-leukemic cells. Accordingly, it is also preferred in accordance with this embodiment that the comparison includes an internal standard of expression levels of analysed markers wherein the internal standard represents the expression profile of non-leukemic and preferably normal cells. The comparison may, again, be effected by relying on actual experimental data or on in silico obtained reference data.

Within the therapy of the patient, compounds may be administered that have at least passed phase II and preferably are within phase III of clinical trials. Advantageously, in one embodiment, a therapeutic composition or medicinal product is administered that comprises one pharmaceutically active compound. In alternative embodiments, pharmaceutical compositions or medicinal products are administered that comprise more than one pharmaceutically active compound. If the composition or product comprises more than one pharmaceutically active compound, then one of the compounds may aim at the direct reduction of tumor load whereas at least one further compound may fulfil an accessory function such as the general stimulation of the immune system. Compounds of the latter class are also well known in the art and comprise plant-derived products as well as immunostimulatory molecules selected from the group of interleukins, interferons and others.

Additionally, the invention contemplates a method of refining a compound identified by the method as described herein above, said method comprising optionally the steps of said methods and:

(1) identification of the binding sites of the compound and the target molecule by site-directed mutagenesis or chimeric protein studies;
(2) molecular modeling of both the binding site of the compound and the binding site of the target molecule; and
(3) modification of the compound to improve its binding specificity for the target.

The target may in accordance with the above be DNA, mRNA or protein. All techniques employed in the various steps of the method of the invention are conventional or can be derived by the person skilled in the art from conventional techniques without further ado. Thus, biological assays based on the herein identified nature of the proteins/(poly)peptides may be employed to assess the specificity or potency of the drugs wherein the increase of one or more activities of the proteins/(poly)peptides may be used to monitor said specificity or potency. Steps (1) and (2) can be carried out according to conventional protocols. A protocol for site directed mutagenesis is described in Ling M M, Robinson B H. (1997) Anal. Biochem. 254: 157-178. The use of homology modeling in conjunction with site-directed mutagenesis for analysis of structure-function relationships is reviewed in Szklarz and Halpert (1997) Life Sci. 61:2507-2520. Chimeric proteins are generated by ligation of the corresponding DNA fragments via a unique restriction site using the conventional cloning techniques described in Sambrook (1989), loc. cit. A fusion of two DNA fragments that results in a chimeric DNA fragment encoding a chimeric protein can also be generated using the gateway-system (Life technologies), a system that is based on DNA fusion by recombination. A prominent example of molecular modeling is the structure-based design of compounds binding to HIV reverse transcriptase that is reviewed in Mao, Sudbeck, Venkatachalam and Uckun (2000). Biochem. Pharmacol. 60: 1251-1265.

For example, identification of the binding site of said drug by site-directed mutagenesis and chimeric protein studies can be achieved by modifications in the (poly)peptide primary sequence that affect the drug affinity; this usually allows the binding pocket for the drug to be mapped precisely.

As regards step (2), the following protocols may be envisaged: Once the effector site for drugs has been mapped, the precise residues interacting with different parts of the drug can be identified by combination of the information obtained from mutagenesis studies (step (1)) and computer simulations of the structure of the binding site provided that the precise three-dimensional structure of the drug is known (if not, it can be predicted by computational simulation). If said drug is itself a peptide, it can be also mutated to determine which residues interact with other residues in the (poly)peptide of interest.

Finally, in step (3) the drug can be modified to improve its binding affinity or its potency and specificity. If, for instance, there are electrostatic interactions between a particular residue of the (poly)peptide of interest and some region of the drug molecule, the overall charge in that region can be modified to increase that particular interaction.

Identification of binding sites may be assisted by computer programs. Thus, appropriate computer programs can be used for the identification of interactive sites of a putative inhibitor and the (poly)peptide by computer assisted searches for complementary structural motifs (Fassina, Immunomethods 5 (1994), 114-120). Further appropriate computer systems for the computer aided design of protein and peptides are described in the prior art, for example, in Berry, Biochem. Soc. Trans. 22 (1994), 1033-1036; Wodak, Ann. N.Y. Acad. Sci. 501 (1987), 1-13; Pabo, Biochemistry 25 (1986), 5987-5991. Modifications of the drug can be produced, for example, by peptidomimetics and other inhibitors can also be identified by the synthesis of peptidomimetic combinatorial libraries through successive chemical modification and testing the resulting compounds. Methods for the generation and use of peptidomimetic combinatorial libraries are described in the prior art, for example in Ostresh, Methods in Enzymology 267 (1996), 220-234 and Dorner, Bioorg. Med. Chem. 4 (1996), 709-715. Furthermore, the three-dimensional and/or crystallographic structure of activators of the expression of the (poly)peptide of the invention can be used for the design of peptidomimetic activators, e.g., in combination with the (poly)peptide of the invention (Rose, Biochemistry 35 (1996), 12933-12944; Rutenber, Bioorg. Med. Chem. 4 (1996), 1545-1558).

In accordance with the above, in a preferred embodiment of the method of the invention said at least one compound is further refined by peptidomimetics.

The invention furthermore relates to a method of modifying a compound identified or refined by the method as described herein above as a lead compound to achieve (i) modified site of action, spectrum of activity, organ specificity, and/or (ii) improved potency, and/or (iii) decreased toxicity (improved therapeutic index), and/or (iv) decreased side effects, and/or (v) modified onset of therapeutic action, duration of effect, and/or (vi) modified pharmacokinetic parameters (resorption, distribution, metabolism and excretion), and/or (vii) modified physico-chemical parameters (solubility, hygroscopicity, color, taste, odor, stability, state), and/or (viii) improved general specificity, organ/tissue specificity, and/or (ix) optimized application form and route by (i) esterification of carboxyl groups, or (ii) esterification of hydroxyl groups with carboxylic acids, or (iii) esterification of hydroxyl groups to, e.g., phosphates, pyrophosphates or sulfates or hemisuccinates, or (iv) formation of pharmaceutically acceptable salts, or (v) formation of pharmaceutically acceptable complexes, or (vi) synthesis of pharmacologically active polymers, or (vii) introduction of hydrophilic moieties, or (viii) introduction/exchange of substituents on aromatic rings or side chains, change of substituent pattern, or (ix) modification by introduction of isosteric or bioisosteric moieties, or (x) synthesis of homologous compounds, or (xi) introduction of branched side chains, or (xii) conversion of alkyl substituents to cyclic analogues, or (xiii) derivatisation of hydroxyl groups to ketals, acetals, or (xiv) N-acetylation to amides, phenylcarbamates, or (xv) synthesis of Mannich bases, imines, or (xvi) transformation of ketones or aldehydes to Schiff's bases, oximes, acetals, ketals, enol esters, oxazolidines, thiazolidines or combinations thereof; said method optionally further comprising the steps of the above described methods.

The various steps recited above are generally known in the art. They include or rely on quantitative structure-activity relationship (QSAR) analyses (Kubinyi, “Hansch Analysis and Related Approaches”, VCH Verlag, Weinheim, 1992), combinatorial biochemistry, classical chemistry and others (see, for example, Holzgrabe and Bechtold, Deutsche Apotheker Zeitung 140(8), 813-823, 2000).

The invention moreover relates to a method of producing a pharmaceutical composition comprising optionally the steps of the aforementioned methods and further the step of formulating the at least one compound identified, refined or modified by the method of any of the preceding embodiments with a pharmaceutically acceptable carrier or diluent.

The pharmaceutical composition produced in accordance with the present invention may further comprise a pharmaceutically acceptable carrier and/or diluent and/or excipient. Examples of suitable pharmaceutical carriers are well known in the art and include phosphate buffered saline solutions, water, emulsions, such as oil/water emulsions, various types of wetting agents, sterile solutions etc. Compositions comprising such carriers can be formulated by well known conventional methods. These pharmaceutical compositions can be administered to the subject at a suitable dose. Administration of the suitable compositions may be effected in different ways, e.g., by intravenous, intraperitoneal, subcutaneous, intramuscular, topical, intradermal, intranasal or intrabronchial administration. The dosage regimen will be determined by the attending physician and clinical factors. As is well known in the medical arts, dosages for any one patient depend upon many factors, including the patient's size, body surface area, age, the particular compound to be administered, sex, time and route of administration, general health, and other drugs being administered concurrently. A typical dose can be, for example, in the range of 0.001 to 1000 μg (or of nucleic acid for expression or for inhibition of expression in this range); however, doses below or above this exemplary range are envisioned, especially considering the aforementioned factors. Generally, the regimen as a regular administration of the pharmaceutical composition should be in the range of 1 μg to 10 mg units per day. If the regimen is a continuous infusion, it should also be in the range of 1 μg to 10 mg units per kilogram of body weight per minute, respectively. Progress can be monitored by periodic assessment. Dosages will vary but a preferred dosage for intravenous administration of DNA is from approximately 10⁶ to 10¹² copies of the DNA molecule. The compositions of the invention may be administered locally or systemically.
Administration will generally be parenteral, e.g., intravenous; DNA may also be administered directly to the target site, e.g., by biolistic delivery to an internal or external target site or by catheter to a site in an artery. Preparations for parenteral administration include sterile aqueous or non-aqueous solutions, suspensions, and emulsions. Examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate. Aqueous carriers include water, alcoholic/aqueous solutions, emulsions or suspensions, including saline and buffered media. Parenteral vehicles include sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride, lactated Ringer's, or fixed oils. Intravenous vehicles include fluid and nutrient replenishers, electrolyte replenishers (such as those based on Ringer's dextrose), and the like. Preservatives and other additives may also be present such as, for example, antimicrobials, anti-oxidants, chelating agents, inert gases and the like. Furthermore, the pharmaceutical composition of the invention may comprise further agents such as interleukins or interferons depending on the exact intended use of the pharmaceutical composition.

The above methods referring to downstream developments also apply to therapeutically effective compounds referred to in additional embodiments herein below.

In another embodiment of the invention a method is disclosed of selecting a composition for inhibiting leukemia in a patient, the method comprising separately maintaining aliquots of cells of a patient sample in the presence of a plurality of test compositions, comparing the expression profile, preferably the level of expression of a group of markers in each of the aliquots, and selecting one of the test compositions which induces an altered expression profile of the group of markers in the aliquot containing that test composition, relative to other test compositions characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two such as 3, 4, 5, 6, 7, 8, 9 or 10 and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42.

Again, as with the previously recited embodiments, the alteration determined in accordance with the method of the invention in the expression profile or expression level must be in the direction of the expression profile of normal cells or at least diseased but non-leukemic cells. Accordingly, it is also preferred in accordance with this embodiment that the comparison includes an internal standard of expression levels of analysed markers wherein the internal standard represents the expression profile of non-leukemic and preferably normal cells. The comparison may—again—be effected by relying on actual experimental data or on in silico obtained reference data.

The expression “in the direction of the expression profile of normal cells” as used herein preferably relates to cells that comprise blood cells, more preferably a single type of blood cells. Most preferably, the single type of cells corresponds to the type of the leukemic cell. For example, an AML type of leukemic cell would preferably be compared to a healthy myeloid blast cell whereas an ALL type of leukemic cell would preferably be compared to a healthy lymphatic blast cell. Myeloid blast cells and lymphatic blast cells may be isolated from healthy bone marrow using well known methods, such as cell sorting based on flow cytometry using established cell surface markers.

In this method of the invention, it is preferred that the test composition comprises only one putatively active test compound. In that case, correlating the activity of the test compound with the readout is particularly convenient. If the test composition comprises more than one putatively pharmaceutically active compound, it may be considered to separately test each compound in a composition that has tested positive in a first round of the assay. Consequently, in a second round, i.e. in a repetition of steps (a) and (b), the various compositions that tested positive, if any, in the first round may be subdivided into single compounds and these single compounds tested again for their efficacy. The goal of such an approach, of course, is to obtain a composition comprising a single active compound only.
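The two-round procedure described above may be sketched as follows. This is an illustrative outline only: the function `two_round_screen`, the callable `tests_positive` standing in for the assay readout, and all composition and compound names are hypothetical and do not appear in the specification.

```python
def two_round_screen(compositions, tests_positive):
    """Run a two-round screen.

    compositions: dict mapping a composition name to its list of compounds.
    tests_positive: callable taking a list of compounds and returning True
        if the assay readout is positive (stand-in for the real assay).
    Returns a dict mapping each composition that tested positive in round 1
    to the single compounds that also tested positive on their own in round 2.
    """
    # Round 1: test every composition as a whole.
    positives = {
        name: compounds
        for name, compounds in compositions.items()
        if tests_positive(compounds)
    }
    # Round 2: subdivide each positive composition and retest single compounds.
    return {
        name: [c for c in compounds if tests_positive([c])]
        for name, compounds in positives.items()
    }

# Hypothetical example: the stand-in assay is positive whenever "compound_B" is present.
compositions = {
    "mix_1": ["compound_A", "compound_B"],
    "mix_2": ["compound_C"],
}
result = two_round_screen(compositions, lambda cs: "compound_B" in cs)
print(result)
```

The sketch does not model compounds that are only active in combination; such a composition would pass round 1 and return an empty list in round 2.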

In another embodiment a method of determining new subtypes of leukemia cells is disclosed, the method comprising determining the expression profile, preferably the level of expression, of a group of markers of leukemia cells of unknown subtype, comparing the expression profile to the level of expression, i.e. the expression profile, of a group of markers of leukemia cells of known subtype, thereby concluding that a new subtype is determined when the expression profile, preferably the level of expression, is different from all known subtypes, characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42.

The term “subtype of leukemia cells” in accordance with the present invention may be better understood in accordance with the following: Leukemias are subdivided according to their natural clinical course into acute and chronic leukemias. Based on the cell lineage they are derived from, they are further subdivided into myeloid and lymphatic leukemias. This results in four leukemia types, i.e. acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), chronic myeloid leukemia (CML), and chronic lymphatic leukemia (CLL). Based on genetic, phenotypic, and biological characteristics, which are assessed by cytomorphology, cytochemistry, cytogenetics, immunophenotyping, and molecular genetics, AML, ALL, and CLL are further subdivided into subtypes. These subtypes are associated with highly differing prognoses. Treatment approaches specific for these subtypes are applied and are being further optimized. Thus, an exact diagnosis based on a reliable and reproducible method is essential for the selection of the appropriate subtype-specific treatment.

The new subtypes identified in accordance with the invention may then be subjected in the same or in further patients to the other methods/embodiments of the invention.
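The comparison of an unknown profile against the profiles of known subtypes can be sketched as a nearest-profile search with a cut-off. The sketch assumes Euclidean distance over shared markers and an arbitrary threshold; both are illustrative assumptions rather than part of the disclosed method, the expression values are invented, and the function names are hypothetical. Only the marker names and the subtype labels are taken from the specification.

```python
import math

def distance(a, b):
    """Euclidean distance over the markers present in both profiles."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("profiles share no markers")
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))

def classify_or_flag(sample, known_profiles, threshold):
    """Return (label, distance) for the nearest known subtype.

    If even the nearest known subtype is farther away than `threshold`,
    the label is None, marking the sample as a candidate new subtype.
    """
    label, best = min(
        ((name, distance(sample, profile)) for name, profile in known_profiles.items()),
        key=lambda pair: pair[1],
    )
    return (label if best <= threshold else None), best

# Hypothetical reference profiles (arbitrary units).
known_profiles = {
    "AML t(8; 21)": {"CBFA2T1": 8.0, "MYH11": 1.0},
    "AML inv(16)": {"CBFA2T1": 1.0, "MYH11": 8.0},
}

print(classify_or_flag({"CBFA2T1": 7.5, "MYH11": 1.2}, known_profiles, threshold=2.0))
print(classify_or_flag({"CBFA2T1": 4.5, "MYH11": 4.5}, known_profiles, threshold=2.0))
```

The first sample falls close to a known profile and receives its label; the second is distant from every known profile and is returned with the label `None`. In practice the threshold would have to be derived from the variability observed within known subtypes.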

In another embodiment a method is disclosed for guiding the therapy of leukemia in a patient depending on the leukemia subtype and/or the risk of relapse of disease, the method comprising determining the expression profile, preferably the level of expression, of a group of markers in the patient sample, and deciding about the therapy strategy depending on the leukemia subtype or the risk of relapse of disease, characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two such as 3, 4, 5, 6, 7, 8, 9 or 10 and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42.

This embodiment is particularly important for the quick and reliable recovery of the patient from the leukemia that affects him or her. As has been stated above, the early and reliable diagnosis of the leukemia type or subtype is particularly important for the instigation of a useful and straightforward treatment regimen. An incorrect diagnosis may result in the application of a wrong treatment regimen which, in turn, may lead to significant health risks including premature death of the patient. In accordance with the present invention, a reliable means has been provided that, based on the inventive selection of markers provided, will overcome the prior art problems of an unreliable or unduly time-consuming diagnosis. In particular, the present method of the invention provides in step (a) an unambiguous and safe basis for the decision step (b). Again, the patient may safely rely on the conclusion drawn in step (b) due to the strong inherent correlation that has been achieved between the selection of markers and the leukemia subtype. The relation of tables to leukemia subtypes has also been demonstrated elsewhere in this specification.

In another embodiment of the invention, a method for monitoring the progression of leukemia in a patient is disclosed, the method comprising (a) determining the expression profile, preferably the level of expression, of a group of markers in a patient sample at a first point in time, (b) repeating this step at a subsequent point in time, and (c) comparing the expression profile, preferably the level of expression, detected in the previous steps and therefrom monitoring the progression of leukemia in the patient, characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two such as 3, 4, 5, 6, 7, 8, 9 or 10 and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. In a preferred embodiment, the patient has undergone chemotherapy between the first point in time and the subsequent point in time (including repetitions of step (b)).

In this embodiment of the present invention, the skilled artisan may repeat step (b) one or more times in order to collect additional data from different (more) time points. The additional data obtained by such further measurements may provide an overall better overview on the progress of the disease.

In accordance with this embodiment of the invention, the term “progression of leukemia” includes the interpretation of “regression of leukemia”, i.e. includes the interpretation of a negative progression. This is of course in line with the aim of the therapy and the desire of the patient.

In the methods according to the invention it is preferred that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two and the total number of markers listed in at least one of tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. In a preferred embodiment, the number of markers in the group is between five, more preferably 7, 10 or 15, and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. It is feasible that the group of markers does not only consist of those markers but comprises them, as the data will then still be statistically significant, i.e. the preferred groups may additionally contain 10, 50 or 100 other markers and comprise the other markers according to the invention and mentioned above. It is, however, also feasible for the expert skilled in the art that only a single suitable marker is determined with the methods according to the invention.

Particularly preferred markers for use in a method where only one or a few markers, e.g. one or preferably two, are used are described in Table 22 and Example 3, FIG. 12, or are the markers marked with an asterisk in table 20 and shown in tables 16 to 19 as the preferred set of markers. In detail, Example 3 mentions the following markers including their expression level (see Example 3 for more details):

- ADCY3
- adenosine deaminase (ADA)
- ARHGAP4
- B-cell specific coactivator of octamer binding transcription factors
- CAPN3, a member of the papain superfamily, which was more highly expressed in CML
- CBFβ-MYH11
- CD24
- CD27, which was identified as assigning samples to either ALL or CLL
- CD74 plays a critical role in MHC class II antigen processing
- connective tissue growth factor (CTGF)
- CTGF
- CTSW
- MYH11
- glucocorticoid receptor beta
- higher expression of CBFA2T1 (formerly ETO)
- HLA-DMB
- HOXA9
- HOXB5
- IRF4, an immune system-restricted interferon regulatory factor
- KIAA1013
- LCN2, which has been shown to be a modulator of inflammation
- LEF-1 was absent in myeloid leukemias but highly expressed in lymphoid leukemias
- MBNL
- MSF translocation partner of the mixed-lineage leukemia gene (MLL) in AML
- NCOA1, expressed at higher levels in CLL as compared to ALL
- OS-9, differentially expressed between AML and ALL (14)
- Phospholipid scramblase 1 (PLSCR1), expressed at lower levels in AML and ALL as compared to normal BM
- POU2AF1
- POU2F2
- POU4F1
- SCYA3
- SGP28
- SOCS-2
- TRB and CD3D

Particularly preferred markers for use in a method where only one or a few markers, e.g. one or preferably two, are used are described in tables 30, 33, 36 and 42 and Example 7, FIGS. 189 to 234, 254 to 272, 338 to 371, 433 to 465, respectively, or are the markers marked with an asterisk in tables 29, 32, 35, 38, and 41 and FIGS. 24 to 188, 235 to 253, 273 to 337, 372 to 405, 406 to 432, respectively, as the preferred set of markers. In detail, Example 7 mentions the following markers including their expression level (see Example 7 for more details):

geneID | gene symbol | feature
201162_at | IGFBP7 | CLL low
201163_s_at | IGFBP7 | CLL low
201362_at | NS1-BP | CML high
201496_x_at | MYH11 | AML inv(16) high
201497_x_at | MYH11 | AML inv(16) high
201998_at | SIAT1 | CLL high
202095_s_at | BIRC5 | CLL low
203074_at | ANXA8 | AML t(15; 17) high
204150_at | STAB1 | AML t(15; 17) high
204511_at | KIAA0793 | CLL high
205528_s_at | CBFA2T1 | AML t(8; 21) high
205529_s_at | CBFA2T1 | AML t(8; 21) high
205805_s_at | ROR1 | CLL high
206940_s_at | POU4F1 | AML t(8; 21) high
207819_s_at | ABCB4 | CLL high
208091_s_at | DKFZP564K0822 | CLL high
208456_s_at | RRAS2 | CLL high
209061_at | NCOA3 | CLL high
209101_at | CTGF | ALL t(4; 11) high, ALL Ph high, T-ALL high
209374_s_at | IGHM | CLL high
209616_s_at | CES1 | AML MLL high
210997_at | HGF | AML t(15; 17) high
212285_s_at | AGRN | AML t(15; 17) high
213539_at | CD3D | T-ALL high
214450_at | CTSW | AML t(15; 17) high
215925_s_at |  | ALL t(4; 11) high
218223_s_at | LOC51177 | CML low
222166_at |  | AML +8 high
224520_s_at | MGC13168 | ALL t(8; 14) high
224794_s_at | LOC51148 | AML t(15; 17) high
225660_at | SEMA6A | ALL B not Ph high, ALL Ph high
226496_at | Homo sapiens, Similar to hypothetical protein FLJ22611, clone MGC: 24716 IMAGE: 4277726, mRNA, complete cds | ALL high, CLL high
228827_at | Homo sapiens clone 25023 mRNA sequence | AML t(8; 21) high
228904_at | ESTs | AML normal high, AML +8 high, AML complex high
236301_at | Homo sapiens, clone IMAGE: 3866403, mRNA | CLL high
236892_s_at | HOXB6 | AML normal high, AML +8 high, AML complex high
239214_at | ESTs | ALL t(4; 11) high
239393_at | ESTs | ALL t(4; 11) high
239791_at | HOXB6 | AML normal high, AML +8 high
240581_at | ESTs | ALL t(4; 11) high
241464_s_at | ESTs | AML MLL high, AML normal high, AML +8 high, AML complex high
241525_at | ESTs | AML inv(16) high
$$2_s_at | LEF1 | ALL high, CLL high
$$_at | CTNS | T-ALL low
$$_at | FLJ12442 | AML t(15; 17) high
$$05_at | LGALS1 | ALL t(4; 11) high
204044_at | QPRT | ALL t(4; 11) high
205899_at | CCNA1 | ALL t(4; 11) high
209168_at | GPM6B | ALL t(4; 11) high
213539_at | CD3D | T-ALL high
213894_at | KIAA0960 | ALL t(4; 11) high
215925_s_at |  | ALL t(4; 11) high
218224_at | PNMA1 | T-ALL high
219463_at | C20orf103 | ALL t(4; 11) high
219631_at | FLJ12929 | T-ALL high
225563_at | ESTs | ALL t(4; 11) high
225592_at | NRM | ALL t(4; 11) high
228083_at | Homo sapiens mRNA; cDNA DKFZp434I1216 (from clone DKFZp434I1216) | ALL t(4; 11) high
228988_at | ZNF6 | T-ALL high
235749_at |  | ALL t(8; 14) high
242414_at | ESTs | ALL t(4; 11) high
243756_at | ESTs | ALL t(4; 11) high
201497_x_at | MYH11 | AML inv(16) high
228827_at | Homo sapiens clone 25023 mRNA sequence | AML t(8; 21) high
38487_at | FLJ12442 | AML t(15; 17) high
203074_at | ANXA8 | AML t(15; 17) high
205528_s_at | CBFA2T1 | AML t(8; 21) high
205529_s_at | CBFA2T1 | AML t(8; 21) high
206940_s_at | POU4F1 | AML t(8; 21) high
211341_at | POU4F1 | AML t(8; 21) high
201496_x_at | MYH11 | AML inv(16) high
228660_x_at | SEMA4F | other high
202718_at | IGFBP2 | AML t(15; 17) high
205380_at | PDZK1 | other high
202746_at |  | AML MLL low
201596_x_at | KRT18 | AML t(8; 21) low
34210_at | CDW52 | AML t(15; 17) low
212850_s_at | LRP4 | AML inv(16) high
228904_at | ESTs | AML t(8; 21) low, AML t(15; 17) low, AML inv(16) low, AML MLL low
203151_at | MAP1A | AML t(8; 21) low
201137_s_at | HLA-DPB1 | AML t(15; 17) low
200675_at | CD81 | AML inv(16) low
201425_at | ALDH2 | AML t(8; 21) low
202085_at | TJP2 | AML inv(16) low
202619_s_at | PLOD2 | AML MLL low
203092_at | TIMM44 | AML inv(16) low
204425_at | ARHGAP4 | AML t(15; 17) low
205366_s_at | HOXB6 | AML t(8; 21) low, AML t(15; 17) low, AML inv(16) low, AML MLL low
205472_s_at | DACH | AML MLL high
206761_at | TACTILE | AML MLL low
222166_at |  | AML +8 low
222335_at | ESTs | AML MLL low
223318_s_at | MGC10974 | AML complex low
225330_at | Homo sapiens, clone MGC: 18216 IMAGE: 4156235, mRNA, complete cds | AML inv(16) low
231277_x_at | ESTs | AML complex low
635_s_at | PPP2R5B | other low
202503_s_at | KIAA0101 | CLL low
202580_x_at | FOXM1 | CLL low
202709_at | FMOD | CLL high
204882_at | KIAA0053 | CLL high
205049_s_at | CD79A | ALL high, CLL high
205051_s_at | KIT | AML high
205382_s_at | DF | AML high
205599_at | TRAF1 | CML low, CLL high
206255_at | BLK | ALL high, CLL high
206398_s_at | CD19 | ALL high, CLL high
210487_at | DNTT | ALL high
210948_s_at | LEF1 | ALL high, CLL high
211352_s_at | NCOA3 | CLL high
211404_s_at | APLP2 | AML high
214761_at | OAZ | ALL high
217950_at | NOSIP | CLL high
218090_s_at |  | CLL high
218516_s_at | FLJ20421 | normal BM low
218916_at | FLJ23436 | normal BM low
219753_at | STAG3 | ALL high
221969_at | PAX5 | ALL high, CLL high
223703_at | CDA017 | AML high, CML high, normal BM high
226147_s_at | Homo sapiens cDNA: FLJ22667 fis, clone HSI08385 | CLL high
228471_at | ESTs | CLL high
229487_at | ESTs | ALL high
229790_at | TERF2 | CML low, BM low
231736_x_at | MGST1 | AML high, CML high, normal BM high
231854_at | Homo sapiens cDNA FLJ11448 fis, clone HEMBA1001391 | CML low
239287_at | ESTs | CLL high
243362_s_at | LEF1 | ALL high
243363_at | LEF1 | ALL high, CLL high
41577_at | PPP1R16B | CML low

Preferred methods for detection and quantification of the amount of nucleic acids, i.e. for the methods according to the invention allowing the determination of the level of expression of a marker or a group of markers, are those described by Sambrook et al. (1989) or real time methods known in the art, such as the TaqMan® method disclosed in WO92/02638 and the corresponding U.S. Pat. No. 5,210,015, U.S. Pat. No. 5,804,375, U.S. Pat. No. 5,487,972. This method exploits the exonuclease activity of a polymerase to generate a signal. In detail, the (at least one) target nucleic acid component is detected by a process comprising contacting the sample with an oligonucleotide containing a sequence complementary to a region of the target nucleic acid component and a labeled oligonucleotide containing a sequence complementary to a second region of the same target nucleic acid component sequence strand, but not including the nucleic acid sequence defined by the first oligonucleotide, to create a mixture of duplexes under hybridization conditions, wherein the duplexes comprise the target nucleic acid annealed to the first oligonucleotide and to the labeled oligonucleotide such that the 3′-end of the first oligonucleotide is adjacent to the 5′-end of the labeled oligonucleotide. This mixture is then treated with a template-dependent nucleic acid polymerase having a 5′ to 3′ nuclease activity under conditions sufficient to permit the 5′ to 3′ nuclease activity of the polymerase to cleave the annealed, labeled oligonucleotide and release labeled fragments. The signal generated by the hydrolysis of the labeled oligonucleotide is detected and/or measured. TaqMan® technology eliminates the need for a solid phase bound reaction complex to be formed and made detectable. Other methods include e.g. fluorescence resonance energy transfer between two adjacently hybridized probes as used in the LightCycler® format described in U.S. Pat. No. 6,174,670.

Protocols for carrying out the methods according to the invention are known to the expert in the field and are described in the examples, preferably in Examples 1 and 4. A preferred protocol is described in Example 1 (A), where total RNA is isolated, cDNA synthesized and biotin incorporated during the transcription reaction. The purified cDNA is applied to commercially available arrays which can be obtained e.g. from Affymetrix. The hybridized cDNA is detected according to the methods described in Example 1 (A). The arrays are produced by photolithography or other methods known to those skilled in the art, e.g. from U.S. Pat. No. 5,445,934, U.S. Pat. No. 5,744,305, U.S. Pat. No. 5,700,637, U.S. Pat. No. 5,945,334 and EP 619 321 or EP 373 203. The latter methods are also suitable for producing the compositions according to the invention, in particular the composition wherein polynucleotides or oligonucleotides are bound to a solid phase, in particular in the form of arrays. In a further preferred embodiment of the methods according to the invention, a transcribed polynucleotide or portion thereof is the marker or at least one of the markers. A particularly preferred transcribed polynucleotide is an mRNA or a cDNA. In a preferred embodiment of the methods according to the invention, the step of determining the expression profile further comprises amplifying the transcribed polynucleotide. In another preferred embodiment, the level of expression, i.e. the expression profile, of the group of transcribed polynucleotides is determined by annealing the transcribed polynucleotides with a complementary polynucleotide or a portion thereof under stringent hybridization conditions. The term “stringent hybridization conditions” is equivalent to the term “highly stringent hybridization conditions”.
Such highly stringent hybridization conditions may be determined in accordance with the teachings provided in Hames and Higgins (eds) “Nucleic acid hybridization, a practical approach”, IRL Press 1985, Oxford, and include hybridization at 55-65° C. in 0.2-0.5×SSC, 0.1% SDS followed by appropriate washing conditions such as 0.5-1×SSC at 55° C. and 0.1% SDS.

In a most preferred embodiment, the patient sample is blood, i.e. blood mononuclear cells, or bone marrow, i.e. mononuclear cells. The methods according to the invention may be performed on fresh or frozen blood, i.e. blood mononuclear cells or bone marrow, i.e. mononuclear cells.

In a preferred embodiment the marker or at least one of the markers is a protein. In another preferred embodiment the expression profile of the proteins is detected using a reagent which specifically binds to one of the proteins whereby preferably the reagent is selected from the group consisting of an antibody, an antibody derivative, and an antibody fragment.

The term “antibody” comprises monoclonal antibodies as first described by Köhler and Milstein in Nature 256 (1975), 495-497 as well as polyclonal antibodies, i.e. antibodies contained in a polyclonal antiserum. Monoclonal antibodies include those produced by transgenic mice. Fragments of antibodies include F(ab′)₂, Fab and Fv fragments. Derivatives of antibodies include scFvs, chimeric and humanized antibodies. See, for example, Harlow and Lane, loc. cit.

Another embodiment of the invention is a kit preferably for assessing the suitability of each of a plurality of compounds for inhibiting leukemia in a patient, the kit optionally comprising the plurality of compounds; and a reagent for assessing the expression profile of a group of markers characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between two and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. Another embodiment is a kit preferably for assessing whether a patient is afflicted with leukemia, the kit comprising reagents for assessing the expression profile of a group of markers characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between two and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. Another embodiment is a kit preferably for assessing the presence of human leukemia cells, the kit comprising an antibody, wherein the antibody specifically binds with a protein corresponding to a marker characterized in that the marker is selected from the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. Another embodiment is a kit preferably for assessing the leukemia cell carcinogenic potential of a test compound, the kit comprising leukemia cells and a reagent for assessing expression of a marker, wherein the marker is selected from the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42.

Advantageously, the kit of the present invention further comprises, optionally (a) storage solution(s) and/or remaining reagents or materials required for the conduct of scientific and/or diagnostic and/or therapeutic methods. Furthermore, parts of the kit of the invention can be packaged individually in vials or bottles or in combination in containers or multicontainer units.

Another embodiment of the invention is related to a protein or the RNA, cDNA or cRNA corresponding to a marker selected from the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 or the use thereof for the treatment of or vaccination against leukemia. Alternatively and depending on the exact purpose, inhibitors of these compounds such as antibodies, fragments or derivatives thereof may be employed for said purpose.

The invention also contemplates a method for the development or preparation of a pharmaceutical composition for the treatment of leukemia characterized in that a protein corresponding to a marker selected from the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 is admixed with pharmaceutical compounds. Another embodiment of the invention is related to a method for the development or preparation of a pharmaceutical composition for the treatment of leukemia characterized in that a vector comprising a polynucleotide encoding a protein corresponding to a marker selected from the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 is admixed with pharmaceutical compounds. Another embodiment of the invention is a method for the development or preparation of a pharmaceutical composition for the treatment of leukemia characterized in that an antisense oligonucleotide complementary to a polynucleotide encoding a protein corresponding to a marker selected from the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 is admixed with pharmaceutical compounds. Alternatively, inhibitors such as antibodies specific for the markers may be used for the preparation or development of a pharmaceutical composition.

The term “pharmaceutical compounds” is preferably to be understood to mean pharmaceutically acceptable carriers, diluents or excipients, only in connection with the embodiments recited in this paragraph. In another embodiment of the invention a marker or a group of markers selected individually from one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 is used for the determination of leukemia cells, the type or subtype of leukemia cells.

In another embodiment of the invention a marker or a group of markers selected individually from one or more of the tables 1, 2, 13, 14, 17, 25, 27, 35 or 36 is used for the determination of the subtype of AML cells.

In a preferred embodiment, the invention is related to a composition comprising a group of markers and substances chemically different to the markers characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42 and whereby the number of markers in the group is between one, preferably two, and the total number of markers listed in the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. It is preferred that the composition according to the invention is characterized in that the group of markers consists of all markers listed in one or more of the tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42. More preferably, the composition according to the invention is characterized in that the group of markers consists of all markers listed in one or more of the tables 14, tables 16 to 20, or table 29 or 30; most preferably, the group of markers consists of all markers listed in the tables 16 to 20 or tables 29 or 30. Preferably the markers are polynucleotides or oligonucleotides, whereby the polynucleotides are bound to a solid phase in the form of an array.

The present invention also relates to a method of determining the subtypes of ALL cells in a patient sample comprising the steps of a) determining the level of expression of a group of markers in the patient sample and b) concluding from the differences in the level of expression which subtypes of ALL cells the patient sample contains characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 18, 32 or 33 and whereby the number of markers in the group is between two and the total number of markers listed in the tables 18, 32 or 33.

Preferably the group of markers consists of all markers listed in one or more of the tables 18, 32 or 33.

The present invention further relates to a method of determining the subtypes of CLL cells in a patient sample comprising the steps of a) determining the level of expression of a group of markers in the patient sample and b) concluding from the differences in the level of expression which subtypes of CLL cells the patient sample contains characterized in that the group of markers consists of markers selected independently from the markers listed in one or more of the tables 38 or 39 and whereby the number of markers in the group is between two and the total number of markers listed in the tables 38 or 39.

It is preferred that the group of markers consists of all markers listed in one or more of the tables 38 or 39.

The present invention is also related to a diagnostic composition comprising at least one nucleic acid molecule, preferably (a) single-stranded nucleic acid molecule(s), which is capable of specifically hybridizing to the mRNA of at least one gene listed in Table 1. The use of said nucleic acid molecules for diagnosis of leukemia subtypes, preferably based on microarray technology, offers the following advantages: (1) it permits more rapid and more precise diagnosis, (2) it is easy to use in laboratories without specialized experience, (3) it abolishes the requirement for analyzing viable cells for chromosome analysis (transport problem), (4) very experienced hematologists for cytomorphology and cytochemistry, immunophenotyping as well as cytogeneticists and molecular biologists are no longer required, and (5) it improves the subclassification of leukemia due to the definition of new entities based on gene expression profiles in those subtypes that are not clearly defined with the methods of the prior art (class discovery).

As used herein, the term “capable of specifically hybridizing” has the meaning of hybridization under conventional hybridization conditions, preferably under stringent conditions as described, for example, in Sambrook, J., et al., in “Molecular Cloning: A Laboratory Manual” (1989), Eds. J. Sambrook, E. F. Fritsch and T. Maniatis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY and the further definitions provided above. Also contemplated are nucleic acid molecules that hybridize at lower stringency hybridization conditions. Changes in the stringency of hybridization and signal detection are primarily accomplished through the manipulation, preferably of formamide concentration (lower percentages of formamide result in lowered stringency), salt conditions, or temperature. For example, lower stringency conditions include an overnight incubation at 37° C. in a solution comprising 6×SSPE (20×SSPE=3M NaCl; 0.2M NaH2PO4; 0.02M EDTA, pH 7.4), 0.5% SDS, 30% formamide, 100 µg/ml salmon sperm blocking DNA, followed by washes at 50° C. with 1×SSPE, 0.1% SDS. In addition, to achieve even lower stringency, washes performed following stringent hybridization can be done at higher salt concentrations (e.g. 5×SSC). Variations in the above conditions may be accomplished through the inclusion and/or substitution of alternate blocking reagents used to suppress background in hybridization experiments. The inclusion of specific blocking reagents may require modification of the hybridization conditions described above, due to problems with compatibility.
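
By way of illustration only, the salt concentrations implied by the buffer designations above can be worked out from the 20×SSPE stock composition given in the text. The following minimal Python sketch does this arithmetic; the function and variable names are illustrative assumptions and form no part of the described methods.

```python
# Composition of the 20x SSPE stock as stated in the text (mol/L).
SSPE_20X = {"NaCl": 3.0, "NaH2PO4": 0.2, "EDTA": 0.02}


def sspe_molarity(fold):
    """Return the component molarities (mol/L) of a `fold`x SSPE solution."""
    return {name: conc * fold / 20.0 for name, conc in SSPE_20X.items()}


hybridization = sspe_molarity(6)  # 6x SSPE used for the overnight incubation
wash = sspe_molarity(1)           # 1x SSPE used for the washes

for name in SSPE_20X:
    print(f"{name}: {hybridization[name]:.3f} M (6x), {wash[name]:.4f} M (1x)")
```

Under these figures, 6×SSPE corresponds to about 0.9 M NaCl, 0.06 M NaH2PO4 and 0.006 M EDTA, and 1×SSPE to about 0.15 M NaCl, 0.01 M NaH2PO4 and 0.001 M EDTA.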

As a hybridization probe (or primer), nucleic acid molecules can be used, for example, that have exactly or basically the nucleotide sequence of at least one of the genes depicted in the appended tables or parts of these sequences. The term nucleic acid molecule as used herein also comprises fragments which are understood to be parts of the nucleic acid molecules that are long enough to specifically hybridize to transcripts of at least one of the genes of the appended tables. These nucleic acid molecules can be used, for example, as probes or primers in a diagnostic assay. Preferably, the nucleic acid molecules of the present invention have a length of at least 8, 10, 12, 13, 15, 18, in particular of at least 20 and particularly preferred of at least 25 nucleotides. The nucleic acid molecules of the invention or parts thereof can also be used, for example, as primers for a PCR reaction. The fragments used as hybridization probes can be synthetic fragments that were produced by means of conventional synthesis methods.
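
Purely as a sketch of how candidate probe fragments of a given length might be enumerated from a transcript sequence, the following Python example slides a window along the transcript and returns the reverse complement of each window, i.e. a single-stranded DNA sequence able to base-pair with that stretch of the transcript. The default length of 25 and the minimum of 8 nucleotides are taken from the lengths mentioned above; the function names and the toy sequence are assumptions made for illustration, and no specificity screening is attempted.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(transcript, length=25):
    """Yield (start, probe) pairs for each window of `length` nucleotides.

    Each probe is the reverse complement of the transcript window, so it can
    anneal to that window. RNA input is accepted by mapping U to T.
    """
    if length < 8:
        raise ValueError("probe length below the minimum of 8 nucleotides")
    dna = transcript.upper().replace("U", "T")
    for start in range(len(dna) - length + 1):
        yield start, reverse_complement(dna[start:start + length])


# Toy transcript fragment (made up for illustration, not a real marker sequence).
toy_transcript = "AUGGCCAAGUUCGACCUGAAGGCUUACGGAUCC"
probes = list(candidate_probes(toy_transcript, length=25))
print(len(probes), probes[0])
```

A real design would additionally screen each candidate for uniqueness against other transcripts and for suitable melting behaviour, which this sketch does not do.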

In a preferred embodiment, the diagnostic composition of the present invention comprises at least nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of at least one of the genes listed in the appended tables, preferably 2-5, more preferably 8-12 genes.

In a more preferred embodiment, the diagnostic composition of the present invention comprises at least nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of at least one of the genes listed in the appended tables. In a further preferred embodiment, the diagnostic composition of the present invention comprises at least nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of all genes listed in the appended tables.

In a further preferred embodiment, the nucleic acid molecules of the diagnostic composition of the present invention are bound to (a) a solid support, for example, a polystyrene microtiter dish or nitrocellulose membrane or glass surface or (b) to non-immobilized particles in solution.

In an even more preferred embodiment, the nucleic acid molecules of the diagnostic composition are present in a microarray format which can be established according to well known methods; for details see, e.g., www.affymetrix.com/technology/tech_spotted.html; www.affymetrix.com/technology/tech_probe.html.

The present invention also provides the use of (a) nucleic acid molecule(s) of the present invention for the preparation of a diagnostic composition for the diagnosis of a leukemia or for the diagnosis of several subtypes or a disposition to a leukemia. For the diagnosis of a particular leukemia subtype, preferably, at least 5 different nucleic acid molecules are used as probes. For diagnosis, preferably, bone marrow or peripheral blood can be used. For diagnosis, the target sample is contacted with (a) nucleic acid molecule(s) of the present invention and the concentration of individual mRNAs is compared with the mRNA expression levels of a reference sample obtained from healthy donors.
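
The comparison with a healthy-donor sample described above can be sketched, under stated assumptions, as a log2 ratio per probe set followed by a simple high/low call. In the Python example below, the probe set identifiers are taken from the table above, but the intensity values and the two-fold threshold are hypothetical; the specification does not prescribe a particular cutoff or statistic.

```python
import math


def log2_ratios(sample, reference):
    """Return log2(sample / reference) for probe sets present in both inputs."""
    return {
        probe: math.log2(sample[probe] / reference[probe])
        for probe in sample
        if probe in reference
    }


def call_features(ratios, threshold=1.0):
    """Label each probe set 'high', 'low' or 'unchanged'.

    `threshold` is in log2 units (1.0 = two-fold) and is an assumed cutoff.
    """
    calls = {}
    for probe, ratio in ratios.items():
        if ratio >= threshold:
            calls[probe] = "high"
        elif ratio <= -threshold:
            calls[probe] = "low"
        else:
            calls[probe] = "unchanged"
    return calls


# Hypothetical signal intensities (arbitrary units), for illustration only.
patient = {"205805_s_at": 800.0, "201162_at": 50.0, "209374_s_at": 210.0}
healthy = {"205805_s_at": 100.0, "201162_at": 400.0, "209374_s_at": 200.0}

calls = call_features(log2_ratios(patient, healthy))
print(calls)
```

With these made-up values, 205805_s_at (ROR1) is called high and 201162_at (IGFBP7) is called low, which would match the "CLL high" and "CLL low" features listed for these probe sets.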

It is a further embodiment of the invention to provide a method of determining whether a patient sample contains leukemia cells or other cells and at the same time determining the type and subtype of leukemia cells, comprising the steps of providing a patient sample, isolating RNA from the patient sample, transcribing the RNA into cDNA and transcribing the cDNA into cRNA while simultaneously labelling the cRNA, hybridising the cRNA to a microarray, and determining the level of expression of a marker or a group of markers.

Further, the invention contemplates the use of a marker or a group of markers for determining whether a patient sample contains leukemia cells or other cells, whereby preferably the type and subtype of leukemia cells is simultaneously or subsequently determined. The markers specified in the appended examples and tables may, in accordance with the invention, be used to differentiate, for example, between ALL, CLL, CML and AML.

The nucleic acid molecule is typically a nucleic acid probe for hybridization or a primer for PCR. The person skilled in the art is in a position to design suitable nucleic acid probes based on the information provided in the appended tables.

The target cellular component, i.e. mRNA, e.g. in bone marrow (BM) or blood, may be detected directly in situ, e.g. by in situ hybridization, or it may be isolated from other cell components by common methods known to those skilled in the art before contacting with a probe. Detection methods include Northern blot analysis, RNase protection, in situ methods, e.g. in situ hybridization, in vitro amplification methods (PCR, LCR, Qβ RNA replicase or RNA-transcription/amplification (TAS, 3SR), reverse dot blot disclosed in EP 0 237 362) and other detection assays that are known to those skilled in the art. Preferably, detection is based on a microarray.

Amplification methods include the polymerase chain reaction (PCR) which specifically amplifies target sequences to detectable amounts. Other possible amplification reactions are the Ligase Chain Reaction (LCR, Wu and Wallace, 1989, Genomics 4:560-569 and Barany, 1991, Proc. Natl. Acad. Sci. USA 88:189-193); Polymerase Ligase Chain Reaction (Barany, 1991, PCR Methods and Applic. 1:5-16); Gap-LCR (PCT Patent Publication No. WO 90/01069); Repair Chain Reaction (European Patent Publication No. 439,182 A2), 3SR (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177; Guatelli et al., 1990, Proc. Natl. Acad. Sci. USA 87:1874-1878; PCT Patent Publication No. WO 92/0880A), and NASBA (U.S. Pat. No. 5,130,238). Further, there are strand displacement amplification (SDA), transcription mediated amplification (TMA), and Qβ-amplification (for a review see e.g. Whelen and Persing (1996). Annu. Rev. Microbiol. 50, 349-373; Abramson and Myers, 1993, Current Opinion in Biotechnology 4:41-47).

Products obtained by in vitro amplification can be detected according to established methods, e.g. by separating the products on agarose gels and by subsequent staining with ethidium bromide. Alternatively, the amplified products can be detected by using labeled primers for amplification or labeled dNTPs.

The probes can be detectably labeled, for example, with a radioisotope, a bioluminescent compound, a chemiluminescent compound, a fluorescent compound, a metal chelate, biotin or an enzyme.

The invention further contemplates a method of making an isolated hybridoma which produces an antibody useful for assessing whether a patient is afflicted with leukemia, the method comprising isolating a protein corresponding to a marker selected from the group consisting of the markers listed in Tables 1 to 20, tables 25 or 27 or tables 29, 30, 32, 33, 35, 36, 38, 39, 41, 42; immunizing a mammal using the isolated protein, or a peptide corresponding to its sequence or a part thereof; isolating splenocytes from the immunized mammal; fusing the isolated splenocytes with an immortalized cell line to form hybridomas; and screening individual hybridomas for production of an antibody which specifically binds with the protein to isolate the hybridoma. Further, an antibody produced by this method is contemplated by the invention. The antibody may be fragmented or derivatized to obtain fragments or derivatives retaining the antibody specificity as has been described herein above.

The invention further contemplates a method of assessing the leukemia cell carcinogenic potential of a test compound, the method comprising maintaining separate aliquots of leukemia cells in the presence and absence of the test compound; and comparing expression of a marker in each of the aliquots, wherein a significantly altered level of expression of the marker in the aliquot maintained in the presence of the test compound, relative to the aliquot maintained in the absence of the test compound, is an indication that the test compound possesses leukemia cell carcinogenic potential, and wherein a marker according to the invention is used.

The invention further contemplates a system for identifying selected polynucleotide records that identify a leukemia cell, the system comprising: a digital computer; a database server coupled to the computer; a database coupled to the database server having data stored therein, the data comprising records of data comprising a polynucleotide corresponding to a marker according to the invention; and a code mechanism for applying queries based upon desired selection criteria to the data file in the database to produce reports of polynucleotide records which match the desired selection criteria.

The invention also relates to a method for detecting a leukemia cell, using a computer having a processor, memory, display, and input/output devices, the method comprising the steps of

a) providing a sequence of a polynucleotide isolated from a sample suspected of containing a leukemia cell,

b) providing a database comprising records of data comprising a polynucleotide corresponding to a group of markers according to the invention;

c) using a code mechanism for applying queries based upon desired selection criteria to the data file in the database to produce reports of polynucleotide records of step a) which provide a match of the desired selection criteria of the sequences in the database of step b), the presence of a match being a positive indication that the polynucleotide of step a) has been isolated from a cell that is a leukemia cell.
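
Steps a) to c) may be sketched, for illustration only, with an in-memory SQLite database in Python. The table layout, the function names and the rule that a "match" means the stored marker sequence occurs within the sample sequence are all assumptions made for this example; the stored sequences are made-up placeholders, not sequences of the named genes.

```python
import sqlite3


def build_database(records):
    """Step b): create a database of (probe_id, symbol, sequence) records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE markers (probe_id TEXT, symbol TEXT, sequence TEXT)"
    )
    conn.executemany("INSERT INTO markers VALUES (?, ?, ?)", records)
    return conn


def matching_records(conn, sample_sequence):
    """Step c): report marker records whose sequence occurs in the sample.

    The selection criterion (exact substring match) is an assumed,
    deliberately simple stand-in for a real sequence comparison.
    """
    query = (
        "SELECT probe_id, symbol FROM markers "
        "WHERE instr(?, sequence) > 0 ORDER BY probe_id"
    )
    return conn.execute(query, (sample_sequence.upper(),)).fetchall()


# Placeholder records; the sequences are invented for illustration.
conn = build_database([
    ("205805_s_at", "ROR1", "ACGTTGCAAGGCTA"),
    ("201362_at", "NS1-BP", "TTGACCGGTAACGT"),
])

# Step a): a sequence obtained from the sample under investigation.
sample = "gggACGTTGCAAGGCTAccc"
report = matching_records(conn, sample)
print(report)
```

In this sketch a non-empty report corresponds to the positive indication of step c); a practical system would use an alignment-based comparison rather than exact substring matching.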

Also, the present invention relates to a method for assessing the leukemia cell carcinogenic potential of a test compound, comprising (a) contacting a non-leukemia cell with a test compound, and (b) assessing an increase or decrease of marker expression in said non-leukemia cell wherein the marker is selected from the tables 1 to 20, 25 or 27, 29, 30, 32, 33, 35, 36, 38, 39, 41 or 42.

The assessment may be effected on the nucleic acid level, such as by hybridization techniques or PCR, or on the protein level, such as by using antibody- or aptamer-based technologies.

Finally, the invention relates to a diagnostic composition comprising at least one nucleic acid molecule which is capable of specifically hybridizing to the mRNA corresponding to the marker gene of any of the appended tables. The nucleic acid molecule may be an antisense DNA or RNA, an RNAi molecule, an siRNA molecule or a similar inhibitory molecule capable of specifically blocking transcription and/or translation and/or modification and/or localization of the RNA and/or protein corresponding to the marker gene.

The nucleic acid may also be a sense-strand nucleic acid e.g. RNA or preferably DNA which is capable of expressing the protein product of the marker gene, or a protein product of substantially similar activity, in a target cell into which it is introduced.

The invention further comprises pharmaceutical compositions comprising a compound capable of specifically binding to a protein or RNA corresponding to a marker of the invention as listed in any of the appended tables. The marker is preferably selected from the markers designated as particularly preferred markers as described herein above. The compound is preferably a compound capable of inhibiting or increasing the function of the protein or of enhancing or decreasing translation of the RNA. The compound is preferably selected from aptamers, aptazymes, RNAzymes, antibodies, affibodies, trinectins, anticalins, or similar compounds. The effect of the compounds on the RNA may be tested by assaying for increased/decreased synthesis of the corresponding protein. The effect of the compounds on the protein may be assayed by testing the effect of the compound in an assay of the protein's function, which e.g. may be an enzymatic function. Alternatively, the effect may be tested by contacting a leukemic cell that expresses large amounts of such protein with the compound and assaying cellular parameters associated with the leukemic state of the cell, such as cell growth, growth factor dependency and/or differentiation state of the cell.