This invention relates to methods, systems and equipment for diagnosing AML and MDS.
Myelodysplastic syndromes (MDS) are a heterogeneous group of clonal disorders of bone marrow cell precursors characterized by variable clinical courses and outcomes. Approximately 30 percent of patients with MDS eventually progress to acute myelogenous leukemia (AML) and a clinical diagnostic assay especially suited to early identification of this subset of patients would help focus therapeutic options in these individuals.
A number of indices have been identified as important prognostic factors in MDS, including cytogenetic assessment, quantitation of blast percentages, and morphologic assessment of cell lines. Different risk classification systems have been developed to predict the overall survival of MDS patients and the progression from MDS to AML. Examples of these classification systems include the French-American-British (FAB) classification, the International Prognostic Scoring System (IPSS), the Bournemouth score, the Sanz score, and the Lille score. The French-American-British (FAB) classification system categorizes patients into one of five categories on the basis of observed cell morphologies and percentage of myeloblasts in the bone marrow and associates a median expected survival time with each category. The International Prognostic Scoring System (IPSS) incorporates assessment of cytogenetics, the number of cell lines involved, and the percentage of blasts in the bone marrow in patients and assigns a risk and median survival time to an overall IPSS score.
Recent expression profiling studies have revealed differences in AC133 surface-marker positive hematopoietic stem cell fractions from patients with MDS versus AML (Miyazato et al., B
The present invention identifies numerous AML or MDS disease genes which are differentially expressed in bone marrow mononuclear cells (BMMCs) of AML or MDS patients as compared to BMMCs of disease-free humans. These disease genes can be used as molecular markers for diagnosing or monitoring the progression or treatment of AML or MDS. These genes can also be used for the early identification of MDS patients who eventually progress to AML.
In one aspect, the present invention provides methods useful for diagnosing or monitoring the progression or treatment of AML or MDS. The methods include comparing an expression profile of at least one gene in a bone marrow sample of a patient of interest to a reference expression profile, where the gene is differentially expressed in BMMCs of patients who have AML or MDS as compared to BMMCs of disease-free humans. In many embodiments, the gene is an AML or MDS disease gene selected from Tables 1 and 3.
Any number of AML or MDS disease genes can be employed. In one embodiment, the AML or MDS disease gene(s) is selected from those that have p values of no more than 0.005, 0.001, 0.0005, 0.0001, or less. In another embodiment, the AML or MDS disease gene(s) is selected from those that are significantly correlated with the class distinction between AML or MDS patients and disease-free humans. For instance, the AML or MDS disease gene(s) can be selected from those above the 1%, 5%, or 10% significance level in a permutation test.
In yet another embodiment, the AML or MDS disease genes are selected to include at least one gene upregulated in BMMCs of disease-free humans, at least one gene upregulated in BMMCs of AML patients, and at least one gene upregulated in BMMCs of MDS patients. In one example, the AML or MDS disease genes include the 91 genes depicted in Table 7a.
In many embodiments, the reference expression profile is an average expression profile of one or more AML or MDS genes in bone marrow samples of disease-free humans or patients of a known disease class. The reference expression profile and the expression profile of the patient of interest can be prepared using the same or comparable method. The expression profiles can also be prepared using different methods. Suitable methods for preparing a gene expression profile include, but are not limited to, quantitative RT-PCR, Northern Blot, in situ hybridization, slot-blotting, nuclease protection assay, nucleic acid arrays, immunoassays (such as ELISA, RIA, FACS, or Western Blot), two-dimensional gel electrophoresis, mass spectroscopy, and protein arrays.
In many embodiments, the bone marrow samples used in the present invention are whole bone marrow samples or samples containing enriched BMMCs or bone marrow leukocytes. The patient of interest may have AML, MDS which eventually progresses to AML, or MDS which does not progress to AML. The patient of interest may also be free from AML or MDS.
In one embodiment, the expression profile of the patient of interest is compared to at least two reference expression profiles. Each of the reference expression profiles is an average expression profile of one or more AML or MDS genes in bone marrow samples of disease-free humans or patients of a known disease class.
In another embodiment, the expression profile of the patient of interest is compared to at least three reference expression profiles. The first reference expression profile is an average expression profile of one or more AML or MDS genes in bone marrow samples of disease-free humans. The second reference expression profile is an average expression profile of the AML or MDS gene(s) in bone marrow samples of patients having AML. The third reference expression profile is an average expression profile of the AML or MDS gene(s) in bone marrow samples of patients having MDS.
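The comparison against several reference expression profiles can be sketched as follows. This is a minimal illustration only: it assumes that each reference profile is a per-gene arithmetic mean over samples of a known class and that similarity is measured by Pearson correlation, which is one possible choice among many. The gene names and expression levels are hypothetical placeholders.

```python
from math import sqrt
from statistics import mean

def average_profile(samples):
    """Average per-gene expression over samples (each a dict of gene -> level)."""
    genes = samples[0].keys()
    return {g: mean(s[g] for s in samples) for g in genes}

def pearson(a, b):
    """Pearson correlation between two profiles over their shared genes."""
    genes = sorted(set(a) & set(b))
    xs = [a[g] for g in genes]
    ys = [b[g] for g in genes]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def closest_reference(patient, references):
    """Label of the reference profile most correlated with the patient profile."""
    return max(references, key=lambda label: pearson(patient, references[label]))

# Hypothetical, made-up expression levels for three placeholder genes.
references = {
    "disease-free": average_profile([{"geneA": 10, "geneB": 50, "geneC": 30},
                                     {"geneA": 12, "geneB": 54, "geneC": 28}]),
    "AML": average_profile([{"geneA": 80, "geneB": 20, "geneC": 35},
                            {"geneA": 90, "geneB": 16, "geneC": 33}]),
    "MDS": average_profile([{"geneA": 40, "geneB": 45, "geneC": 90},
                            {"geneA": 44, "geneB": 41, "geneC": 96}]),
}
patient = {"geneA": 85, "geneB": 22, "geneC": 30}
best_match = closest_reference(patient, references)
```

In practice the similarity measure and the number of genes in each profile would be chosen to suit the assay used.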
Comparison of expression profiles can be performed manually or electronically. In one embodiment, the expression profile of the patient of interest is compared to two or more reference expression profiles by using a weighted voting algorithm.
The present invention also features methods for detecting early progression from MDS to AML. In one embodiment, the methods include assigning a class membership to an MDS patient. Where the bone marrow expression profile of the MDS patient is substantially similar to that of AML patients (e.g., resulting in an AML class membership), or the prediction confidence score is relatively low (e.g., below 0.1, 0.05, 0.01, or less), a positive prediction can be made that the MDS patient is likely to develop AML.
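One way a class membership and a prediction confidence score might be computed is sketched below. The sketch assumes a common weighted-voting formulation in which each gene votes with a weight equal to its signal-to-noise score, multiplied by the distance of the patient's expression level from the midpoint of the two class means, and in which the confidence is the winning margin divided by the total votes. That formulation, the gene names, and the log-scale values are assumptions for illustration.

```python
from statistics import mean, stdev

def gene_parameters(class0, class1):
    """Per-gene weight (signal-to-noise) and decision boundary (midpoint of class means)."""
    m0, m1 = mean(class0), mean(class1)
    weight = (m0 - m1) / (stdev(class0) + stdev(class1))
    boundary = (m0 + m1) / 2
    return weight, boundary

def weighted_vote(patient, training):
    """Return (winning class, confidence), where confidence lies between 0 and 1."""
    votes = {0: 0.0, 1: 0.0}
    for gene, (class0, class1) in training.items():
        weight, boundary = gene_parameters(class0, class1)
        v = weight * (patient[gene] - boundary)
        # A positive vote favors class 0; a negative vote favors class 1.
        votes[0 if v > 0 else 1] += abs(v)
    winner = max(votes, key=votes.get)
    loser = 1 - winner
    total = votes[winner] + votes[loser]
    confidence = (votes[winner] - votes[loser]) / total if total else 0.0
    return winner, confidence

# Hypothetical log-scale training data: class 0 = AML, class 1 = disease-free.
training = {
    "geneA": ([8.0, 9.0, 10.0], [4.0, 5.0, 6.0]),
    "geneB": ([3.0, 4.0, 5.0], [7.0, 8.0, 9.0]),
}
clear_case = weighted_vote({"geneA": 9.5, "geneB": 3.5}, training)
ambiguous_case = weighted_vote({"geneA": 9.5, "geneB": 8.5}, training)
```

In the ambiguous case the two genes vote for opposite classes with equal strength, so the confidence falls to zero; a low score of this kind is the situation the paragraph above treats as a possible sign of progression.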
In another aspect, the present invention provides other methods that are useful for diagnosing or monitoring the progression or treatment of AML or MDS. The methods include comparing an expression profile of one or more genes in a bone marrow sample of a patient of interest to a reference expression profile, where the gene(s) is selected from Tables 8b and 9b.
In still another aspect, the methods of the present invention include comparing an expression profile of one or more genes in a bone marrow sample of a patient of interest to a reference expression profile, wherein the gene(s) is selected from Table 10b.
In addition to the genes listed in Tables 1, 3, 8b, 9b, and 10b, the present invention contemplates detection of the expression profiles of other genes that can hybridize under stringent or nucleic acid array hybridization conditions to the qualifiers selected from Tables 1, 3, 8b, 9b, and 10b. These genes may include hypothetical or putative genes which are supported by mRNA or EST data.
In a further aspect, the present invention features diagnostic kits or apparatuses. In one embodiment, the kits or apparatuses of the present invention include one or more polynucleotides, each of which is capable of hybridizing under stringent or nucleic acid array hybridization conditions to an RNA transcript, or the complement thereof, of a gene selected from Tables 1, 3, 8b, 9b, and 10b. In another embodiment, the kits or apparatuses of the present invention include one or more antibodies, each of which specifically recognizes a polypeptide product of a gene selected from Tables 1, 3, 8b, 9b, and 10b.
Moreover, the present invention features electronic systems for carrying out the methods of the present invention. In one embodiment, a system of the present invention includes (1) an input device through which an expression profile of at least one AML or MDS disease gene in a bone marrow sample of a patient of interest is inputted to the system; (2) a storage medium which includes one or more reference expression profiles of the AML or MDS disease gene; and (3) a processor which executes a program to compare the expression profile of the patient of interest to the reference expression profile(s).
Other features, objects, and advantages of the present invention are apparent in the detailed description that follows. It should be understood, however, that the detailed description, while indicating preferred embodiments of the invention, is given by way of illustration only, not limitation. Various changes and modifications within the scope of the invention will become apparent to those skilled in the art from the detailed description.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The drawing is provided for illustration, not limitation.
Numerous AML or MDS disease genes are identified by the present invention. These genes are differentially expressed in bone marrow cells of patients who have AML or MDS compared to bone marrow cells of disease-free humans. These genes can be used as molecular markers for diagnosing or monitoring the progression or treatment of AML or MDS. These genes can also be used for the detection of early stages of progression from MDS to AML. In many embodiments, the methods of the present invention do not require positive selection of specific cell subtypes (such as CD34+), thereby allowing for rapid diagnosis of AML or MDS.
The availability of the human genome sequence, together with new developments in technology, such as DNA microarrays and computational biology, allows systematic gene expression studies for various diseases. This invention employs the systematic gene expression analysis technique to identify genes that are differentially expressed in BMMCs of AML or MDS patients versus disease-free humans. In many embodiments, polynucleotide arrays, such as cDNA or oligonucleotide arrays, are used for detecting and/or comparing gene expression profiles. Polynucleotide arrays allow quantitative detection of expression profiles of a large number of genes at one time. Suitable polynucleotide arrays for this purpose include, but are not limited to, Genechip® microarrays from Affymetrix (Santa Clara, Calif.) and cDNA microarrays from Agilent Technologies (Palo Alto, Calif.).
Polynucleotides to be hybridized to microarrays can be labeled with one or more labeling moieties to allow for detection of hybridized polynucleotide complexes. The labeling moieties can include compositions that are detectable by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. Exemplary labeling moieties include radioisotopes, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like. The polynucleotides to be hybridized to the microarrays can be either DNA or RNA.
Hybridization reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, polynucleotides derived from one sample, such as BMMCs from an AML or MDS patient or a disease-free human, are hybridized to the probes in a microarray. Signals detected after the formation of hybridization complexes correlate to the polynucleotide levels in the sample. In the differential hybridization format, polynucleotides derived from two biological samples, such as one from an AML or MDS patient and the other from a disease-free human, are labeled with different labeling moieties. A mixture of these differently labeled polynucleotides is added to a microarray. The microarray is then examined under conditions in which the emissions from the two different labels are individually detectable. In one embodiment, the fluorophores Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway N.J.) are used as the labeling moieties for the differential hybridization format.
Signals gathered from microarrays can be analyzed using commercially available software, such as that provided by Affymetrix or Agilent Technologies. Controls, such as for scan sensitivity, probe labeling and cDNA quantitation, can be included in the hybridization experiments. In many embodiments, the microarray expression signals are scaled and/or normalized before being further analyzed. For instance, the expression signals for each gene can be normalized to take into account variations in hybridization intensities when more than one array is used under similar test conditions. Signals for individual polynucleotide complex hybridization can also be normalized using the intensities derived from internal normalization controls contained on each array. In addition, genes with relatively consistent expression levels across the samples can be used to normalize the expression levels of other genes. In one embodiment, the expression levels are normalized across the samples such that the mean is zero and the standard deviation is one. In another embodiment, the expression data detected by the microarray are subject to a variation filter which excludes genes showing minimal or insignificant variation across the samples.
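The normalization to zero mean and unit standard deviation, and the variation filter, can be sketched as follows. The use of the population standard deviation, the fold and difference thresholds (`min_fold`, `min_delta`), and the gene names and values are illustrative assumptions; the filter criteria actually applied may differ.

```python
from statistics import mean, pstdev

def standardize(levels):
    """Rescale one gene's levels across samples to mean 0 and standard deviation 1."""
    mu, sigma = mean(levels), pstdev(levels)
    if sigma == 0:
        return [0.0 for _ in levels]
    return [(x - mu) / sigma for x in levels]

def variation_filter(expression, min_fold=2.0, min_delta=50.0):
    """Keep genes whose max/min ratio and max-min difference meet hypothetical thresholds."""
    kept = {}
    for gene, levels in expression.items():
        lo, hi = min(levels), max(levels)
        if lo > 0 and hi / lo >= min_fold and hi - lo >= min_delta:
            kept[gene] = levels
    return kept

# Hypothetical expression signals for two placeholder genes across three samples.
expression = {
    "geneA": [100.0, 400.0, 250.0],   # varies strongly across samples
    "geneB": [200.0, 210.0, 205.0],   # nearly constant across samples
}
filtered = variation_filter(expression)
z_scores = standardize(expression["geneA"])
```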
The gene expression profiles in AML or MDS BMMCs can be compared to the corresponding gene expression profiles in disease-free BMMCs. Genes that are differentially expressed in AML or MDS BMMCs compared to disease-free BMMCs are identified. “Differentially expressed” means that the average expression level of a gene in AML or MDS BMMCs has a statistically significant difference from that of disease-free BMMCs. In one embodiment, the average expression level of an AML (or MDS) disease gene in AML (or MDS) BMMCs is substantially higher or lower than that in disease-free BMMCs. In another embodiment, the average expression level of an AML (or MDS) disease gene in AML (or MDS) BMMCs is at least 1, 2, 3, 4, 5, 10, 20, or more fold higher or lower than that in disease-free BMMCs. In yet another embodiment, the p-value of a Student's t-test (e.g., two-tailed distribution, two sample unequal variance) for the difference in the average expression levels of an AML or MDS disease gene in AML or MDS BMMCs versus disease-free BMMCs is no more than 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001, or less.
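The fold difference and the unequal-variance t statistic can be sketched as follows, using made-up expression levels. The sketch computes the t statistic and the Welch-Satterthwaite degrees of freedom only; converting these to a two-tailed p-value requires the t distribution, which a statistics package would normally supply (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def fold_change(disease, control):
    """Ratio of the average expression level in disease samples to that in controls."""
    return mean(disease) / mean(control)

def welch_t(a, b):
    """Two-sample t statistic and degrees of freedom without assuming equal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical expression levels of one gene in AML and disease-free BMMCs.
aml = [200.0, 220.0, 240.0, 260.0]
disease_free = [50.0, 55.0, 45.0, 50.0]

ratio = fold_change(aml, disease_free)
t_stat, dof = welch_t(aml, disease_free)
```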
In one embodiment, AML or MDS disease genes are identified by using clustering algorithms based on the microarray gene expression data. A clustering analysis can be either unsupervised or supervised. Examples of unsupervised cluster algorithms include, but are not limited to, self-organizing maps (SOMs), principal component analysis, average linkage clustering, and hierarchical clustering. Examples of supervised cluster algorithms include, but are not limited to, nearest-neighbors test, support vector machines, and SPLASH. Under a supervised cluster analysis, the disease status of each sample is already known. Two-class or multi-class correlation metrics can be used.
In one example, a permutation test-based neighborhood analysis is used to analyze the microarray gene expression data for the identification and selection of AML or MDS disease genes. The algorithm for the neighborhood analysis is described in Golub et al., S
Under one form of the neighborhood analysis, the expression profile of each gene is represented by an expression vector g=(e1, e2, e3, . . . , en), where ei corresponds to the expression level of gene “g” in the ith sample. A class distinction is represented by an idealized expression pattern c=(c1, c2, c3, . . . , cn), where ci=1 or −1, depending on whether the ith sample is isolated from class 0 or class 1. Class 0 may consist of patients with a particular disease or diseases such as AML or MDS, and class 1 may represent disease-free humans.
The correlation of gene “g” to the class distinction can be calculated using a signal-to-noise score:
P(g,c) = [x0(g) − x1(g)] / [sd0(g) + sd1(g)]
where x0(g) and x1(g) represent the means of the log of the expression level of gene “g” in class 0 and class 1, respectively, and sd0(g) and sd1(g) represent the standard deviation of the log of the expression of gene “g” in class 0 and class 1, respectively. A higher absolute value of a signal-to-noise score indicates that the gene is more highly expressed in one class than in the other. An unusually high density of genes within the neighborhoods of the class distinction, as compared to random patterns, suggests that many genes have expression patterns that are significantly correlated with the class distinction.
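A signal-to-noise score of this kind can be sketched as follows, assuming the score is the difference of the class means of the log expression levels divided by the sum of the class standard deviations. The gene names and expression levels are hypothetical.

```python
from math import log
from statistics import mean, stdev

def signal_to_noise(class0_levels, class1_levels):
    """Difference of class means over sum of class standard deviations, on log levels."""
    logs0 = [log(x) for x in class0_levels]
    logs1 = [log(x) for x in class1_levels]
    return (mean(logs0) - mean(logs1)) / (stdev(logs0) + stdev(logs1))

# Hypothetical expression levels: (class 0 samples, class 1 samples) per gene.
expression = {
    "gene_up_in_class0": ([800.0, 900.0, 1000.0], [100.0, 120.0, 90.0]),
    "gene_up_in_class1": ([50.0, 60.0, 55.0], [400.0, 450.0, 500.0]),
    "gene_unchanged": ([200.0, 260.0, 230.0], [210.0, 250.0, 235.0]),
}
scores = {g: signal_to_noise(c0, c1) for g, (c0, c1) in expression.items()}

# Genes ranked by absolute score; the sign shows which class has higher expression.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```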
AML or MDS disease genes can be selected based on the neighborhood analysis. In one embodiment, the selected AML or MDS disease genes have top absolute P(g,c) values. In another embodiment, the selected AML (or MDS) disease genes include genes that are highly expressed in AML (or MDS) BMMCs, as well as genes that are highly expressed in disease-free BMMCs.
In still another embodiment, the selected AML or MDS disease genes are limited to those shown to be significantly correlated to the class distinction under a permutation test (e.g., above the 1%, 2%, 5%, or 10% significance level). As used herein, x % significance level means that x % of random neighborhoods contain as many genes as the real neighborhood around the class distinction.
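The permutation test can be sketched as follows. The sketch shuffles the class labels, counts how many genes reach a chosen score threshold under each shuffled labeling, and reports the fraction of shuffles whose neighborhood holds at least as many genes as the real one. The threshold, the number of permutations, the random seed, and the expression values are illustrative assumptions.

```python
import random
from statistics import mean, stdev

def score(levels, labels):
    """Signal-to-noise score of one gene for a given assignment of samples to classes."""
    c0 = [x for x, c in zip(levels, labels) if c == 0]
    c1 = [x for x, c in zip(levels, labels) if c == 1]
    denom = stdev(c0) + stdev(c1)
    return (mean(c0) - mean(c1)) / denom if denom else 0.0

def neighborhood_size(expression, labels, threshold):
    """Number of genes whose absolute score reaches the threshold."""
    return sum(1 for levels in expression.values()
               if abs(score(levels, labels)) >= threshold)

def permutation_significance(expression, labels, threshold, n_perm=200, seed=0):
    """Observed neighborhood size and fraction of random labelings that match or exceed it."""
    rng = random.Random(seed)
    observed = neighborhood_size(expression, labels, threshold)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if neighborhood_size(expression, shuffled, threshold) >= observed:
            hits += 1
    return observed, hits / n_perm

# Hypothetical log-scale data: four class 0 samples followed by four class 1 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "geneA": [9.0, 10.0, 11.0, 10.0, 2.0, 3.0, 2.0, 3.0],
    "geneB": [1.0, 2.0, 1.0, 2.0, 8.0, 9.0, 8.0, 9.0],
    "geneC": [5.0, 6.1, 5.2, 6.3, 5.4, 5.9, 5.1, 6.2],
}
observed, fraction = permutation_significance(expression, labels, threshold=2.0)
```

With only eight samples, some shuffles reproduce the true grouping by chance, so the fraction is not expected to be exactly zero.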
The above-described methods can be readily adapted to the identification of genes whose expression profiles in bone marrow cells are correlated with different stages of disease progression, or different clinical responses to a therapeutic treatment. For instance, BMMC gene expression profiles of MDS patients who eventually progress to AML can be compared to BMMC gene expression profiles of MDS patients who do not progress to AML. Genes that are differentially expressed in these two classes of patients may be identified and used as molecular markers for the prediction of progression from MDS to AML. As another example, AML or MDS patients can be grouped based on their different responses to a therapeutic treatment. The global gene expression analysis is then employed to search for genes which are differentially expressed in one group of patients as compared to another group of patients. The genes thus identified can be used for the prognosis or prediction of clinical outcome of an AML or MDS patient of interest.
In one embodiment, HG-U95Av2 or HG-U95A genechips (manufactured by Affymetrix, Inc.) were used for the identification of AML or MDS disease genes. See Examples 1-4, infra. RNA transcripts were isolated from BMMCs of AML or MDS patients and disease-free humans. cRNA was prepared from the RNA transcripts using protocols according to Affymetrix's Expression Analysis Technical Manuals and then hybridized to the genechip. Hybridization signals were collected for each oligonucleotide probe on a genechip. Signals for the oligonucleotide probes of the same qualifier were averaged. Qualifiers that produced different hybridization signals for AML or MDS samples relative to disease-free samples were identified.
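The averaging of probe signals for each qualifier can be sketched as follows. The qualifier names and signal values are hypothetical placeholders, and a simple arithmetic mean is assumed; the vendor software may use a different summary.

```python
from collections import defaultdict
from statistics import mean

def average_by_qualifier(probe_signals):
    """Collapse (qualifier, signal) pairs into one mean signal per qualifier."""
    grouped = defaultdict(list)
    for qualifier, signal in probe_signals:
        grouped[qualifier].append(signal)
    return {qualifier: mean(signals) for qualifier, signals in grouped.items()}

# Hypothetical probe-level signals from one genechip.
probe_signals = [
    ("qualifier_1", 120.0), ("qualifier_1", 140.0), ("qualifier_1", 130.0),
    ("qualifier_2", 15.0), ("qualifier_2", 25.0),
]
averaged = average_by_qualifier(probe_signals)
```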
Table 1 lists examples of qualifiers on HG-U95Av2 or HG-U95A genechips that showed different hybridization signals for AML samples compared to disease-free samples. Each qualifier represents multiple oligonucleotide probes, and each of these oligonucleotide probes is stably attached to a different respective region on the genechip. Each qualifier in Table 1 corresponds to at least one AML disease gene which is differentially expressed in AML BMMCs compared to disease-free BMMCs. At least one oligonucleotide probe of the qualifier can hybridize under nucleic acid array hybridization conditions to an RNA transcript of the corresponding AML disease gene.
Table 1 shows the ratio of the average expression level of each AML disease gene in AML BMMCs over that in disease-free BMMCs (“AML/Disease-Free”), and the ratio of the average expression level of each AML disease gene in MDS BMMCs over that in disease-free BMMCs (“MDS/Disease-Free”). Table 1 also provides the p-value of a Student's t-test (two-tailed distribution, two sample unequal variance) for the difference between the average expression levels of each AML disease gene in AML BMMCs versus disease-free BMMCs (“p value (AML vs Disease-Free)”). The p-value indicates the statistical significance of the difference observed between the average expression levels. Smaller p-values indicate greater statistical significance for the observed difference.
Table 2 provides the cytogenetic band, gene title, and Unigene and Entrez accession numbers for each AML disease gene depicted in Table 1. The Entrez nucleotide sequence database collects sequences from a variety of sources, such as GenBank, RefSeq and PDB. The database is publicly accessible. The oligonucleotide probes of each qualifier may be derived from the sequence of the Entrez accession number that corresponds to the qualifier.
Table 3 lists examples of qualifiers on HG-U95Av2 or HG-U95A genechips that showed different hybridization signals for MDS samples compared to disease-free samples. Each qualifier in Table 3 corresponds to at least one MDS disease gene. At least one oligonucleotide probe of the qualifier can hybridize under nucleic acid array hybridization conditions to an RNA transcript of the corresponding MDS disease gene.
Table 3 also demonstrates the ratio of the average expression level of each MDS disease gene in MDS BMMCs over that in disease-free BMMCs (“MDS/Disease-Free”), and the ratio of the average expression level of the MDS disease gene in AML BMMCs over that in disease-free BMMCs (“AML/Disease-Free”). In addition, Table 3 provides the p-value of a Student's t-test (two-tailed distribution, two sample unequal variance) for the difference between the average expression levels of each MDS disease gene in MDS BMMCs versus disease-free BMMCs (“p value (MDS vs Disease-Free)”). Table 4 provides the cytogenetic band, gene title, and Unigene and Entrez accession numbers for each MDS disease gene of Table 3.
The AML and MDS disease genes listed in Tables 1-4 were identified based on HG-U95Av2 and HG-U95A genechip annotation provided by Affymetrix. AML or MDS disease genes can also be identified based on the corresponding Unigene or Entrez accession numbers. In addition, AML or MDS disease genes can be determined by BLAST searching the oligonucleotide probes or target sequences of the corresponding qualifiers against a human genome sequence database. Human genome sequence databases suitable for this purpose include, but are not limited to, the Entrez human genome database at the National Center for Biotechnology Information (NCBI). The NCBI provides publicly accessible BLAST programs, such as “blastn,” for searching its sequence database. In one embodiment, the query sequence for the BLAST search is an unambiguous segment (i.e., without “n” residues) of the target sequence of a qualifier. Gene or genes that have substantial sequence identity with the unambiguous segment are identified. These genes may produce different hybridization signals on the qualifier for AML or MDS samples compared to disease-free samples.
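Selecting an unambiguous query segment from a target sequence can be sketched as follows. The sketch assumes that the longest run free of “n” residues is used as the query; the sequence shown is a made-up placeholder, not a real target sequence.

```python
import re

def longest_unambiguous_segment(target_sequence):
    """Longest stretch of the sequence that contains no 'n' (or 'N') residues."""
    segments = re.split(r"[nN]+", target_sequence)
    return max(segments, key=len)

# Hypothetical target sequence with ambiguous residues.
query = longest_unambiguous_segment("acgtnnACGTACGTnAC")
```

The resulting segment could then be submitted to a BLAST program such as “blastn” as described above.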
The oligonucleotide probe sequences as well as the target sequence of each qualifier on HG-U95Av2 or HG-U95A genechips may be obtained from Affymetrix or from the sequence files maintained at Affymetrix website “www.affymetrix.com/support/technical/byproduct.affx?product=hgu95sequence.” The oligonucleotide probe sequences can be found in the sequence files “HG_U95Av2 Probe Sequences, FASTA” and “HG_U95A Probe Sequences, FASTA,” and the target sequences may be found in “HG_U95Av2 Target Sequences, FASTA” and “HG_U95A Target Sequences, FASTA.” All of these sequence files are incorporated herein by reference in their entireties.
The above-described methods can be readily adapted to the identification of disease genes associated with other blood or bone marrow diseases. These disease genes are differentially expressed in bone marrow or blood cells of patients who have the blood or bone marrow diseases as compared to disease-free humans. Blood or bone marrow diseases that are amenable to the present invention include, but are not limited to, acute lymphocytic leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's disease, non-Hodgkin's disease, and other types of leukemia and lymphoma.
The disease genes of the present invention can be used for the detection or diagnosis of AML or MDS. The disease genes of the present invention can also be used to monitor the treatment or progression of AML or MDS. The disease genes of the present invention can be used independently or in combination with other clinical criteria. In many embodiments, the methods of the present invention include comparing the expression profile of one or more AML or MDS disease genes in a bone marrow sample of a patient of interest to a reference expression profile of the same gene or genes. The difference or similarity in the expression profiles is suggestive of AML, MDS, or disease-free status of the patient of interest.
Numerous methods can be used for determining the expression profile of AML or MDS disease genes in a bone marrow sample. In many embodiments, the expression profile is determined by measuring the levels of RNA transcripts of the disease genes. Methods suitable for this purpose include, but are not limited to, RT-PCR, Northern Blot, in situ hybridization, slot-blotting, nuclease protection assay, and polynucleotide arrays. In many other embodiments, the expression profile is determined by detecting the levels of polypeptides encoded by the disease genes. Methods suitable for this purpose include, but are not limited to, immunoassays such as ELISA (enzyme-linked immunosorbent assay), RIA (radioimmunoassay), FACS (fluorescence-activated cell sorter), Western Blot, dot blot, immunohistochemistry, or antibody-based radioimaging. Other methods, such as high-throughput protein sequencing, two-dimensional SDS-polyacrylamide gel electrophoresis, or mass spectrometry, can also be used.
Examples of bone marrow samples suitable for the present invention include, but are not limited to, whole bone marrow samples, or bone marrow samples containing enriched or purified BMMCs or bone marrow leukocytes. Any method known in the art (e.g., aspiration or biopsy) may be used to collect bone marrow samples. Bone marrow samples containing enriched or purified BMMCs or bone marrow leukocytes can be prepared by Ficoll gradients or CPTs (cell purification tubes). “Enriched” means that the cell percentage of BMMCs or bone marrow leukocytes in the sample is higher than that in the original whole bone marrow. In many instances, the enriched or purified BMMCs are un-fractionated.
In one embodiment, quantitative RT-PCR (such as TaqMan, ABI) is used for detecting and comparing the expression profiles of AML or MDS disease genes in bone marrow samples. Quantitative RT-PCR involves reverse transcription (RT) of RNA to cDNA followed by relative quantitative PCR.
In PCR, the number of molecules of the amplified target DNA increases by a factor approaching two with every cycle of the reaction until some reagent becomes limiting. Thereafter, the rate of amplification progressively diminishes until there is no further increase in the amplified target between cycles. If a graph is plotted on which the cycle number is on the X axis and the log of the concentration of the amplified target DNA is on the Y axis, a curved line of characteristic shape can be formed by connecting the plotted points. Beginning with the first cycle, the slope of the line is positive and constant. This is said to be the linear portion of the curve. After some reagent becomes limiting, the slope of the line begins to decrease and eventually becomes zero. At this point the concentration of the amplified target DNA becomes asymptotic to some fixed value. This is said to be the plateau portion of the curve.
The concentration of the target DNA in the linear portion of the PCR is proportional to the starting concentration of the target before the PCR is begun. By determining the concentration of the PCR products of the target DNA in PCR reactions that have completed the same number of cycles and are in their linear ranges, it is possible to determine the relative concentrations of the specific target sequence in the original DNA mixture. If the DNA mixtures are cDNAs synthesized from RNAs isolated from different tissues or cells, the relative abundances of the specific mRNA from which the target sequence was derived may be determined for the respective tissues or cells. This direct proportionality between the concentration of the PCR products and the relative mRNA abundances is true in the linear range portion of the PCR reaction.
The final concentration of the target DNA in the plateau portion of the curve is determined by the availability of reagents in the reaction mix and is independent of the original concentration of target DNA. Therefore, the sampling and quantifying of the amplified PCR products can be carried out when the PCR reactions are in the linear portion of their curves. In addition, relative concentrations of the amplifiable cDNAs can be normalized to some independent standard, which may be based on either internally existing RNA species or externally introduced RNA species. The abundance of a particular mRNA species may also be determined relative to the average abundance of all mRNA species in the sample.
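The behavior described in the preceding paragraphs can be illustrated with a toy model. The sketch below assumes a simple saturating growth rule in which each cycle's gain shrinks as the product approaches a reagent-limited ceiling. The rule, the `CAPACITY` value, and the starting copy numbers are illustrative assumptions and do not model any particular assay.

```python
from math import log10

def simulate_pcr(initial_copies, cycles, capacity):
    """Toy amplification: near-doubling per cycle, slowing as copies approach capacity."""
    copies = [float(initial_copies)]
    for _ in range(cycles):
        n = copies[-1]
        copies.append(n + n * (1.0 - n / capacity))
    return copies

CAPACITY = 1e10  # hypothetical reagent-limited ceiling on product copies

# Two hypothetical samples whose starting target amounts differ fourfold.
sample_a = simulate_pcr(400, 45, CAPACITY)
sample_b = simulate_pcr(100, 45, CAPACITY)

# Slope of log10(copies) per cycle: about log10(2) early, about zero at the plateau.
early_slope = log10(sample_b[6]) - log10(sample_b[5])
late_slope = log10(sample_b[45]) - log10(sample_b[44])

# Product ratio at the same cycle: preserved in the linear portion, lost at the plateau.
linear_ratio = sample_a[10] / sample_b[10]
plateau_ratio = sample_a[45] / sample_b[45]
```

Under these assumptions the fourfold starting difference is still visible at cycle 10 but has vanished by cycle 45, which is why sampling is carried out in the linear portion of the curve.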
In one example, the PCR amplification utilizes internal PCR standards that are approximately as abundant as the target. This strategy is effective if the products of the PCR amplifications are sampled during their linear phases. If the products are sampled when the reactions are approaching the plateau phase, then the less abundant product may become relatively over-represented. Comparisons of relative abundances made for many different RNA samples, such as is the case when examining RNA samples for differential expression, may become distorted in such a way as to make differences in relative abundances of RNAs appear less than they actually are. This can be improved if the internal standard is much more abundant than the target. If the internal standard is more abundant than the target, then direct linear comparisons may be made between RNA samples.
A problem inherent in clinical samples is that they are of variable quantity and/or quality. This problem can be overcome if the RT-PCR is performed as a relative quantitative RT-PCR with an internal standard in which the internal standard is an amplifiable cDNA fragment that is larger than the target cDNA fragment and in which the abundance of the mRNA encoding the internal standard is roughly 5-100 fold higher than the mRNA encoding the target. This assay measures relative abundance, not absolute abundance of the respective mRNA species.
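Normalization to an internal standard can be sketched as follows. Each sample is represented by a hypothetical pair of signals (target product and internal-standard product) measured in the linear phase; dividing one by the other cancels differences in input quantity between samples. The signal values are made up for illustration.

```python
def relative_abundance(target_signal, standard_signal):
    """Target signal expressed relative to the internal standard in the same reaction."""
    return target_signal / standard_signal

def fold_difference(sample_a, sample_b):
    """Ratio of normalized target abundance between two (target, standard) samples."""
    return relative_abundance(*sample_a) / relative_abundance(*sample_b)

# Hypothetical signals: sample A had about three times more input RNA than sample B.
patient_sample = (300.0, 6000.0)
control_sample = (50.0, 2000.0)

fold = fold_difference(patient_sample, control_sample)
```

The raw target signals differ sixfold, but after normalization the difference is about twofold, which reflects relative rather than absolute abundance.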
In another example, the relative quantitative RT-PCR uses an external standard protocol. Under this protocol, the PCR products are sampled in the linear portion of their amplification curves. The number of PCR cycles that are optimal for sampling can be empirically determined for each target cDNA fragment. In addition, the reverse transcriptase products of each RNA population isolated from the various samples can be normalized for equal concentrations of amplifiable cDNAs. While empirical determination of the linear range of the amplification curve and normalization of cDNA preparations are tedious and time-consuming processes, the resulting RT-PCR assays may, in certain cases, be superior to those derived from a relative quantitative RT-PCR with an internal standard.
In another embodiment, nucleic acid arrays (including bead arrays) are used for detecting and comparing the expression patterns of AML or MDS disease genes in bone marrow samples. Construction of nucleic acid arrays is well known in the art. A nucleic acid array of the present invention can comprise at least 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250 or more different polynucleotide probes, each different probe capable of hybridizing to a different respective AML or MDS disease gene. Multiple probes for the same gene can be used on the same array. Probes for other disease genes can also be included in a nucleic acid array of the present invention. The probe density on a nucleic acid array of the present invention can be in any range. For instance, the density can be at least or no greater than 5, 10, 25, 50, 100, 200, 300, 400, 500, 1000, or more probes/cm².
In yet another embodiment, nuclease protection assays are used to quantify RNAs derived from bone marrow samples. There are many different versions of nuclease protection assays. The common characteristic of these nuclease protection assays is that they involve hybridization of an antisense nucleic acid with the RNA to be quantified. The resulting hybrid double-stranded molecule is then digested with a nuclease which digests single-stranded nucleic acids more efficiently than double-stranded molecules. The amount of antisense nucleic acid that survives digestion is a measure of the amount of the target RNA species to be quantified. Examples of nuclease protection assays include, but are not limited to, the RNase protection assay manufactured by Ambion, Inc. (Austin, Tex.).
In a further embodiment, immunoassays, such as ELISA, are used to detect and compare the expression profiles of AML or MDS disease genes. In an exemplifying ELISA, antibodies capable of binding to the target proteins are immobilized onto a selected surface exhibiting protein affinity, such as wells in a polystyrene or polyvinylchloride microtiter plate. Then, samples to be tested are added to the wells. After binding and washing to remove non-specifically bound immunocomplexes, the bound antigen(s) can be detected. Detection can be achieved by the addition of a second antibody which is specific for the target proteins and is linked to a detectable label. Detection may also be achieved by the addition of a second antibody, followed by the addition of a third antibody that has binding affinity for the second antibody, with the third antibody being linked to a detectable label. Before being added to the microtiter plate, cells in the samples can be lysed and/or extracted to separate the target proteins from potentially interfering substances.
In another exemplifying ELISA, the samples suspected of containing the target proteins are immobilized onto the well surface and then contacted with the antibodies of the invention. After binding and washing to remove non-specifically bound immunocomplexes, the bound antigen is detected. Where the initial antibodies are linked to a detectable label, the immunocomplexes can be detected directly. The immunocomplexes can also be detected using a second antibody that has binding affinity for the first antibody, with the second antibody being linked to a detectable label.
Another exemplary ELISA involves the use of antibody competition in the detection. In this ELISA, the target proteins are immobilized on the well surface. The labeled antibodies are added to the well, allowed to bind to the target proteins, and detected by means of their labels. The amount of the target proteins in an unknown sample is then determined by mixing the sample with the labeled antibodies before or during incubation with coated wells. The presence of the target proteins in the unknown sample acts to reduce the amount of antibody available for binding to the well and thus reduces the ultimate signal.
Different ELISA formats can have certain features in common, such as coating, incubating or binding, washing to remove non-specifically bound species, and detecting the bound immunocomplexes. For instance, in coating a plate with either antigen or antibody, the wells of the plate can be incubated with a solution of the antigen or antibody, either overnight or for a specified period of hours. The wells of the plate are then washed to remove incompletely adsorbed material. Any remaining available surfaces of the wells are then “coated” with a nonspecific protein that is antigenically neutral with regard to the test samples. Examples of these nonspecific proteins include bovine serum albumin (BSA), casein and solutions of milk powder. The coating allows for blocking of nonspecific adsorption sites on the immobilizing surface and thus reduces the background caused by nonspecific binding of antisera onto the surface.
In ELISAs, a secondary or tertiary detection means can also be used. After binding of a protein or antibody to the well, coating with a non-reactive material to reduce background, and washing to remove unbound material, the immobilizing surface is contacted with the control and/or clinical or biological sample to be tested under conditions effective to allow immunocomplex (antigen/antibody) formation. These conditions may include, for example, diluting the antigens and antibodies with solutions such as BSA, bovine gamma globulin (BGG) and phosphate buffered saline (PBS)/Tween and incubating the antibodies and antigens at room temperature for about 1 to 4 hours or at 4° C. overnight. Detection of the immunocomplex then requires a labeled secondary binding ligand or antibody, or a secondary binding ligand or antibody in conjunction with a labeled tertiary antibody or third binding ligand.
Following all incubation steps in an ELISA, the contacted surface can be washed so as to remove non-complexed material. For instance, the surface may be washed with a solution such as PBS/Tween, or borate buffer. Following the formation of specific immunocomplexes between the test sample and the originally bound material, and subsequent washing, the occurrence or the amount of immunocomplexes can be determined.
To provide a detecting means, the second or third antibody can have an associated label to allow detection. In one embodiment, the label is an enzyme that generates color development upon incubating with an appropriate chromogenic substrate. Thus, for example, one may contact and incubate the first or second immunocomplex with a urease, glucose oxidase, alkaline phosphatase or hydrogen peroxidase-conjugated antibody for a period of time and under conditions that favor the development of further immunocomplex formation (e.g., incubation for 2 hours at room temperature in a PBS-containing solution such as PBS-Tween).
After incubation with the labeled antibody, and subsequent to washing to remove unbound material, the amount of label is quantified, e.g., by incubation with a chromogenic substrate such as urea and bromocresol purple in the case of urease as the enzyme label, or 2,2′-azino-di-(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS) and H₂O₂ in the case of peroxidase as the enzyme label. Quantitation can be achieved by measuring the degree of color generation, e.g., using a spectrophotometer.
Another immunoassay format suitable for the present invention is RIA (radioimmunoassay). An exemplary RIA is based on the competition between radiolabeled polypeptides and unlabeled polypeptides for binding to a limited quantity of antibodies. Suitable radiolabels include, but are not limited to, ¹²⁵I. In one embodiment, a fixed concentration of ¹²⁵I-labeled polypeptide is incubated with a series of dilutions of an antibody specific to the polypeptide. When the unlabeled polypeptide is added to the system, the amount of the ¹²⁵I-polypeptide that binds to the antibody is decreased. A standard curve can therefore be constructed to represent the amount of antibody-bound ¹²⁵I-polypeptide as a function of the concentration of the unlabeled polypeptide. From this standard curve, the concentration of the polypeptide in unknown samples can be determined. Any RIA protocol known in the art may be used in the present invention.
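Reading an unknown off a competitive standard curve can be sketched as follows. This is an illustration only: it assumes simple linear interpolation between neighboring standards, whereas laboratories often fit a smooth curve instead. The function name and all standard values are hypothetical.

```python
def interpolate_concentration(standards, bound_signal):
    """Estimate unlabeled polypeptide concentration from a standard curve.

    standards: list of (concentration, bound_counts) pairs. In a competitive
    assay the bound counts fall as the unlabeled concentration rises.
    """
    # Sort by concentration so that neighboring points bracket the signal.
    pts = sorted(standards)
    for (c_lo, s_hi), (c_hi, s_lo) in zip(pts, pts[1:]):
        if s_lo <= bound_signal <= s_hi:
            if s_hi == s_lo:
                return c_lo
            frac = (s_hi - bound_signal) / (s_hi - s_lo)
            return c_lo + frac * (c_hi - c_lo)
    raise ValueError("signal lies outside the range of the standard curve")


# Hypothetical standards: (unlabeled concentration, antibody-bound counts).
curve = [(0.0, 10000.0), (1.0, 8000.0), (5.0, 4000.0), (25.0, 1000.0)]
estimate = interpolate_concentration(curve, 6000.0)
```

Signals outside the range of the standards raise an error rather than being extrapolated, since the curve is only informative between its measured points.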
Suitable antibodies for the present invention include, but are not limited to, polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, single chain antibodies, Fab fragments, or fragments produced by a Fab expression library. Neutralizing antibodies (i.e., those which inhibit dimer formation) can also be used. Methods for preparing antibodies are well known in the art. In many embodiments, the antibodies of the present invention can bind to the respective AML or MDS disease gene products or other desired antigens with a binding affinity constant Ka of at least 10⁶ M⁻¹, 10⁷ M⁻¹, or more.
The antibodies of this invention can be labeled with one or more detectable moieties to allow for detection of antibody-antigen complexes. The detectable moieties can include compositions detectable by spectroscopic, enzymatic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. Exemplary detectable moieties include, but are not limited to, radioisotopes, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.
In still another embodiment, the expression profiles of AML or MDS disease genes are determined by measuring the biological activities of the polypeptides encoded by the disease genes. If a biological activity of a polypeptide is known, suitable in vitro assays can be developed to evaluate such an activity, thereby allowing the determination of the amount of the polypeptide in a sample of interest.
The expression profile of AML or MDS disease genes in a sample of interest is compared to a reference expression profile. In many embodiments, the reference expression profile is an average expression profile of the AML or MDS disease genes in reference bone marrow samples. The reference bone marrow samples can be prepared from disease-free humans, or patients with known disease status. In many instances, the reference bone marrow samples are prepared by using the same or comparable method as is the sample of interest. In many other instances, the reference expression profile is obtained by using the same or comparable methodology as is the expression profile to be compared.
The similarity or difference between expression profiles can be determined by comparing each component in an expression profile to the corresponding component in another expression profile. An expression profile can be constructed based on, for example, the absolute or relative expression values of AML or MDS disease genes, the ratios between expression values of different AML or MDS disease genes, or other measures that are indicative of expression levels or patterns.
The similarity or difference between two corresponding components can be evaluated based on fold changes, absolute differences, or other suitable means. In one example, a component in an expression profile is a mean value, and the corresponding component in another expression profile falls within the standard deviation of the mean value. In such a case, the former expression profile may be considered similar to the latter expression profile with respect to that component. Other criteria, such as a multiple or fraction of the standard deviation or a certain degree of percentage increase or decrease (e.g., less than 10% change), may be used to measure similarity.
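The component-level criteria described above can be sketched as two small checks, one based on a multiple of the standard deviation and one based on percentage change. The function names, default thresholds and example values are illustrative assumptions rather than prescribed parameters.

```python
def within_std(reference_mean, reference_std, value, multiple=1.0):
    """True if value lies within `multiple` standard deviations of the mean."""
    return abs(value - reference_mean) <= multiple * reference_std


def within_percent(reference_value, value, max_change=0.10):
    """True if value differs from the reference by less than `max_change`."""
    if reference_value == 0:
        raise ValueError("percentage change is undefined for a zero reference")
    return abs(value - reference_value) / abs(reference_value) < max_change


# Hypothetical reference component: mean 100, standard deviation 15.
similar_by_std = within_std(100.0, 15.0, 110.0)      # inside one SD
similar_by_pct = within_percent(100.0, 108.0)        # 8% change
```

Passing `multiple=2.0` or `multiple=0.5` to `within_std` corresponds to the "multiple or fraction of the standard deviation" variants mentioned above.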
One or more AML or MDS disease genes can be used for the comparison of expression profiles. In many embodiments, at least 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more AML or MDS disease genes are used for diagnosing or monitoring the progression or treatment of AML or MDS. In one example, at least 50% (e.g., at least 60%, 70%, 80%, 90%, or more) of the components in an expression profile are similar to the corresponding components in another expression profile. Under these circumstances, the former expression profile may be considered similar to the latter expression profile. Different components in an expression profile may have different weights in the comparison. In certain cases, lower similarity requirements, such as less than 50% of the components, can be used to determine the similarities between expression profiles.
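A profile-level comparison of this kind might be sketched as below, assuming that each component has already been judged similar or not by some component-level criterion. The function name, the per-gene calls and the weights are hypothetical; the gene symbols are used only as labels.

```python
def profiles_similar(component_similar, weights=None, threshold=0.5):
    """Decide whether two profiles are similar overall.

    component_similar: dict mapping gene -> bool (component-level result).
    weights: optional dict mapping gene -> non-negative weight.
    threshold: minimum weighted fraction of similar components.
    """
    if not component_similar:
        raise ValueError("at least one component is required")
    if weights is None:
        weights = {gene: 1.0 for gene in component_similar}
    total = sum(weights[gene] for gene in component_similar)
    matched = sum(weights[gene] for gene, ok in component_similar.items() if ok)
    return matched / total >= threshold


# Hypothetical component-level calls for five genes.
calls = {"FLT3": True, "MYB": True, "LCN2": False, "MMP9": False, "CD24": True}
unweighted = profiles_similar(calls)                      # 3 of 5 similar
strict = profiles_similar(calls, threshold=0.8)
weighted = profiles_similar(
    calls, weights={"FLT3": 1, "MYB": 1, "LCN2": 5, "MMP9": 1, "CD24": 1}
)
```

With equal weights the 50% criterion is met, while giving one dissimilar gene a larger weight can reverse the outcome, which is the effect of weighting components differently.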
The AML or MDS disease genes, as well as the similarity criteria, can be selected such that the accuracy of diagnosis or prediction (the ratio of correct calls over the total of correct and incorrect calls) is relatively high. In many embodiments, the accuracy of diagnosis or prediction is at least 50%, 60%, 70%, 80%, 90%, or more. AML or MDS disease genes with diagnosis or prediction accuracy of less than 50% can also be used, provided that the diagnosis or prediction is statistically significant. In many cases, the gene expression-based methods are combined with other clinical tests to improve the accuracy of AML or MDS diagnosis.
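The accuracy measure defined above, correct calls over the total of correct and incorrect calls, can be written as a short sketch. Treating `None` as a sample for which no call was made, and excluding it from the denominator, is an assumption of this example; the labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Ratio of correct calls to the total of correct and incorrect calls.

    A predicted value of None is treated as "no call" and is not counted.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    calls = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    if not calls:
        raise ValueError("no calls were made")
    correct = sum(1 for p, a in calls if p == a)
    return correct / len(calls)


# Hypothetical predictions for five samples, one of which received no call.
predicted = ["AML", "MDS", "AML", None, "disease-free"]
actual = ["AML", "AML", "AML", "MDS", "disease-free"]
score = accuracy(predicted, actual)
```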
Any AML or MDS disease gene can be used in diagnosing or monitoring the progression or treatment of AML or MDS. In one embodiment, the AML (or MDS) disease genes are selected to have a p-value of no greater than 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001, or less. In another embodiment, the AML (or MDS) disease genes are selected to have significant correlations with the class distinction between AML samples (or MDS samples) and disease-free samples. For instance, the disease genes can be chosen from those above the 1%, 5%, or 10% significance level under the permutation test. The selected disease genes can include both AML and MDS disease genes.
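A permutation test for a single gene can be sketched as follows. The choice of statistic (absolute difference in class means) and the exhaustive enumeration of relabelings are assumptions made for this illustration; exhaustive enumeration is practical only for small sample numbers, and random relabelings are typically drawn otherwise. All expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(class_a, class_b):
    """Exact two-sided permutation p-value for a difference in class means."""
    pooled = list(class_a) + list(class_b)
    n_a = len(class_a)
    observed = abs(mean(class_a) - mean(class_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling n_a of the pooled samples as class A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical expression values of one gene in disease and disease-free samples.
disease = [10.0, 11.0, 12.0, 13.0]
disease_free = [1.0, 2.0, 3.0, 4.0]
p = permutation_p_value(disease, disease_free)
selected = p <= 0.05
```

Here only the observed labeling and its mirror image reach the observed difference, so 2 of the 70 possible relabelings are as extreme and the gene passes a 0.05 cutoff.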
In yet another embodiment, the selected AML (or MDS) disease genes include at least two groups of genes. The first group includes upregulated AML (or MDS) disease genes which have AML/Disease-Free ratios (or MDS/Disease-Free ratios) of at least 2, 3, 4, 5, 10, or more. The second group includes downregulated AML (or MDS) disease genes which have AML/Disease-Free ratios (or MDS/Disease-Free ratios) of no greater than 0.5, 0.333, 0.25, 0.2, 0.1, or less. Each group may include at least 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, or more AML (or MDS) disease genes.
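The grouping by disease/disease-free ratio can be sketched as below. The function name, gene labels and mean expression values are hypothetical, and skipping genes with a non-positive baseline is an assumption of this example.

```python
def split_by_ratio(disease_means, disease_free_means, up=2.0, down=0.5):
    """Group genes by their disease/disease-free expression ratio."""
    upregulated, downregulated = [], []
    for gene, disease_value in disease_means.items():
        baseline = disease_free_means[gene]
        if baseline <= 0:
            continue  # ratio is undefined without a positive baseline
        ratio = disease_value / baseline
        if ratio >= up:
            upregulated.append(gene)
        elif ratio <= down:
            downregulated.append(gene)
    return upregulated, downregulated


# Hypothetical mean expression levels for three placeholder genes.
aml = {"GENE_A": 400.0, "GENE_B": 50.0, "GENE_C": 120.0}
disease_free = {"GENE_A": 100.0, "GENE_B": 500.0, "GENE_C": 100.0}
up_genes, down_genes = split_by_ratio(aml, disease_free)
```

GENE_A (ratio 4) falls in the first group, GENE_B (ratio 0.1) in the second, and GENE_C (ratio 1.2) in neither.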
In a further embodiment, the gene set used in the present invention does not consist of genes selected from those described in Miyazato et al., supra, Tables 2 and 3 of Hofmann et al., supra, and Tables 3 and 4 and FIG. 1 of Larramendy et al., H
In still another embodiment, the AML or MDS disease genes are selected from Tables 1, 3, 8b, and 9b. In one example, the selected AML disease genes include at least one gene shared by both Tables 1 and 8b, and the selected MDS disease genes include at least one gene shared by both Tables 3 and 9b. Examples of AML disease genes that are listed in both Tables 1 and 8b include, but are not limited to, FLT3, SPINK2, KIAA0246 (STAB1), HOXB2, ACTA2, MIC2, H2AFO, PFKP, RUNX1, CMAH, ADA, SCHIP-1, OA48-18, MYB, TBXAS1, H2BFQ, BAX, RUNX1, SNL, UNK_AF014837 (M6A), ITGA4, UNK_AA149307 (FLJ21174), ACADM, DBP, H2BFC, LYL1, DKFZP586A0522, DCTD, ETS2, H2BFG, BAX, PRKACB, HSPCB, LYL1, H2BFD, UNK_U78027, MYB, H2AFO, KIAA0128, UNK_AA005018 (LOC51097), HSPB1, KIAA0620, SOX4, UNK_AJ223352 (H2B/S), APEX, P311, CSNK1A1, UNK_N53547 (MGC5508), POLD2, UNK_AB007960 (SH3GLB1), GNA15, H2BFH, ATIC, NAP1L1, CCNG1, NPIP, UNK_AA176780 (HSA249128), PLAGL1, PAGA (PRDX1), GPX1, COMT, FARSL, JWA, LGALS1, IFI16, KIAA0906, RANBP7, MYB, IDH1, HSPCB, DCTD, FARSL, ADFP, UNK_T75292 (FLJ10849), CALR, PPID, CCT3, C14ORF3, PTPN7, UNK_Y14391 (PGPL), UNK_AA056747 (ATP6A1), LIPA, ICAM2, BST2, TARDBP, P130 (NOLC1), H2BFE, SPN (DEAF1), AMD1, HRMT1L2, UNK_AI808712, UQCRC2, PIP5K2B, ADE2H1 (PAICS), IRF5, ACF7 (MACF1), GP36B (C5ORF8), TFAP4, ATP5B, LTC4S, H2BFK, M11S1, UNK_AF041080 (MN7), FABP5, CLECSF2, RPML3 (MRPL3), KIAA0594, NME2, CCT6A, UNK_AF026816 (ITPA), AKR1A1, CHC1, ACADVL, SNRPA, CNIL, UNK_D28423, ALCAM, UNK_AI819942, DBI, NDUFB5, UNK_AL031432, SNRPB2, P24B, UNK_AJ245416 (LSM2), RMP, OGT, CYC1 (HCS), UNK_W28944 (GRHPR), FNTA, DOC1 (CDK2AP1), NDUFS4, RPL22, LMO2, KIAA0546, NME1, IMPDH2, PBX3, SDHD, UNK_AJ224875 (MGC2840), TOMM70A, HINT, DKFZP564M2423 (PAI-RBP1), IRF5, TCF8, MNDA, CD83, KIAA0474 (RAP1GA1), LGALS3, PLXNC1, ALDH2, NS1-BP, S100A11, TRA@, UNK_W28281(GABARAPL1), KIAA0403 (PIP3-E), KIAA0513, CORO1A, NCF2, BST1, CXCR4, IL4R, TRB@, BTN2A1, PSG11, HSPA1B, PTPRE, CD8A, NCF4, PTPRE, HSPA1A, MYL2, UGCG, 
GYG, EGFL5, CA4, ELN, CSPG4, AMPD2, NCF4, KIAA0879 (ENPP4), NPM1P14, CST7, PDXK, MMPL1, FGR, HBA2, EPB72, UNK_U80114, IGL@, AZU1, TUBA1, UNK_D29810 (ESDN), CD19, C4.4A, SIM2, COL9A1, SH3BP5, RNASE3, NR4A2, UNK_AF015128 (IGHM), PIR121, UNK_AL050223 (VAMP2), RGS2, PDI2, MME, CDKN2D, KIAA0763, IGHA1 (IGHM), PSG11, SLC16A3, CPNE3, SLC2A3, CHIT1, BCL2A1, CDA, FCAR, CD3Z, UNK_AL022723, IGHA1, IGHA1, TRB@, UNK_U72507, TRB@, NCF1, IL8RA, KLRB1, IGKV1D-8 (IGKC), UNK_AI147237 (IGHG3), UNK_Y14768, CYBB, BN51T, FRAT2, ISG20, RAB31, QPCT, TUBA1, MGAM, PLAUR, IGHM, HP, IGHG3, IGHM, S100A8, BASP1, UNK_D84143 (IGL@), IGHM, PBEF, UNK_AF013512 (LBP), PPP2R4, ALOX5, ALOX5AP, CD79A, SCYA4, VNN2, UNK_W28504, BPI, CTSG, BASP1, UNK_AL031588, UNK_AI932613, LILRA3, UNK_H12458, PRTN3, NKG7, FCGR3A, KIAA0604 (ECE2), S100A9, P63 (CKAP4), ORM1, MS4A3, SLPI, IGL@, CTSG, CHI3L1, HK3, IGL@, GNLY, ELA2, DEFA3, FCN1, GAS11, CEACAM1, IL18RAP, ITGAM, S100P, MMP8, TFF3, OLR1, TCN1, CD24, ARG1, SCYC2, DEFA4, ANXA3, HPR, CEACAM6, TFF3, DEFA1, G0S2, CEACAM8, TCL1A, PGLYRP, GW112, UNK_U95626, MMP9, SGP28, S100A12, CAMP, and LCN2. Examples of MDS disease genes that are listed in both Tables 3 and 9b include, but are not limited to, HBG2, ID1, KIAA0246 (STAB1), 18SRNA5_Hs_AFFX, TNFRSF10B, H2BFQ, GATA2, QSCN6, H2BFE, DKFZP434C091, MIC2, UNK_AL050224 (PTRF), ANGPT1, PSG11, SLC16A3, MNDA, CPNE3, GRN, BPI, ANXA3, FCN1, D6S49E, PYGL, CEACAM1, CD24, UNK_AI147237, PPP2R4, IQGAP1, OLR1, CEACAM6, PDXK, NCF4, NCF4, GSN, UNK_AI932613, RNASE3, ITGAM, ORM1, PSG11, CTSG, ACTN1, IGLL3, NCF1, CTSG, TCN1, UNK_U95626, CORO1A, HPR, IL18RAP, FRAT2, MS4A3, GW112, SCYC2, CEACAM8, PRTN3, ELA2, CYBB, DEFA4, TFF3, SGP28, HK3, PGLYRP, TFF3, S100A12, CAMP, MMP9, TCL1A, and LCN2.
In another example, the selected AML disease genes include at least one gene which is in Table 8b but not Table 1, and the selected MDS disease genes include at least one gene which is in Table 9b but not Table 3. Examples of such AML disease genes include, but are not limited to, LGALS3BP, HOXA9, MT1A, FLT3, ITM2A, PROML1, DDX21, UNK_W28186, CCNA1, SPARC, TPS1, H2AFA, MN1, DF, DRAP1, BMI1, MRC1, TSC22, MEST, RNASE6, UNK_AL050224, ANGPT1, HSU37012, KRT18, FOXC1, CLIM1, UNK_AI743507, ID1, 121, MYC, TIMP1, GSTM4, LGALS2, UNK_D87002, HBG2, KIAA0125, TEGT, MOX2, GRO2, UNK_AF010313, ADA, CLU, PGDS, ETFB, LOC51035, CD34, SSBP2, UNK_U51712, PPP1R8, NFE2, CPA3, STIP1, EDN1, SNRPC, CALR, TNFRSF10B, GATA2, IGFBP2, CD34, ID1, TRIP7, TIF1B, C1NH, POLR2E, CCR2, TFP1, MTA1, GATA2, UNK_AL035494, ST3GALV1, AMD1, CAPN4, IARS, GNAI1, CTSW, MYB, MAFF, MT1F, UNK_AF063002, CDC4L, UNK_U79260, SFRS7, KIAA0015, FCER1A, AMD1, D123, UNK_AI816034, UNK_W25874, CAMK2G, HSF2, H1F2, D6S81E, ZYX, P23, TACTILE, SMARCA2, KIAA1097, TARS, AKR1C3, F13A1, NRGN, HOXB5, PSMA4, TRIP6, CCT8, OS4, CDK4, EIF4G1, UNK_AF052159, PDNP2, HOMER-3, UNK_U34994, UNK_AL049432, UNK_U79291, HNRPR, PHKB, MYB, PQBP1, AARS, GP110, ADPRT, CSNK1G2, ITGA7, SPC18, UBE2N, UNK_AB007916, H2BFR, ARHB, SFPQ, UNK_W26056, KIAA0233, NDUFV2, CLIC4L, TNP1, ODF1, DHCR7, UNK_AA846749, IER3, CD3E, KIAA0796, GIPR, DAPK2, GADD45B, LPO, NRG2, MSX1, HSF4, PMS2L11, RABGGTA, UNK_X90579, GRM4, ADTAB, UNK_AB029343, UNK_AA586695, UPK1A, SIAT4C, CEACAM3, TNFAIP3, PRG1, GDF1, UNK_AA883101, UNK_L27065, KIAA0751, PTGDS, TFF3, UNK_AF090102, LRP3, SEC14L1, HBB, UNK_L40385, TNNT1, TBCD, UNK_AL050065, UNK_H08175, GCL, MPP2, RHOK, UNK_W26214, MTHFR, KIAA1080, UNK_AJ224442, UNK_W29012, PRF1, UNK_U92818, UNK_X61755, 28SRNA3_Hs_AFFX, UNK_AI687419, UNK_X14675, ACVR1B, UP, GJB1, KRTHA5, CSH1, CYCL, UNK_AF035314, UNK_X72475, RB1, KIAA0061, UNK_M96936, TNXA, SLC22A6, HUMRTVLH3, GFPT2, UNK_W28907, UNK_AI817548, SMARCA4, RSN, CHN2, KIAA0895, UNK_AA151971, FETUB, 
FECH, PTPRN, GZMB, KIAA0320, FCGR3B, MUC3, KIAA0168, UNK_AF070633, UNK_M14087, CYP4F2, IGHD, and ABL1. Examples of such MDS disease genes include, but are not limited to, UNK_N55205, DDX21, HOXB2, FBN2, UNK_W28186, FBN2, UNK_W28186, PF4, HOXA9, EDN1, H2AFO, SPINK2, ID1, OA48-18, HYPA, BMI1, ETS2, PPBP, CPA3, CDC42, RHAG, H1F2, PPBP, HSPCB, H2BFG, H2BFC, UNK_AF041080, H2BFH, TSC22, SNL, FLT3, PPM1A, UNK_AF010313, TEGT, LYL1, PEA15, SOX4, UNK_AF070569, H2AFO, NFE2, UNK_AJ223352, DKFZP434N093, PAI2, ADFP, ACADM, UNK_AF041081, PROML1, ITM2A, H2BFD, CLU, CLECSF2, UNK_U51712, 18SRNAM_Hs_AFFX, MAFF, UNK_W27675, NRIP1, TRIP6, PPM1A, UNK_S62138, ATP5B, TPD52L2, UNK_S62138, UBE2E3, NP, BTG3, KIAA0907, ITGAX, TSSC3, KIAA1096, UNK_AL049265, H2AFL, GPX1, UNK_AC004381, SOS1, KRAS2, PMP22, AMD1, GNA15, BACH1, IARS, C14ORF3, HSPB1, GNB2L1, IDS, UNK_Z24724, H4FG, CD9, TARDBP, UNK_AL035494, ITGB1, KIAA1097, TYROBP, UNK_L40385, IGHA1, BCAT1, BACTIN5_Hs_AFFX, IL8RA, BN51T, CAPG, CSPG2, BTN2A1, IGHA1, KIAA0604, MCC, CYBA, NR4A2, PTPRN, UNK_AF013512, UNK_U72507, ECGF1, D5S346, GNLY, RHOK, UNK_X72475, UNK_AL031588, LILRA3, FGL2, IGKV1D-8, SDF2, UNK_X14675, IGL@, UNK_W28504, IGL@, UNK_AI126004, S100A9, CDA, MPO, DEFA3, RSN, FGL2, AZU1, F11, IGHG3, SCYA4, NKG7, OPHN1, D6S2245E, SLPI, KIAA1080, HP, ACVR1B, UNK_H08175, 28SRNA3_Hs_AFFX, KIAA0061, UNK_Z97632, DEFA1, NR2C1, UNK_M96936, DGCR6, KIAA0483, KIAA0372, UNK_W26214, UNK_AF035314, CYP4F2, SLC22A6, ATPASEP, UNK_W28907, POU1F1, CCNT2, KIAA0895, CHN2, KIAA0320, UNK_W27838, POU1F1, MUC3, FECH, UNK_AL096744, FETUB, SMARCA4, BRD1, UNK_AF070633, UNK_J04178, KIAA0168, UNK_M14087, and ABL1.
In addition to the genes depicted in Tables 1, 3, 8b, and 9b, the present invention contemplates detection of the expression profiles of other genes that can hybridize under stringent or nucleic acid array hybridization conditions to the qualifiers selected from Tables 1, 3, 8b, and 9b. These genes may include hypothetical or putative genes that are supported by EST or mRNA data. As used herein, a gene can hybridize to a qualifier if an RNA transcript of the gene can hybridize to at least one oligonucleotide probe of the qualifier. In many instances, an RNA transcript of the gene can hybridize under stringent or nucleic acid array hybridization conditions to at least 50%, 60%, 70%, 80%, 90% or 100% of the oligonucleotide probes of the qualifier.
“Stringent conditions” are at least as stringent as, for example, conditions G-L shown in Table 5. “Highly stringent conditions” are at least as stringent as conditions A-F shown in Table 5. As used in Table 5, hybridization is carried out under the hybridization conditions (Hybridization Temperature and Buffer) for about four hours, followed by two 20-minute washes under the corresponding wash conditions (Wash Temp. and Buffer).
¹ The hybrid length is that anticipated for the hybridized region(s) of the hybridizing polynucleotides. When hybridizing a polynucleotide to a target polynucleotide
H: SSPE (1×SSPE is 0.15M NaCl, 10 mM NaH₂PO₄, and 1.25 mM EDTA, pH 7.4) can be substituted for SSC (1×SSC is 0.15M NaCl and 15 mM sodium citrate) in the hybridization and wash buffers.
TB* − TR*: The hybridization temperature for hybrids anticipated to be less than 50 base pairs in length should be 5-10° C. less than the melting temperature (Tm) of the hybrid,
In one embodiment, the selected AML or MDS disease genes include at least one gene capable of hybridizing under stringent or nucleic acid array hybridization conditions to a qualifier commonly shared by Tables 1 and 8b, or a qualifier commonly shared by Tables 3 and 9b. Examples of qualifiers listed in both Table 1 and 8b include 1065_at, 41071_at, 38487_at, 39610_at, 32755_at, 41138_at, 32609_at, 39175_at, 39421_at, 39317_at, 41654_at, 36536_at, 34397_at, 1475_s_at, 33777_at, 33352_at, 1997_s_at, 943_at, 39070_at, 32245_at, 35731_at, 32251_at, 37532_at, 40274_at, 35576_f_at, 39971_at, 38717_at, 630_at, 1519_at, 31522_f_at, 2067_f_at, 36215_at, 33986_r_at, 32096_at, 36347_f_at, 38213_at, 2042_s_at, 286_at, 38826_at, 34862_at, 36785_at, 38671_at, 33131_at, 32819_at, 2025_s_at, 39710_at, 40184_at, 39693_at, 1470_at, 39691_at, 40365_at, 31523_f_at, 38811_at, 40634_at, 1920_s_at, 33836_at, 40485_at, 36943_r_at, 41213_at, 37033_s_at, 34651_at, 1751_g_at, 39091_at, 33412_at, 1456_s_at, 41812_s_at, 35255_at, 1474_s_at, 39023_at, 1161_at, 631_g_at, 1750_at, 34378_at, 33173_g_at, 32543_at, 948_s_at, 40774_at, 40979_at, 39672_at, 41108_at, 34889_at, 38745_at, 38454_g_at, 39061_at, 32241_at, 36597_at, 31528_f_at, 35771_at, 263_g_at, 32825_at, 31801_at, 40854_at, 35741_at, 39056_at, 478_g_at, 38704_at, 36955_at, 39638_at, 41357_at, 39968_at, 31524_f_at, 39471_at, 40877_s_at, 39799_at, 40698_at, 37726_at, 41379_at, 33415_at, 38416_at, 35801_at, 38780_at, 1196_at, 38376_at, 40842_at, 32803_at, 351_f_at, 38642_at, 37774_at, 37692_at, 32232_at, 38072_at, 38399_at, 41163_at, 41375_at, 38011_at, 39507_at, 35818_at, 40133_s_at, 1499_at, 41535_at, 38695_at, 1151_at, 32184_at, 35184_at, 1521_at, 36624_at, 32696_at, 40467_at, 32051_at, 32853_at, 1009_at, 40441_g_at, 36465_at, 33439_at, 35012_at, 37536_at, 33080_s_at, 35367_at, 32193_at, 32747_at, 33752_at, 38138_at, 1106_s_at, 35785_at, 33333_at, 38735_at, 38976_at, 41038_at, 32675_at, 649_s_at, 404_at, 1105_s_at, 32673_at, 33758_f_at, 
31692_at, 32916_at, 40699_at, 38895_i_at, 1150_at, 1104_s_at, 36640_at, 40215_at, 40876_at, 36488_at, 40739_at, 31621_s_at, 110_at, 38417_at, 38894_g_at, 36459_at, 32901_s_at, 34965_at, 35714_at, 35911_r_at, 1780_at, 31525_s_at, 40419_at, 34095_f_at, 35530_f_at, 33963_at, 330_s_at, 40227_at, 1096_g_at, 41641_at, 39609_at, 35379_at, 38968_at, 33979_at, 37623_at, 35566_f_at, 37579_at, 32254_at, 37701_at, 35674_at, 1389_at, 1797_at, 34832_s_at, 33499_s_at, 33757_f_at, 33143_s_at, 39706_at, 36979_at, 37061_at, 2002_s_at, 1117_at, 38868_at, 37078_at, 37420_i_at, 33501_r_at, 33500_i_at, 32793_at, 39245_at, 32794_g_at, 40159_r_at, 1353_g_at, 35449_at, 38194_s_at, 34105_f_at, 40729_s_at, 37975_at, 41694_at, 40171_at, 33304_at, 33371_s_at, 35966_at, 36591_at, 34509_at, 189_s_at, 41164_at, 36983_f_at, 37864_s_at, 41165_g_at, 41096_at, 32606_at, 31315_at, 41166_at, 33849_at, 35013_at, 39128_r_at, 307_at, 37099_at, 38017_at, 36674_at, 34498_at, 36338_at, 37054_at, 37105_at, 32607_at, 39872_at, 41827_f_at, 35094_f_at, 2090_i_at, 37066_at, 37121_at, 37200_at, 35536_at, 41471_at, 32529_at, 35315_at, 32451_at, 32275_at, 33273_f_at, 679_at, 36197_at, 36372_at, 33274_f_at, 37145_at, 37096_at, 31506_s_at, 36447_at, 36479_at, 988_at, 33093_at, 38533_s_at, 34319_at, 681_at, 37897_s_at, 37233_at, 35919_at, 266_s_at, 1962_at, 31495_at, 34546_at, 31792_at, 36984_f_at, 36105_at, 31477_at, 31793_at, 38326_at, 33530_at, 39318_at, 31381_at, 38615_at, 37149_s_at, 31859_at, 36464_at, 38879_at, 36710_at, and 32821_at. 
Examples of qualifiers listed in both Table 3 and 9b include 38585_at, 36617_at, 38487_at, AFFX-HUMRGE/M10098_5_at, 34892_at, 33352_at, 37194_at, 1257_s_at, 31528_f_at, 36713_at, 41138_at, 34320_at, 39315_at, 33758_f_at, 33143_s_at, 35012_at, 39706_at, 41198_at, 37054_at, 31792_at, 36447_at, 37967_at, 37215_at, 988_at, 266_s_at, 34105_f_at, 39128_r_at, 1825_at, 37233_at, 36105_at, 35714_at, 38894_g_at, 38895_i_at, 32612_at, 41827_f_at, 33979_at, 38533_s_at, 35315_at, 33757_f_at, 37105_at, 39330_s_at, 38514_at, 40159_r_at, 679_at, 35919_at, 37149_s_at, 38976_at, 36984_f_at, 33093_at, 40171_at, 32451_at, 38615_at, 31495_at, 33530_at, 37066_at, 37096_at, 37975_at, 34546_at, 31477_at, 36464_at, 36372_at, 31381_at, 37897_s_at, 38879_at, 36710_at, 31859_at, 39318_at, and 32821_at.
In another embodiment, the selected AML or MDS disease genes include at least one gene capable of hybridizing under stringent or nucleic acid array hybridization conditions to a qualifier which is shown in Table 8b but not in Table 1, or a qualifier which is shown in Table 9b but not in Table 3. Examples of qualifiers listed in Table 8b but not in Table 1 include 7754_at, 37809_at, 31623_f_at, 34583_at, 40775_at, 41470_at, 40490_at, 41188_at, 1914_at, 671_at, 32905_s_at, 35127_at, 37283_at, 40282_s_at, 39077_at, 41562_at, 36908_at, 39032_at, 37749_at, 34660_at, 34320_at, 39315_at, 33132_at, 35766_at, 41027_at, 36937_s_at, 40610_at, 36617_at, 37724_at, 1693_s_at, 39054_at, 37456_at, 754_s_at, 38585_at, 33528_at, 33989_f_at, 37716_at, 37187_at, 38097_at, 907_at, 36780_at, 35523_at, 36881_at, 37376_at, 38747_at, 32668_at, 39698_at, 37705_at, 37179_at, 36749_at, 207_at, 1520_s_at, 38675_at, 1752_at, 34892_at, 203_at, 40422_at, 538_at, 36618_g_at, 37348_s_at, 33425_at, 39775_at, 41332_at, 39936_at, 40767_at, 1643_g_at, 37194_at, 40916_at, 39298_at, 262_at, 36138_at, 40827_at, 33809_at, 40718_at, 1472_g_at, 36711_at, 31622_f_at, 32542_at, 35371_at, 37242_at, 32165_at, 37384_at, 34023_at, 36684_at, 38123_at, 41322_s_at, 35182_f_at, 31670_s_at, 32087_at, 37018_at, 35292_at, 36958_at, 32548_at, 34961_at, 40962_s_at, 33219_at, 38473_at, 37399_at, 38052_at, 33925_at, 34251_at, 1449_at, 39341_at, 39767_at, 41202_s_at, 1942_s_at, 32844_at, 35342_at, 41123_s_at, 38233_at, 2012_s_at, 35848_at, 38443_at, 39792_at, 37392_at, 1476_s_at, 34325_at, 36185_at, 38808_at, 1287_at, 41725_at, 36892_at, 39139_at, 1660_at, 41243_at, 153_f_at, 1826_at, 40638_at, 34099_f_at, 37281_at, 34893_at, 33891_at, 39639_s_at, 36375_at, 39059_at, 37924_g_at, 1237_at, 36277_at, 38113_at, 35590_s_at, 34912_at, 39822_s_at, 34161_at, 35091_at, 214_at, 721_g_at, 179_at, 100_g_at, 38229_at, 35485_at, 32228_at, 37425_g_at, 34060_g_at, 36378_at, 36916_at, 32469_at, 595_at, 32227_at, 888_s_at, 39815_at, 1894_f_at, 
38162_at, 38406_f_at, 37898_r_at, 39527_at, 31815_r_at, 36207_at, 31687_f_at, 2077_at, 36114_r_at, 39399_at, 34112_r_at, 41840_r_at, 37556_at, 34655_at, 31562_at, 31357_at, 32897_at, 40278_at, 40089_at, 32525_r_at, 32904_at, 32407_f_at, 416_s_at, AFFX-M27830_3_at, 32815_at, 1339_s_at, 34415_at, 37351_at, 39598_at, 34627_at, 725_i_at, 35955_at, 33021_at, 31586_f_at, 1937_at, 38513_at, 31578_at, 38508_s_at, 36237_at, 34702_f_at, 39640_at, 37434_at, 32162_r_at, 32579_at, 34350_at, 33244_at, 36548_at, 34703_f_at, 32620_at, 33914_r_at, 916_at, 37137_at, 39765_at, 31499_s_at, 732_f_at, 31666_f_at, 36071_at, 31574_i_at, 1350_at, 37467_at, and 2041_i_at. Examples of qualifiers listed in Table 9b but not in Table 3 include 35920_at, 40490_at, 39610_at, 38012_at, 41188_at, 38012_at, 41188_at, 1115_at, 37809_at, 1520_s_at, 32609_at, 41071_at, 36618_g_at, 34397_at, 37508_f_at, 41562_at, 1519_at, 39209_r_at, 36749_at, 39736_at, 32663_at, 37018_at, 39208_i_at, 33986_r_at, 31522_f_at, 35576_f_at, 40877_s_at, 31523_f_at, 39032_at, 39070_at, 1065_at, 36501_at, 38097_at, 33989_f_at, 39971_at, 32260_at, 33131_at, 35224_at, 286_at, 37179_at, 32819_at, 35672_at, 37185_at, 34378_at, 37532_at, 40878_f_at, 41470_at, 40775_at, 36347_f_at, 36780_at, 40698_at, 39698_at, AFFX-HUMRGE/M10098_M_at, 31665_s_at, 40088_at, 39341_at, 857_at, 1842_at, 41357_at, 40076_at, 39420_at, 34850_at, 430_at, 37218_at, 33885_at, 36709_at, 31888_s_at, 32508_at, 35842_at, 34308_at, 37033_s_at, 40617_at, 32857_at, 1940_at, 38653_at, 262_at, 40365_at, 31895_at, 40827_at, 40979_at, 36785_at, 34610_at, 40815_g_at, 34857_at, 39969_at, 39389_at, 32241_at, 40916_at, 32808_at, 33219_at, 38363_at, 2077_at, 33501_r_at, 38201_at, AFFX-HSAC07/X00351_5_at, 1353_g_at, 41694_at, 38391_at, 38112_g_at, 32673_at, 33500_i_at, 35536_at, 1832_at, 35807_at, 37623_at, 916_at, 35013_at, 39245_at, 36879_at, 1252_at, 37145_at, 31562_at, 31586_f_at, 39872_at, 35094_f_at, 39591_s_at, 38194_s_at, 41627_at, 1339_s_at, 33274_f_at, 36338_at, 
33273_f_at, 33150_at, 41471_at, 1117_at, 33284_at, 31506_s_at, 34350_at, 39593_at, 33963_at, 35591_at, 37864_s_at, 36674_at, 37121_at, 39413_at, 41440_at, 32275_at, 40278_at, 36983_f_at, 34415_at, 41840_r_at, AFFX-M27830_3_at, 38513_at, 38249_at, 31793_at, 1407_g_at, 31578_at, 40234_at, 35762_at, 40517_at, 31357_at, 33021_at, 1350_at, 36237_at, 38273_at, 37434_at, 34013_f_at, 32054_at, 36548_at, 33244_at, 39765_at, 33742_f_at, 34014_f_at, 732_f_at, 33914_r_at, 38908_s_at, 32620_at, 32579_at, 39894_f_at, 36071_at, 35418_at, 31666_f_at, 31574_i_at, and 2041_i_at.
In many embodiments, pattern recognition or comparison programs, such as the k-nearest-neighbors algorithm or the weighted voting algorithm, are employed for the comparison of expression profiles. In addition, the serial analysis of gene expression (SAGE) technology, the GEMTOOLS gene expression analysis program (Incyte Pharmaceuticals), the GeneCalling and Quantitative Expression Analysis technology (Curagen), or other suitable methods, programs or systems can be used to compare expression profiles.
The AML or MDS disease genes of the present invention can be used not only for diagnosing or monitoring the treatment or progression of AML or MDS, but also for predicting the progression from MDS to AML. As discussed below, more than 70% of the MDS patients whose samples were classified as AML by the gene expression-based analysis of the present invention eventually progressed to AML. Therefore, the AML or MDS disease genes of the present invention can be used as early indicators of AML progression in patients with MDS.
Algorithms, such as the weighted voting algorithm, can be used for diagnosing or monitoring the treatment or progression of AML or MDS. The weighted voting algorithm is described in Golub et al., supra, and Slonim et al., supra, and can assign a patient of interest to one of two or more classes (e.g., AML versus disease-free, MDS versus disease-free, or AML versus MDS versus disease-free). Software capable of performing the weighted voting algorithm includes, but is not limited to, the GeneCluster 2 software provided by the MIT Center for Genome Research at the Whitehead Institute.
Under one form of the algorithm, a patient of interest can be assigned to one of two classes (class 0 and class 1). In one example, class 0 includes disease-free humans, and class 1 includes MDS patients. In another example, class 0 includes disease-free humans, and class 1 includes AML patients. A set of MDS (or AML) disease genes can be selected to form a class predictor (classifier). Each gene in the class predictor casts a weighted vote for one of the two classes (class 0 and class 1). The vote of gene “g” can be defined as $v_g = a_g(x_g - b_g)$, wherein $a_g$ equals $P(g,c)$ and reflects the correlation between the expression level of gene “g” and the class distinction between class 0 and class 1. $b_g$ equals $[x_0(g) + x_1(g)]/2$, which is the average of the mean logs of the expression levels of gene “g” in class 0 and class 1. $x_g$ represents the normalized log of the expression level of gene “g” in the sample of interest. A positive $v_g$ indicates a vote for class 0, and a negative $v_g$ indicates a vote for class 1. $V_0$ denotes the sum of all positive votes, and $V_1$ denotes the absolute value of the sum of all negative votes. A prediction strength PS is defined as $PS = (V_0 - V_1)/(V_0 + V_1)$.
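The vote and prediction-strength definitions above can be illustrated with a minimal Python sketch. The function name `weighted_vote`, the dictionary layout of the predictor, and the example gene names and values are hypothetical and chosen only for illustration; the default threshold of 0.3 is the example value used elsewhere in this description.

```python
def weighted_vote(sample, predictor, ps_threshold=0.3):
    """Two-class weighted voting (illustrative sketch).

    sample:    dict mapping gene -> x_g (normalized log expression level)
    predictor: dict mapping gene -> (a_g, b_g), where a_g = P(g,c) and
               b_g is the average of the class-0 and class-1 mean logs
    Returns (call, ps): call is 0, 1, or None when |PS| is below threshold.
    """
    v0 = 0.0  # sum of positive votes (votes for class 0)
    v1 = 0.0  # absolute value of the sum of negative votes (class 1)
    for gene, (a_g, b_g) in predictor.items():
        vote = a_g * (sample[gene] - b_g)
        if vote > 0:
            v0 += vote
        else:
            v1 += -vote
    total = v0 + v1
    ps = (v0 - v1) / total if total > 0 else 0.0
    if ps == 0 or abs(ps) < ps_threshold:
        return None, ps  # no confident call
    return (0 if ps > 0 else 1), ps


# Hypothetical three-gene predictor and sample (values are made up).
predictor = {"geneA": (1.0, 2.0), "geneB": (-0.5, 1.0), "geneC": (2.0, 0.0)}
sample = {"geneA": 3.0, "geneB": 3.0, "geneC": 0.5}
call, ps = weighted_vote(sample, predictor)
```

In this made-up example the votes are +1, −1 and +1, so V0 = 2, V1 = 1 and PS = 1/3, which clears a 0.3 threshold in favor of class 0 but would not clear a stricter 0.5 threshold.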
Cross-validation can be used to evaluate the accuracy of a class predictor created under the weighted voting algorithm. In one embodiment, cross-validation includes withholding a sample which has been used in the neighborhood analysis for the identification of the disease genes. A class predictor is created based on the remaining samples, and then used to predict the class of the sample withheld. This process is repeated for each sample that has been used in the neighborhood analysis.
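The leave-one-out procedure described above can be sketched as follows. The helper `fit_nearest_mean` is a deliberately simple stand-in classifier, assumed here only so that the sketch runs on its own; it is not the weighted voting predictor, and in practice the `fit` argument would rebuild the class predictor (including gene selection) from the remaining samples.

```python
def loocv_accuracy(samples, labels, fit):
    """Withhold each sample in turn, refit on the rest, predict the withheld one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predict = fit(train_x, train_y)  # predictor built without sample i
        if predict(samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_nearest_mean(train_x, train_y):
    """Toy stand-in classifier: assign a sample to the class with the closest mean profile."""
    means = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            means,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])),
        )

    return predict


# Made-up two-gene expression profiles for six samples in two classes.
samples = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(samples, labels, fit_nearest_mean)
```

Passing the model-building step as a function keeps the withheld sample out of every stage of predictor construction, which is the point of the procedure.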
Class predictors with different MDS (or AML) disease genes can be evaluated by cross-validation. The class predictor with the most accurate prediction can thereby be identified.
In one embodiment, a positive prediction that a test sample belongs to class 0 or class 1 is made if the absolute value of PS for the test sample is no less than 0.3. Other PS thresholds, such as no less than 0.1, 0.2, 0.4 or 0.5, may also be used to determine a sample's class membership.
In another embodiment, the AML (or MDS) disease genes in a class predictor are significantly correlated with the class distinction in neighborhood analysis. For instance, the disease genes can be selected from those above the 1%, 5%, or 10% significance level in neighborhood analysis. See Golub et al., supra, and Slonim et al., supra.
In yet another embodiment, a class predictor of the present invention includes top upregulated AML or MDS disease gene or genes, and/or top down-regulated AML or MDS disease gene or genes. A class predictor can include both AML and MDS disease genes. Two-class or multi-class correlation metrics can be used for the prediction of disease status.
In still another embodiment, a class predictor of the present invention includes n MDS (or AML) disease genes. Half of these MDS (or AML) disease genes have the top P(g,c) scores, and the other half have the top −P(g,c) scores. The number n is the only free parameter in defining the class predictor.
In a further embodiment, a class predictor of the present invention comprises or consists of at least 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 40, or more AML (or MDS) disease genes. The AML (or MDS) disease genes can include at least two groups of genes. The first group includes disease gene or genes having AML/Disease-Free ratios (or MDS/Disease-Free ratios) of at least 1.5, 2, 3, 4, 5, 10, or more. The second group includes disease gene or genes having AML/Disease-Free ratios (or MDS/Disease-Free ratios) of no greater than 0.667, 0.5, 0.333, 0.25, 0.2, 0.1, or less. In still another embodiment, each disease gene in a class predictor has a p-value of no greater than 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001 or less.
In many embodiments, a confidence threshold is established to optimize the accuracy of prediction and minimize the incidence of both false positive and false negative results. Average confidence scores collected for the accumulating pool of correctly diagnosed patients and correctly non-diagnosed disease-free individuals can be calculated, and a confidence threshold for a particular predictive gene set can be selected.
A clinical challenge concerning AML, MDS and other blood or bone marrow diseases is the highly variable response of patients to a therapy. The basic concept of pharmacogenomics is to understand a patient's genotype in relation to available treatment options and then individualize the most appropriate option for the patient. Different classes of patients can be created based on their different responses to a given therapy. Genes differentially expressed in one response class compared to another class may be identified using global gene expression analysis. These genes are molecular markers for predicting whether a patient of interest will be more or less responsive to the therapy. For patients predicted to have a favorable outcome, efforts to minimize toxicity of the therapy may be considered, whereas for those predicted not to respond to the therapy, treatment with other therapies or experimental regimens can be explored.
In one embodiment, patients are grouped into at least two classes (class 0 and class 1). Class 0 includes patients who die within a specified period of time (such as one year) after initiation of a treatment. Class 1 includes patients who survive beyond the specified period of time after initiation of the treatment. Genes that are differentially expressed in class 0 compared to class 1 can be identified. These genes are prognostic markers of patient clinical outcome. Other clinical outcome criteria, such as time to progression, complete response, partial response, stable disease, or progressive disease, can also be used to group the patients and identify the respective prognosis genes.
The disease genes of the present invention can be used to monitor the progression or treatment of AML or MDS. For instance, the return of a disease gene to the normal expression level is indicative of the effectiveness of a treatment of the disease. The disease genes of the present invention can also be used to identify or test drugs for the treatment of AML or MDS. The ability of a drug candidate to reduce or abolish the abnormal expression of AML or MDS disease genes is suggestive of the effectiveness of the drug candidate in treating AML or MDS. Methods for screening or evaluating drug candidates are well known in the art. These methods can be carried out either in animal models or during human clinical trials.
The present invention contemplates expression vectors encoding AML or MDS disease genes. These AML or MDS disease genes may be under-expressed in AML or MDS tumor cells. By introducing the expression vectors into the patients in need thereof, abnormal expression of these genes may be corrected. Suitable expression vectors and gene delivery techniques are well known in the art.
In addition, this invention contemplates expression vectors encoding sequences that are antisense to AML and MDS disease genes. The AML or MDS disease genes may be over-expressed in AML or MDS tumor cells. By introducing the antisense expression vectors, abnormal expression of these disease genes can be corrected.
Expression of an AML or MDS disease gene can also be inhibited using RNA interference (“RNAi”). RNAi is a technique used in post-transcriptional gene silencing (“PTGS”), in which the targeted gene activity is specifically abolished. RNAi resembles PTGS in plants in many respects and has been detected in many invertebrates including trypanosome, hydra, planaria, nematode and fruit fly (Drosophila melanogaster). It may be involved in the modulation of transposable element mobilization and antiviral state formation. RNAi in mammalian systems is disclosed in PCT application WO00/63364. In one embodiment, dsRNA of at least about 21 nucleotides is introduced into cells to silence the expression of the target gene.
Antibodies against the polypeptides encoded by AML or MDS disease genes can be administered to patients in need thereof. In one embodiment, the antibodies can substantially reduce or inhibit the activity of a disease gene. For instance, the antibodies can reduce the activity of the disease gene by at least about 25%, 50%, 75%, 90%, or more.
A pharmaceutical composition comprising an antibody or an expression vector of the present invention can be prepared. The pharmaceutical composition can be formulated to be compatible with its intended route of administration. Examples of routes of administration include, but are not limited to, parenteral, intravenous, intradermal, subcutaneous, oral, inhalational, transdermal, topical, transmucosal, and rectal administration. Preparation of pharmaceutical compositions is well known in the art.
The present invention further features kits or apparatuses for diagnosing or monitoring the progression or treatment of AML or MDS. In one embodiment, the kits or apparatuses include one or more polynucleotides, each of which is capable of hybridizing under stringent conditions to a gene selected from Tables 1, 3, 8b, 9b, and 10b. The polynucleotides can be labeled with fluorescent, radioactive, or other detectable moieties. Any number of polynucleotides can be included in a kit. For instance, at least 2, 3, 4, 5, 10, 15, 20, or more polynucleotides can be included in a kit or apparatus, and each polynucleotide is capable of hybridizing under stringent conditions to a different respective gene selected from Tables 1, 3, 8b, 9b, and 10b. In one example, the polynucleotides are included in vials, tubes, bottles or other containing means. In another example, the polynucleotides are stably attached to one or more substrate supports. Nucleic acid hybridization can be directly carried out on the substrate supports.
In another embodiment, the kits or apparatuses include one or more antibodies specific for the polypeptides encoded by the genes selected from Tables 1, 3, 8b, 9b, and 10b. The antibodies can be labeled or unlabeled. Any number of antibodies can be included in a kit or apparatus. For instance, at least 2, 3, 4, 5, 10, 15, 20, or more antibodies can be included in a kit or apparatus, and each antibody can specifically recognize a different respective AML or MDS disease gene product. In one example, the kit or apparatus also includes other immunodetection reagents (such as secondary antibodies, controls or enzyme substrates). In another example, the antibodies in a kit of the present invention are included in one or more containers. In yet another example, the antibodies in an apparatus of the present invention are stably attached to one or more substrate supports. Suitable substrate supports include, but are not limited to, films, membranes, column matrices, or microtiter plate wells. Immunoassays can be performed directly on the substrate supports.
Furthermore, the present invention features systems capable of comparing an expression profile of interest to at least one reference expression profile. In many embodiments, the reference expression profiles are stored in a database. The comparison between the expression profile of interest and the reference expression profile(s) can be carried out electronically, such as by using a computer system. The computer system typically comprises a processor coupled to a memory which stores data representing the expression profiles to be compared. In one embodiment, the memory is readable as well as rewritable. The expression profiles can be retrieved or modified. The computer system includes one or more programs capable of causing the processor to compare the expression profiles. In one embodiment, the computer system includes a program capable of executing a weighted voting algorithm. In another embodiment, the computer system is coupled to a polynucleotide array from which hybridization signals can be directly fed into the computer system.
It should be understood that the above-described embodiments and the following examples are given by way of illustration, not limitation. Various changes and modifications within the scope of the present invention will become apparent to those skilled in the art from the present description.
BMMCs were isolated from bone marrow aspirates taken from 15 disease-free volunteers, 17 patients with MDS, and 18 patients with AML. Informed consent for the pharmacogenomic portions of these clinical studies was obtained, and the project was approved by the local Institutional Review Boards at the participating clinical sites. MDS patients were primarily of Caucasian descent and had a mean age of 66 years (range of 52-84 years). AML patients were exclusively of Caucasian descent and had a mean age of 45 years (range of 19-65 years). Disease-free volunteers were exclusively of Caucasian descent with a mean age of 23 years (range of 18-32 years).
At screening, bone marrow aspirates from each patient were obtained for pharmacogenomic assessment and histopathologically examined by two independent pathologists. Each bone marrow sample was examined initially by an on-site pathologist and secondly by a single centralized pathologist who screened all samples in the present study and classified the aspirates accordingly. Inclusion criteria for AML patients included blasts in excess of 20% in the bone marrow, morphologic diagnosis of AML according to the FAB classification system and flow cytometry analysis indicating CD33+ status. Inclusion criteria for MDS patients included morphologic diagnosis of MDS and FAB classification as refractory anemia, refractory anemia with ringed sideroblasts, refractory anemia with excess blasts, or refractory anemia with excess blasts in transformation (where disease stability had been demonstrated for a minimum of 2 months).
BMMCs from individuals were isolated from whole bone marrow aspirates. All disease-free and diseased bone marrow samples were shipped or stored overnight prior to processing. Total RNA was isolated from BMMC pellets using the RNeasy mini kit (Qiagen, Valencia, Calif.). Labeled target for oligonucleotide arrays was prepared using a modification of the procedure described in Lockhart et al., N
2 μg of total RNA is converted to cDNA by priming with an oligo-dT primer containing a T7 RNA polymerase promoter at the 5′ end. The cDNA is used as the template for in vitro transcription using a T7 RNA polymerase kit (Ambion, Woodlands, Tex.) and biotinylated CTP and UTP (Enzo). Labeled cRNA can be fragmented in 40 mM Tris-acetate pH 8.0, 100 mM KOAc, 30 mM MgOAc for 35 minutes at 94° C. in a final volume of 40 μl.
Individual diseased and disease-free samples are hybridized to HgU95Av2 or HG-U95A genechips (Affymetrix). No samples are pooled. 10 μg of labeled target can be diluted in 1×MES buffer with 100 μg/ml herring sperm DNA and 50 μg/ml acetylated BSA. To normalize arrays to each other and to estimate the sensitivity of the oligonucleotide arrays, in vitro synthesized transcripts of 11 bacterial genes can be included in each hybridization reaction as described in Hill et al., S
Labeled probes are denatured at 99° C. for 5 minutes and then 45° C. for 5 minutes and hybridized to oligonucleotide arrays comprised of over 12,500 human genes (HG-U95Av2 or HgU95A, Affymetrix). Arrays can be hybridized for 16 hours at 45° C. The hybridization buffer can include 100 mM MES, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20. After hybridization, the cartridges can be washed extensively with wash buffer 6×SSPET (e.g., three times at room temperature for at least 10 minutes each time). These hybridization and washing conditions are collectively referred to as “nucleic acid array hybridization conditions.” The washed cartridges can be subsequently stained with phycoerythrin coupled to streptavidin.
12×MES stock contains 1.22 M MES and 0.89 M [Na+]. For 1000 ml, the stock can be prepared by mixing 70.4 g MES free acid monohydrate, 193.3 g MES sodium salt and 800 ml of molecular biology grade water, and adjusting volume to 1000 ml. The pH should be between 6.5 and 6.7. 2× hybridization buffer can be prepared by mixing 8.3 ml of 12×MES stock, 17.7 ml of 5 M NaCl, 4.0 ml of 0.5 M EDTA, 0.1 ml of 10% Tween 20 and 19.9 ml of water. 6×SSPET contains 0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, pH 7.4, and 0.005% Triton X-100. In some cases, the wash buffer can be replaced with a more stringent wash buffer. 1000 ml stringent wash buffer can be prepared by mixing 83.3 ml of 12×MES stock, 5.2 ml of 5 M NaCl, 1.0 ml of 10% Tween 20 and 910.5 ml of water.
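As an arithmetic check on the 2× hybridization buffer recipe above, the following Python sketch computes the final concentrations from the listed volumes and stock concentrations. The helper name `mix` and the dictionary keys are hypothetical; sodium contributed by the EDTA stock is ignored, which is an assumption of this sketch.

```python
# Stock concentrations from the recipe text: mol/L, except Tween 20 in percent.
MES_STOCK_12X = {"MES": 1.22, "Na": 0.89}


def mix(parts):
    """parts: list of (volume_ml, {solute: stock concentration}).
    Returns (total volume in ml, {solute: final concentration})."""
    total = sum(volume for volume, _ in parts)
    amounts = {}
    for volume, stock in parts:
        for solute, conc in stock.items():
            amounts[solute] = amounts.get(solute, 0.0) + volume * conc
    return total, {solute: amt / total for solute, amt in amounts.items()}


hyb_2x_parts = [
    (8.3, MES_STOCK_12X),          # 12x MES stock
    (17.7, {"Na": 5.0}),           # 5 M NaCl
    (4.0, {"EDTA": 0.5}),          # 0.5 M EDTA
    (0.1, {"Tween20_pct": 10.0}),  # 10% Tween 20
    (19.9, {}),                    # water
]
total_ml, conc_2x = mix(hyb_2x_parts)

# Diluting the 2x buffer 1:1 gives the working (1x) concentrations.
conc_1x = {solute: conc / 2 for solute, conc in conc_2x.items()}
```

Under these assumptions the recipe totals 50 ml, and the 1× values come out at roughly 0.10 M MES, 0.96 M Na+, 20 mM EDTA and 0.01% Tween 20, consistent with the nominal 100 mM MES, 1 M [Na+], 20 mM EDTA and 0.01% Tween 20 stated for the hybridization buffer.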
Data analysis and absent/present call determination were performed on raw fluorescence intensity values using GENECHIP 3.2 software (Affymetrix). The “average difference” values for each transcript were normalized to “frequency” values using the scaled frequency normalization method, in which the average differences for 11 control cRNAs with known abundance spiked into each hybridization solution were used to generate a global calibration curve. See Hill et al., G
GENECHIP 3.2 software uses algorithms to calculate the likelihood as to whether a gene is “absent” or “present” as well as a specific hybridization intensity value or “average difference” for each transcript represented on the array. The algorithms used in these calculations are described in the Affymetrix GeneChip Analysis Suite User Guide.
Specific transcripts can be evaluated further if they meet the following criteria. First, genes that are designated “absent” by the GENECHIP 3.2 software in all samples are excluded from the analysis. Second, in comparisons of transcript levels between arrays, a gene is required to be present in at least one of the arrays. Third, for comparisons of transcript levels between groups, a Student's t-test is applied to identify a subset of transcripts that have a significant difference (p<0.05) in frequency values. In certain cases, a fourth criterion, which requires that average fold changes in frequency values across the statistically significant subset of genes be 2-fold or greater, can also be used.
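The filtering criteria above can be sketched in Python as a single predicate. The function name `passes_filter`, its argument layout, and the "P"/"A" call encoding are assumptions made for illustration; the t-test p-value is taken as a precomputed input rather than calculated here.

```python
from statistics import mean


def passes_filter(calls, freqs_a, freqs_b, p_value, alpha=0.05, min_fold=None):
    """Apply the transcript filters to one gene (illustrative sketch).

    calls:    absent/present calls ("A" or "P") across all compared arrays
    freqs_a:  frequency values (ppm) in group A
    freqs_b:  frequency values (ppm) in group B
    p_value:  precomputed t-test p-value for group A versus group B
    min_fold: optional minimum fold change in either direction (e.g. 2.0)
    """
    # Criteria 1 and 2: the gene must be called present on at least one array.
    if not any(call == "P" for call in calls):
        return False
    # Criterion 3: significant difference between groups.
    if p_value >= alpha:
        return False
    # Optional criterion 4: average fold change of at least min_fold.
    if min_fold is not None:
        low, high = sorted([mean(freqs_a), mean(freqs_b)])
        if low <= 0 or high / low < min_fold:
            return False
    return True


# Made-up example: group means of 11 and 29 ppm give a fold change of about 2.6.
kept = passes_filter(["P", "A", "P", "P"], [10, 12], [30, 28], 0.01, min_fold=2.0)
```

Treating a non-positive group mean as a failure of the fold-change test is a choice made for this sketch to avoid division by zero; the text does not specify how such cases are handled.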
Unsupervised hierarchical clustering of genes and/or arrays on the basis of similarity of their expression profiles was performed using the procedure described in Eisen et al., P
Expression profiles in various tissues can also be accessed and downloaded from the BioExpress database (GeneLogic, Gaithersburg Md.). GeneLogic GX2000 software based analysis tools including fold change analysis and electronic northerns can be utilized to calculate fold changes and distribution of expression values. Expression profiles for different samples can be exported using the expression analysis tool for further analysis in the hierarchical clustering package (Eisen et al., supra).
A k-nearest-neighbors approach was used to perform a neighborhood analysis of real and randomly permuted data using a correlation metric $P(g,c) = (\mu_1 - \mu_2)/(\sigma_1 + \sigma_2)$, where g is the expression vector of a gene, c is the class vector, $\mu_1$ and $\sigma_1$ define the mean expression level and standard deviation of the gene in class 1, respectively, and $\mu_2$ and $\sigma_2$ define the mean expression level and standard deviation of the gene in class 2, respectively. The measures of correlation for the most statistically significant upregulated genes of the true defined classes (AML versus disease-free, or MDS versus disease-free) can be compared to the most statistically significant measures of correlation observed in randomly permuted class distinctions. The top 1%, 5% and median distance measurements of 100 randomly permuted classes compared to the observed distance measurements for AML (or MDS) and disease-free classes can be plotted to show the statistical verification of the AML (or MDS) disease genes identified by this invention.
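The correlation metric and the comparison against permuted class labels can be sketched for a single gene as follows. The function names, the use of the sample standard deviation, the fixed random seed, and the expression values are assumptions of this sketch; the count of 100 permutations follows the text.

```python
import random
from statistics import mean, stdev


def correlation_metric(values, labels):
    """P(g,c) = (mu1 - mu2) / (sigma1 + sigma2) for one gene."""
    class1 = [v for v, c in zip(values, labels) if c == 1]
    class2 = [v for v, c in zip(values, labels) if c == 2]
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


def permuted_metrics(values, labels, n_perm=100, seed=0):
    """Recompute the metric under randomly permuted class labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        scores.append(correlation_metric(values, shuffled))
    return scores


# Made-up log expression values for one gene: four class-1 and four class-2 samples.
values = [8.1, 7.9, 8.4, 8.0, 5.2, 5.0, 4.7, 5.1]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

observed = correlation_metric(values, labels)
null_scores = permuted_metrics(values, labels)
# Fraction of permutations scoring at least as high as the true class distinction.
fraction_as_extreme = sum(s >= observed for s in null_scores) / len(null_scores)
```

A gene whose observed score exceeds the top 1% or 5% of the permuted scores would be treated as significantly correlated with the class distinction; with only eight samples, as in this toy input, the permuted labels can coincide with the true ones, so the fraction need not be zero.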
Expression profiling analysis of the disease-free BMMC RNA samples, MDS BMMC RNA samples and AML BMMC RNA samples revealed that of the over 12,000 genes on HG-U95Av2 or HgU95A chips, at least 2,768 genes met the initial criteria for further analysis (i.e., at least 1 present call, and at least 1 frequency >10 ppm). Tables 1 and 2 list examples of the identified AML disease genes, and Tables 3 and 4 list examples of the identified MDS disease genes.
An initial unsupervised cluster analysis approach, which hierarchically clusters samples and genes based on a correlation coefficient (Eisen et al., supra), was performed using the 2,768 genes passing the initial filtering criteria (
A supervised approach was employed to identify transcripts whose expression levels were most highly correlated with BMMCs from disease-free, AML or MDS patients. To initially build and subsequently train the classifiers, approximately 70% of the disease-free bone marrow expression patterns (n=10 out of 15), AML bone marrow expression patterns (n=12 out of 18) and MDS bone marrow expression patterns (n=6 out of 9 MDS patients who did not have conflicting diagnosis or progress to AML) were randomly selected and used as the training set. The remaining samples (approximately 30%) were used as the test set. GeneCluster's default correlation metric (Golub et al., supra) was used to identify genes with expression levels most highly correlated with the classification vector characteristic of the training set. All 2,768 genes meeting the initial filter criteria were screened using this approach. Predictor sets containing different numbers of genes were then evaluated by “leave one out cross validation” (LOOCV) to identify the predictor set with the highest accuracy for classification of the samples in the training set. Classifier sets containing top genes upregulated in AML BMMCs, top genes upregulated in MDS BMMCs, and top genes upregulated in disease-free BMMCs were prepared. Upregulation can be determined by fold changes.
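The random selection of training and test samples within each class can be sketched as below. The function name `split_by_class`, the sample identifiers, and the fixed seed are hypothetical; only the per-class counts (10 of 15, 12 of 18, 6 of 9) come from the text.

```python
import random


def split_by_class(ids_by_class, n_train_by_class, seed=0):
    """Randomly pick the stated number of training samples from each class."""
    rng = random.Random(seed)
    train, test = {}, {}
    for cls, ids in ids_by_class.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)
        k = n_train_by_class[cls]
        train[cls] = shuffled[:k]   # used to build and train the classifier
        test[cls] = shuffled[k:]    # held out for independent evaluation
    return train, test


# Placeholder sample identifiers; class sizes follow the text.
ids_by_class = {
    "disease-free": [f"DF{i}" for i in range(15)],
    "AML": [f"AML{i}" for i in range(18)],
    "MDS": [f"MDS{i}" for i in range(9)],
}
n_train = {"disease-free": 10, "AML": 12, "MDS": 6}
train, test = split_by_class(ids_by_class, n_train)
```

Splitting within each class keeps the class proportions the same in the training and test sets, so that every class is represented when the classifier is evaluated.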
The 93-gene classifier set is depicted in Tables 7a and 7b. The class within which each gene is upregulated is indicated (“Class Predicted”). Table 7b provides the cytogenetic band, the Unigene accession number, and the Entrez accession number for each of the 93 genes.
The 93-gene classifier was further evaluated by using the test set of samples. All samples in the test set were accurately predicted as disease-free, AML, or MDS, respectively (
Under a weighted voting method, the expression level of each gene in the classifier set contributes to an overall prediction strength which determines the classification of the sample. The prediction strength can be measured as a combined variable that indicates the number of “votes” for either one class or another, and can vary between 0 (narrow margin of victory) and 1 (wide margin of victory) in favor of the predicted class. To quantitate the accuracy of this prediction method, a value (such as 0.3) can be imposed as the prediction strength threshold above which calls can confidently be made. In many cases, a prediction strength of less than 0.3 may also have evidentiary value in the prediction of disease status or progression.
The 93-gene classifier was also evaluated on BMMC profiles from MDS patients who ultimately progressed to AML. In this study there were five patients histopathologically classified by both pathologists as MDS at the time of bone marrow sampling. These five MDS patients later progressed to AML (median time to progression=137 days). Unsupervised analyses (see
Expression profiles in bone marrow leukocytes from 31 AML patients, 13 MDS patients, and 18 disease-free volunteers were obtained and analyzed using procedures similar to those described in Examples 1-4. Bone marrow leukocytes can be purified from bone marrow aspirates using Ficoll-Hypaque gradients. 531 AML disease genes (Tables 8a and 8b) and 241 MDS disease genes (Tables 9a and 9b) were identified. The average expression level of each of these genes in AML or MDS bone marrow leukocytes is at least 2-fold higher or lower than that in disease-free bone marrow leukocytes. The p value of a Student's t-test (unequal variances) for the difference between the average expression levels of each of these disease genes in AML or MDS versus disease-free bone marrow cells is no more than 0.05. “COV” denotes coefficient of variance.
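The two selection statistics named above, the fold difference between group means and the unequal-variance (Welch) t statistic, can be computed as in the following sketch. The function names and the expression values are hypothetical, and converting the t statistic and degrees of freedom to a p value would require a Student's t distribution function from a statistics library, which is not shown here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


def fold_change(group_a, group_b):
    """Ratio of average expression levels (group A over group B)."""
    return mean(group_a) / mean(group_b)


# Made-up expression levels for one gene in diseased and disease-free samples.
diseased = [20.0, 22.0, 24.0]
disease_free = [10.0, 11.0, 12.0]

t_stat, dof = welch_t(diseased, disease_free)
ratio = fold_change(diseased, disease_free)
```

For these made-up values the ratio is exactly 2, t is about 8.52 and the degrees of freedom are 50/17, so the gene would meet the 2-fold criterion; whether it also meets the p value cutoff depends on the t distribution lookup that this sketch leaves out.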
A similar approach was used to identify genes that are differentially expressed in AML bone marrow cells as compared to MDS bone marrow cells. The qualifiers thus identified are depicted in Table 10a, and the corresponding genes are illustrated in Table 10b. Like other AML or MDS disease genes identified in the present invention, the genes in Tables 10a and 10b can be used for the diagnosis of AML or MDS.
The foregoing description of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. Thus, it is noted that the scope of the invention is defined by the claims and their equivalents.
The present application claims priority from and incorporates by reference the entire disclosure of U.S. Provisional Patent Application Ser. No. 60/466,055, filed Apr. 29, 2003.
Number | Date | Country | |
---|---|---|---|
60466055 | Apr 2003 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10834114 | Apr 2004 | US |
Child | 11789104 | Apr 2007 | US |