The disclosure pertains to methods and compositions for determining gene expression signatures that predict survival in patients having a hematological malignancy, particularly leukemia patients such as AML patients.
Acute myeloid leukemia (AML) is a clonal disease marked by the growth of abnormally differentiated immature myeloid cells, with a long-term survival rate in adult patients of only 30%1, 2. The first explicit experimental evidence for the existence of leukemic stem cells (LSCs), the only cells capable of initiating and sustaining the clonal leukemic disease, has been reported3. LSCs are a biologically distinct blast population positioned at the apex of the AML developmental hierarchy. A more complete understanding of the unique properties of LSCs is crucial for identifying novel AML regulatory pathways and, subsequently, for developing innovative therapies that effectively target these cells in leukemia patients. Many studies, however, overlook the heterogeneity of AML and the existence of LSCs, potentially masking important molecular pathways.
While the cancer stem cell model was proposed over three decades ago, only recently has experimental evidence confirmed the hierarchical model for leukemia3. Using a quantitative assay for transplantation of primary AML into SCID or NOD/SCID mice, human AML cells that can initiate a human leukemic graft in mice (termed SCID Leukemia-Initiating Cells, SL-IC) were identified and prospectively purified3. Cells with the surface phenotype CD34+CD38−, representing 0.1-1% of the AML cell population, were the only AML fraction capable of serially transplanting the leukemia. This fraction could also recapitulate the cellular diversity of the original leukemia and therefore contained the LSC. The CD34+CD38+ fraction contained progenitor cells (cells capable of forming colonies but with limited self-renewal ability), while the remaining two fractions contained blast cells with no self-renewal capacity. Several groups have since used the NOD/SCID xenotransplant model to isolate rare cancer stem cells (CSCs) in, for example, brain and breast tumours, indicating that the CSC model applies to multiple types of cancer4-6.
Because AML samples are more variable than normal hematopoietic cells, it is essential to validate each sorted fraction functionally; incorrectly labeling a sorted AML fraction would severely compromise the analysis of global gene expression data. Currently, the in vivo transplantation assay is the best technique for accurately detecting LSCs. In vitro methods suffer from altered marker expression and the inability to maintain LSCs in culture. Importantly, a novel and improved in vivo SL-IC assay, involving intrafemoral injection into NOD/SCID mice depleted of CD122+ cells, has been applied to confirm the presence of LSC activity in each sorted fraction of 16 AML samples. With this assay, LSCs were detected in the expected CD34+/CD38− population of sorted AML. However, in the majority of AML samples, LSCs were also detected in at least one additional fraction, demonstrating the critical importance of functional validation when interpreting global gene expression profiles of sorted stem cell populations19.
Significantly, while HSCs and LSCs are expected to share similar regulatory pathways, recent findings have highlighted differences between HSC and LSC regulatory networks7, 8. Deletion of the tumour suppressor gene Pten in murine hematopoietic cells resulted in the generation of transplantable leukemias. However, Pten deletion in HSCs led to HSC depletion, indicating that, unlike LSCs, HSCs could not be maintained without Pten. Such regulatory differences represent a vulnerability that could be exploited to target LSCs specifically for eradication while leaving HSCs unharmed. A greater understanding of both LSC and HSC regulation may reveal further differences between the two and lead to novel therapies.
Little is currently known about the expression profile of LSC-enriched sub-populations in AML. Gal et al. compared the expression of CD34+/CD38− and CD34+/CD38+ populations in 5 AML samples and identified 409 genes that were 2-fold over- or under-expressed between the two populations9. However, the populations were not functionally validated, and the CD34+/CD38+ fractions likely also contained LSCs; the resulting profile is therefore defined by cell markers rather than by function. Majeti et al. identified 3005 differentially expressed genes in a comparison between AML CD34+/CD38− cells and normal bone marrow CD34+/CD38− cells. However, the analysis did not include mature cell populations, suggesting that this profile is leukemia-specific rather than necessarily stem cell-specific10. The prognostic significance of these profiles was not explored.
AML is a genetically heterogeneous disease, and the karyotype of the AML blasts is the most important prognostic factor11, 12. However, approximately half of all adult AML cases are cytogenetically normal at diagnosis. Within the cytogenetically normal AML (CN-AML) patient population, the mutational status of genes such as FLT3, NPM1, MN1 and CEBPA is associated with outcome; however, this association is not absolute, and not all CN-AML cases present with such mutations, indicating that this class of AML is heterogeneous and that additional factors are prognostically significant13, 14. Two groups have attempted to use gene expression profiling to predict outcome specifically in CN-AML patients. Bullinger et al. developed a signature that was validated by Radmacher et al., who showed that a classification rule built on the previously identified signature correlated with overall survival (p=0.001)15, 16. Metzeler et al. used a cohort of 163 CN-AML patients to develop an 86-probe signature that predicts survival in CN-AML, with significant prediction of overall survival in an independent set of 79 CN-AML patients (p=0.002)17. These signatures correlated with FLT3ITD status; however, the 86-probe signature remained associated with outcome independently of FLT3ITD status, indicating that gene expression profiling can add prognostic value beyond mutational status.
A method for determining a prognosis of a subject having a hematological cancer comprising:
a) determining a gene expression level for each of a set of genes selected from leukemia stem cell (LSC) signature genes listed in Tables 2, 6, and/or 12, hematopoietic stem cell (HSC) signature genes listed in Tables 4 and/or 14, and/or CE-HSC/LSC signature genes listed in Table 19, to obtain a subject expression profile of a sample obtained from the subject; and
b) classifying the subject as having a good prognosis or a poor prognosis based on the subject expression profile;
wherein a good prognosis predicts an increased likelihood of survival within a predetermined period after initial diagnosis and a poor prognosis predicts a decreased likelihood of survival within the predetermined period after initial diagnosis.
A computer-implemented method for determining a prognosis of a subject having a hematological cancer comprising: obtaining a subject expression profile comprising measurements of expression levels of a set of genes in a sample from the subject, and classifying, on a computer, the subject as having a good prognosis or a poor prognosis based on the subject expression profile, wherein the set of genes is selected from the genes listed in Tables 2, 4, 6, 12 and 14 and comprises at least 2 genes; wherein a good prognosis predicts an increased likelihood of survival within a predetermined period after initial diagnosis, and wherein a poor prognosis predicts a decreased likelihood of survival within the predetermined period after initial diagnosis.
A method for monitoring a response to a cancer treatment in a subject having a hematological cancer, comprising:
a) collecting a first sample from the subject before the subject has received the cancer treatment;
b) collecting a subsequent sample from the subject after the subject has received the cancer treatment;
c) determining the gene expression levels of a set of genes selected from LSC signature genes and/or HSC signature genes in the first and the subsequent samples according to a method described herein, to obtain a first sample subject expression profile and a subsequent sample subject expression profile, wherein the set of genes comprises at least 2 genes; and
d) calculating a first sample subject expression profile score and a subsequent sample subject expression profile score;
wherein a lower subsequent sample expression profile score compared to the first sample expression profile score is indicative of a positive response, and a higher subsequent sample expression profile score compared to the first sample expression profile score is indicative of a negative response.
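The comparison in steps c) and d) can be sketched in Python as follows. This is a minimal illustration only, assuming that each expression profile score is the sum of already-normalized expression values for the same gene set in both samples; the function names and gene labels are hypothetical, and treating equal scores as "no change" is an assumption, since the method above does not address that case.

```python
def profile_score(expression: dict[str, float], genes: list[str]) -> float:
    """Sum the (already normalized) expression values of the selected genes."""
    return sum(expression[gene] for gene in genes)


def classify_response(first_score: float, subsequent_score: float) -> str:
    """Compare the pre-treatment and post-treatment profile scores."""
    if subsequent_score < first_score:
        return "positive"   # lower score after treatment
    if subsequent_score > first_score:
        return "negative"   # higher score after treatment
    return "no change"      # equal scores: not addressed by the method (assumption)


# Hypothetical two-gene example; values are illustrative, not measured data.
genes = ["GENE_A", "GENE_B"]
before = {"GENE_A": 2.0, "GENE_B": 1.5}
after = {"GENE_A": 0.5, "GENE_B": 1.0}
print(classify_response(profile_score(before, genes), profile_score(after, genes)))  # positive
```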
A method of treating a subject having a hematological cancer, comprising determining a prognosis of the subject according to a method described herein, and providing a suitable cancer treatment to the subject in need thereof according to the prognosis determined.
Use of a prognosis determined according to a method described herein for identifying a suitable treatment for a subject with a hematological cancer.
A composition comprising a set of nucleic acid molecules each comprising a polynucleotide probe sequence selected from SEQ ID NO:1-2533.
An array comprising, for each gene in a set of genes comprising at least 2 of the genes listed in Tables 2, 4, 6, 12 and/or 14, one or more polynucleotide probes complementary and hybridizable to a coding sequence in the gene, for determining a prognosis according to a method described herein.
A kit for determining prognosis in a subject having a hematological cancer according to the method described herein comprising:
a) an array or composition described herein;
b) a kit control; and
c) optionally instructions for use.
A computer system comprising:
a) a database including records comprising reference expression profiles associated with clinical outcomes, each reference profile comprising the expression levels of a set of genes listed in Table 2, 4, 6, 12 and/or 14;
b) a user interface capable of receiving and/or inputting a selection of gene expression levels of a set of genes, the set comprising at least 2 genes listed in Table 2, 4, 6, 12 and/or 14 for use in comparing to the gene reference expression profiles in the database;
c) an output that displays a prediction of clinical prognosis according to the expression levels of the set of genes.
In an embodiment, the expression profile is used to calculate a subject risk score, wherein the subject is classified as having a good prognosis if the subject risk score is low and as having a poor prognosis if the subject risk score is high.
Other features and advantages of the present disclosure will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only, since various changes and modifications within the spirit and scope of the disclosure will become apparent to those skilled in the art from this detailed description.
As used herein, “Leukemia stem cell (LSC) signature genes” or “leukemic stem cell (LSC) signature genes” includes genes listed in Tables 2, 6, and/or 12 and genes detectable by the probesets listed in Tables 1, 5 and/or 18, which are preferentially expressed in functionally defined leukemic stem cells.
As used herein, “LSC signature probe sets” refers to the probesets listed, for example, in Tables 1, 5 and/or 18, each probeset comprising a set of probes, for example 11 probes, that can be used to detect LSC signature genes.
As used herein, “Hematopoietic stem cell (HSC) signature genes” includes genes listed in Tables 4 and/or 14 and genes detectable by the probesets listed in Tables 3 and/or 17, which are preferentially expressed in functionally defined hematopoietic stem cells. Also included is the subset of HSC signature genes listed in Table 20.
As used herein, “HSC signature probe sets” refers to the probesets listed, for example, in Tables 3 and/or 17, each probeset comprising a set of probes, for example 11 probes, that can be used to detect HSC signature genes.
As used herein, “core enriched HSC/LSC (CE-HSC/LSC) signature genes” refers to a subset of 44 HSC signature genes that are more highly expressed in LSC-containing fractions (compared to non-LSC leukemic cells), which are listed in Table 13 or Table 19 and which can, for example, be detected using the corresponding probes and probesets listed in Tables 1, 3, 5, 17 and/or 18. These forty-four leading-edge genes drive the GSEA enrichment of the HSC-R signature in the LSC gene expression data and represent HSC genes that are also differentially expressed in LSCs.
As used herein, “expression profile” refers to the expression levels of a set of genes selected from LSC signature genes and/or HSC signature genes, including, for example, CE-HSC/LSC signature genes. For example, an expression profile can comprise the quantitated relative expression levels of 2 or more genes listed in Tables 2, 4, 6, 12, 13, 14, 19 and/or 20 and/or genes detected by the probes and probesets listed in Tables 1, 3, 5, 17 and/or 18.
A “subject expression profile” refers to the expression levels in (or corresponding to) a sample obtained from a subject. The gene expression levels can, for example, be used to predict a clinical outcome based on similarity to a reference expression profile known to be associated with a particular outcome, or to calculate a subject risk score for comparison to a selected threshold.
The term “subject risk score” as used herein refers to a sum of the expression values of a set of genes selected from LSC signature genes and/or HSC signature genes (e.g. CE-HSC/LSC signature genes), which can be used to classify a subject. A subject risk score can be calculated, for example, by scaling (e.g. normalizing) each gene expression value, detected for example with a probe or probeset, and summing the scaled values to obtain a risk score. The risk score can then be compared to a reference value or standard (e.g. a threshold derived from subjects with a known outcome), where a subject risk score above the threshold predicts a poor prognosis and a score below the threshold predicts a good prognosis.
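A subject risk score of this kind might be computed as in the following Python sketch. The scaling step is not specified above beyond "e.g. normalizing"; here each gene is z-scored across a cohort of subjects (mean 0, standard deviation 1) as one possible choice, and the threshold is taken as the cohort median, one of the options mentioned for classifying subjects. Gene names and values are hypothetical.

```python
import statistics


def zscore_by_gene(cohort: dict[str, list[float]]) -> dict[str, list[float]]:
    """Scale each gene's values across subjects to mean 0 and standard deviation 1."""
    scaled = {}
    for gene, values in cohort.items():
        mu = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            raise ValueError(f"{gene} has no variation across the cohort")
        scaled[gene] = [(v - mu) / sd for v in values]
    return scaled


def risk_scores(cohort: dict[str, list[float]]) -> list[float]:
    """Sum the scaled expression values of all genes for each subject."""
    scaled = zscore_by_gene(cohort)
    n_subjects = len(next(iter(scaled.values())))
    return [sum(values[i] for values in scaled.values()) for i in range(n_subjects)]


def classify(score: float, threshold: float) -> str:
    """Above the threshold: poor prognosis; at or below it: good (tie rule assumed)."""
    return "poor" if score > threshold else "good"


# Hypothetical cohort: each gene maps to its expression in subjects 1..3.
cohort = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [2.0, 4.0, 6.0]}
scores = risk_scores(cohort)            # [-2.0, 0.0, 2.0]
threshold = statistics.median(scores)   # 0.0
print([classify(s, threshold) for s in scores])  # ['good', 'good', 'poor']
```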
A “reference expression profile” or “reference profile” as used herein refers to the expression signature of a set of genes (e.g. at least 2 LSC or HSC signature genes) associated with a clinical outcome in a patient having a hematological cancer, such as a leukemia patient. The reference expression profile is identified using two or more reference patient expression profiles, wherein the expression profile is similar between reference patients with a similar outcome, thereby defining an outcome class, and differs from the reference expression profiles of other outcome classes. The reference expression profile is, for example, a reference profile or reference signature of the expression of 2 or more, 3 or more, 4 or more or 5 or more genes listed in Tables 2, 4, 6, 12, 13, 14, 19 and/or 20 and/or genes detectable with probes listed in Tables 1, 3, 5, 17 and/or 18, to which the expression levels of the corresponding genes in a patient sample are compared in methods for determining or predicting clinical outcome, e.g. good prognosis or poor prognosis. Similarly, a reference expression profile associated with a good prognosis can be referred to as a good prognosis reference profile, and a reference expression profile associated with a poor prognosis can be referred to as a poor prognosis reference profile.
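Comparison of a subject expression profile to good and poor prognosis reference profiles could be sketched as follows. The definition above does not fix a similarity measure; this example assumes that each reference profile is the gene-by-gene mean (centroid) of its reference patients' profiles and uses Pearson correlation as the similarity measure, a swapped-in choice for illustration. Genes are assumed to be in the same fixed order in every profile, and all values are hypothetical.

```python
import math


def centroid(profiles: list[list[float]]) -> list[float]:
    """Average reference patient profiles gene by gene."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_by_reference(subject: list[float],
                          references: dict[str, list[list[float]]]) -> str:
    """Assign the outcome class whose reference profile is most similar."""
    reference_profiles = {label: centroid(p) for label, p in references.items()}
    return max(reference_profiles,
               key=lambda label: pearson(subject, reference_profiles[label]))


# Hypothetical three-gene reference patients for each outcome class.
references = {
    "good prognosis": [[3.0, 1.0, 1.0], [3.0, 2.0, 1.0]],
    "poor prognosis": [[1.0, 1.0, 3.0], [1.0, 2.0, 3.0]],
}
print(classify_by_reference([2.9, 1.4, 1.1], references))  # good prognosis
```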
The term “classifying” as used herein refers to assigning an unclassified item to a class or kind. A “class” or “group” is a grouping of items, based on one or more characteristics, attributes, properties, qualities, effects, parameters, etc., which they have in common, for the purpose of classifying them according to an established system or scheme. For example, subjects having increased expression of a set of genes selected from genes listed in Tables 2, 4, 6, 12, 13, 14, 19 and/or 20 are predicted to have a poor prognosis. The subject expression profile can, for example, be used to calculate a risk score to classify the subject: subjects having a summed expression value (e.g. subject risk score) above a selected threshold, which can for example be the median score of a population of subjects having the same hematological cancer as the subject, can be classified as having a poor prognosis.
As used herein “prognosis” refers to an indication of the likelihood of a particular clinical outcome e.g. the resulting course of disease, for example, an indication of likelihood of survival or death due to disease within a fixed time period, and includes a “good prognosis” and a “poor prognosis”.
As used herein “outcome” or “clinical outcome” refers to the resulting course of disease and can be characterized for example by likelihood of survival or death due to disease within a fixed time period. For example a good clinical outcome includes cure, prevention of metastasis and/or survival for a fixed period of time, and a poor clinical outcome includes disease progression and/or death within a fixed period of time.
As used herein, “good prognosis” indicates that the subject is expected to survive for a set time period, for example five years from initial diagnosis of a hematological cancer such as leukemia. The set time period varies with the disease type, e.g. leukemia type and/or subtype. For example, for AML, a good prognosis refers to a greater than 30%, greater than 40%, or greater than 50% chance of surviving more than 1 year, more than 2 years, more than 3 years, more than 4 years or more than 5 years after initial diagnosis. As another example, a good prognosis can mean an increased likelihood of survival within a predetermined time compared to a median outcome, for example the median outcome of a particular AML subtype.
As used herein, “poor prognosis” indicates that the subject is expected to die due to disease within a set time period, for example within five years of initial diagnosis of a hematological cancer such as leukemia. The set time period varies with the particular disease, e.g. leukemia type and/or subtype. For example, for AML, a poor prognosis refers to a less than 50%, less than 40%, or less than 30% chance of surviving more than 1 year, more than 2 years, more than 3 years, more than 4 years or more than 5 years after initial diagnosis. As another example, a poor prognosis can mean a decreased likelihood of survival within a predetermined time compared, for example, to a median outcome, such as the median outcome of the particular hematological cancer. For example, the overall 5-year relative survival rates reported from 1999 to 2005 are 66.3% for ALL (90.9% in children under 5), 78.8% for CLL, 23.4% for AML (60.2% in children under 15) and 53.3% for CML (http://www.leukemia-lymphoma.org/all_page?item_id=9346#_survival).
The term “decreased likelihood of survival” as used herein means an increased risk of shorter survival relative to, for example, the median outcome for the particular cancer. For example, increased expression of two or more genes in the gene signatures described herein can be prognostic of a decreased likelihood of survival. The increased risk may, for example, be relative or absolute and may be expressed qualitatively or quantitatively. Examples of expressions of risk include, but are not limited to, odds, probability, odds ratio, p-values, attributable risk, relative frequency, positive predictive value, negative predictive value, and relative risk.
The term “increased likelihood of survival” as used herein means an increased likelihood of longer survival, for example in a subject having decreased expression levels of the signature genes, relative to a subject without such decreased expression levels. Examples of expressions of risk include, but are not limited to, odds, probability, odds ratio, p-values, attributable risk, relative frequency, positive predictive value, negative predictive value, and relative risk.
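Several of the expressions of risk listed above can be computed from a 2×2 table cross-classifying subjects by signature status and outcome. The following Python sketch is illustrative only; the table layout, the definition of a "high" signature score, and the counts are hypothetical and do not represent study results.

```python
def risk_measures(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Risk measures from a 2x2 table of hypothetical counts.

                         died within period   survived period
    high signature score        a                   b
    low signature score         c                   d
    """
    risk_high = a / (a + b)   # probability of death with a high score
    risk_low = c / (c + d)    # probability of death with a low score
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_high / risk_low,
        "positive_predictive_value": risk_high,  # a high score "predicts" death
        "negative_predictive_value": d / (c + d),
    }


print(risk_measures(30, 10, 10, 30))
# {'odds_ratio': 9.0, 'relative_risk': 3.0, 'positive_predictive_value': 0.75, 'negative_predictive_value': 0.75}
```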
As used herein, “signature genes” refers to a set of genes disclosed herein that predicts clinical outcome in a hematological cancer subject and includes, without limitation, LSC-derived signature genes and/or HSC-derived signature genes as well as CE-HSC/LSC signature genes. For example, LSC signature genes include the genes listed in Tables 2, 6, and/or 12; HSC signature genes include the genes listed in Tables 4, 14 and/or 20; and CE-HSC/LSC signature genes include the genes listed in Tables 13 and 19. The gene sequences identified by accession number, for example in Tables 2, 4, 6, 12, 13, 14 and 19, are herein incorporated by reference.
The term “expression level” of a gene as used herein refers to the measurable quantity of gene product produced by the gene in a sample of a patient wherein the gene product can be a transcriptional product or a translated transcriptional product. Accordingly the expression level can pertain to a nucleic acid gene product such as RNA or cDNA or a polypeptide. The expression level is derived from a subject/patient sample and/or a control sample, and can for example be detected de novo or correspond to a previous determination. The expression level can be determined or measured for example, using microarray methods, PCR methods, and/or antibody based methods, as is known to a person of skill in the art.
The term “determining an expression level” or “expression level is determined” as used in reference to a gene (or set of genes) means the application of an agent and/or method to a sample, for example a sample from the subject and/or a control sample, for ascertaining quantitatively, semi-quantitatively or qualitatively the amount of a gene expression product, for example the amount of polypeptide or mRNA. For example, a level of gene expression can be determined by a number of methods, including arrays and other hybridization-based methods and/or PCR protocols, where a probe, primer or primer set is used to ascertain the amount of nucleic acid of the gene. For example, an expression level of a gene can be determined using a probeset, or one or more probes of the probeset, described herein for a particular gene. In addition, where more than one probeset exists for a gene, more than one probeset can be used to determine the expression level of the gene. Other examples include Nanostring® technology, serial analysis of gene expression (SAGE), RNA sequencing, RNase protection assays, and Northern blot. The polypeptide level can be determined, for example, by immunoassay such as Western blot, flow cytometry, immunohistochemistry, ELISA, immunoprecipitation and the like, where a gene or gene signature detection agent, such as an antibody, for example a labeled antibody, specifically binds the gene polypeptide product and permits relative or absolute ascertainment of the amount of polypeptide.
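Where a gene is detected by more than one probeset, the probeset-level values must be combined into a single gene-level expression value. The text above does not specify how; the Python sketch below assumes the values are already on a common scale (e.g. log2 intensities) and simply averages them per gene. Probeset identifiers and the probeset-to-gene mapping are hypothetical; the median, or the single most informative probeset, are other plausible choices.

```python
from collections import defaultdict
from statistics import mean


def gene_level_expression(probeset_values: dict[str, float],
                          probeset_to_gene: dict[str, str]) -> dict[str, float]:
    """Average all probeset values that map to the same gene."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for probeset, value in probeset_values.items():
        gene = probeset_to_gene.get(probeset)
        if gene is not None:          # ignore probesets without a gene mapping
            grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


values = {"ps_1": 6.0, "ps_2": 8.0, "ps_3": 5.0}
mapping = {"ps_1": "GENE_A", "ps_2": "GENE_A", "ps_3": "GENE_B"}
print(gene_level_expression(values, mapping))  # {'GENE_A': 7.0, 'GENE_B': 5.0}
```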
The term “hematological cancer” as used herein refers to cancers that affect blood and bone marrow, and include without limitation leukemia, lymphoma and multiple myeloma.
The term “CSC hematological cancer” as used herein refers to cancers that are sustained by a small population of stem-like, tumor-initiating cells.
The term “leukemia” as used herein means any disease involving the progressive proliferation of abnormal leukocytes found in hemopoietic tissues, other organs and usually in the blood in increased numbers. For example, leukemia includes acute myeloid leukemia (AML), acute lymphocytic leukemia (ALL), chronic lymphocytic leukemia (CLL) and chronic myelogenous leukemia (CML) including cytogenetically normal and abnormal subtypes.
The term “lymphoma” as used herein means any disease involving the progressive proliferation of abnormal lymphoid cells. For example, lymphoma includes mantle cell lymphoma, Non-Hodgkin's lymphoma, and Hodgkin's lymphoma. Non-Hodgkin's lymphoma would include indolent and aggressive Non-Hodgkin's lymphoma. Aggressive Non-Hodgkin's lymphoma would include intermediate and high grade lymphoma. Indolent Non-Hodgkin's lymphoma would include low grade lymphomas.
The term “myeloma” and/or “multiple myeloma” as used herein means any tumor or cancer composed of cells derived from the hematopoietic tissues of the bone marrow. Multiple myeloma is also known as MM and/or plasma cell myeloma.
The term “cytogenetically normal AML” or “CN-AML” as used herein means AML or an AML cell that is characterized by normal chromosome number and structure.
The term “FLT3ITD” as used herein refers to a Fms-like tyrosine kinase 3 (FLT3) molecule (e.g. gene or protein) that comprises an internal tandem duplication (ITD). FLT3 is a receptor tyrosine kinase expressed in primitive hematopoietic cells that has been implicated in the regulation of HSCs. Mutation of FLT3 is a strong prognostic indicator in CN-AML and is associated with poor outcome.
The term “NPM1” as used herein, refers to Nucleophosmin, including for example the sequences identified in entrez gene id 4869, herein incorporated by reference.
As used herein, “sample” refers to any patient sample, including but not limited to a fluid, cell or tissue sample, that comprises cancer cells such as leukemia cells, including blasts, and that can be assayed for gene expression levels, particularly of genes differentially expressed in stem cell-enriched or non-stem cell-enriched populations, either leukemic or normal. The sample includes, for example, a blood sample, a fractionated blood sample, a bone marrow sample, a biopsy, a frozen tissue sample, a fresh tissue specimen, a cell sample and/or a paraffin-embedded section, or any material from which RNA can be extracted in sufficient quantity and with adequate quality to permit measurement of relative mRNA levels, or from which polypeptides can be extracted in sufficient quantity and with adequate quality to permit measurement of relative polypeptide levels.
The term “sequence identity” as used herein refers to the percentage of sequence identity between two or more polypeptide sequences or two or more nucleic acid sequences, for example about 70%, 80%, 90%, 95%, 98%, 99% or higher identity over a specified region. To determine the percent identity of two or more amino acid sequences or of two or more nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = (number of identical overlapping positions / total number of positions) × 100%). In one embodiment, the two sequences are the same length. The determination of percent identity between two sequences can also be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. U.S.A. 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403.
BLAST nucleotide searches can be performed with the NBLAST program, with parameters set, e.g., to score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleic acid molecule of the present application. BLAST protein searches can be performed with the XBLAST program, with parameters set, e.g., to score=50, wordlength=3, to obtain amino acid sequences homologous to a protein molecule of the present invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402. Alternatively, PSI-BLAST can be used to perform an iterated search which detects distant relationships between molecules (Id.). When utilizing the BLAST, Gapped BLAST, and PSI-BLAST programs, the default parameters of the respective programs (e.g., of XBLAST and NBLAST) can be used (see, e.g., the NCBI website). The percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, typically only exact matches are counted.
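The percent identity formula given above can be applied to two sequences that have already been aligned (for example, by one of the BLAST programs). The following Python sketch assumes gaps are written as "-" and that the denominator is the total number of aligned positions, gap columns included; it does not perform the alignment itself, and only exact matches are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """% identity = identical positions / total aligned positions x 100."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b)
                    if x == y and x != gap)   # exact, non-gap matches only
    return 100.0 * identical / len(aligned_a)


print(percent_identity("AC-T", "ACGT"))  # 75.0
```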
The term “subject” also referred to as “patient” as used herein refers to any member of the animal kingdom, preferably a human being.
The term “control” as used herein refers to a sample and/or an expression level or numerical value and/or range (e.g. control range) for an LSC or HSC signature gene or group of LSC or HSC signature genes, including for example CE-HSC/LSC signature genes, corresponding to their expression level in such a sample from a subject or a population of subjects (e.g. control subjects) known either not to have a hematological cancer or to have a hematological cancer with a particular outcome. In another example, a level of expression in a sample from a subject is compared to a level of expression in a control, wherein the control comprises a control sample, or a numerical value derived from a sample, optionally of the same sample type as the subject sample (e.g. both the sample and the control are white blood cell-containing fractions), from a subject known either not to have a hematological cancer or to have a hematological cancer with a particular outcome. Where the control is a numerical value or range, the numerical value or range is a predetermined value or range that corresponds to a level of expression or range of levels of the genes in a group of subjects known to have a hematological cancer and a particular outcome (e.g. a threshold or cutoff level, or a control range).
The term “non-cancer control” as used herein refers to a sample and/or expression level or numerical value corresponding to the expression level in a sample from a subject or a population of subjects (e.g. non-cancer control subjects) who are known not to have a hematological cancer. Similarly, a “cancer control” as used herein refers to a sample and/or expression level or numerical value corresponding to the expression level in a sample from a subject or a population of subjects (e.g. cancer control subjects) who are known to have a hematological cancer and a particular outcome, e.g. the same hematological cancer as the subject sample being tested (e.g. both leukemias).
The term “difference in the level” as used herein, when referring to a subject gene expression level in comparison to a control or previous sample, refers to a measurable difference in the level or quantity of an LSC or HSC signature gene expression level, or set of gene expression levels, compared to the control or previous sample, that is of sufficient magnitude to indicate that the subject is in a different class from the control and/or previous sample, for example a significant or statistically significant difference. A difference in the level can, for example, be assessed by calculating a subject risk score and comparing it to a threshold that is, for example, statistically associated with a particular prognosis. A difference in a gene expression level can also be detected if the ratio of the level in a test sample to that in a control (or previous sample) is greater than 1 or less than 1. For example, a ratio greater than 1.5, 1.7, 2, 3, 5, 10, 12, 15, 20 or more, or a ratio less than 0.5, 0.25, 0.1, 0.05 or less, can indicate a difference in the level.
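The ratio criterion described above can be expressed as a simple check. In this Python sketch the cutoffs 1.5 and 0.5 are taken from the example ratios listed in the definition; any of the other listed values could be substituted, and the function names are hypothetical.

```python
def expression_ratio(test_level: float, control_level: float) -> float:
    """Ratio of the test (or subsequent) level to the control (or previous) level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return test_level / control_level


def differs_from_control(test_level: float, control_level: float,
                         upper: float = 1.5, lower: float = 0.5) -> bool:
    """True when the ratio exceeds `upper` or falls below `lower`."""
    ratio = expression_ratio(test_level, control_level)
    return ratio > upper or ratio < lower


print(differs_from_control(3.0, 1.0), differs_from_control(1.2, 1.0))  # True False
```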
The term “measuring” or “measurement” as used herein refers to assessing the presence, absence, quantity or amount (which can be an effective amount) of a given substance within a clinical or subject-derived sample, including the derivation of qualitative or quantitative concentration levels of such substances, or otherwise evaluating the values or categorization of a subject's clinical parameters.
The term “set” as used herein in the context of “set of genes” means one or more, optionally 2 or more, 3 or more, 4 or more or 5 or more genes. The set can for example include genes listed in Tables 2, 4, 6, 12, 13, 14, 19, and/or 20 and/or genes detected by probes listed in Tables 1, 3, 5, 17 and/or 18, or a subset thereof, including any number between, for example, 1 and 121 genes.
The term “threshold” as used herein refers to a predetermined numerical value or range that corresponds to a level of gene expression or summed levels of gene expression, or a range thereof, at which a subject is more likely to have a particular clinical outcome compared to a subject with a level of gene expression or summed level of gene expression below the threshold. The threshold can be selected according to a desired level of accuracy or specificity; for example, the threshold can be a median level in a population, for example subjects with AML, or an average level in a population of subjects with known outcome, e.g. poor prognosis. The threshold can also correspond to an average of the highest 50%, 40%, 30%, 20% or 10% expression levels in subjects with poor outcome.
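Two of the threshold choices described above, a population median and the average of the highest expression levels in subjects with poor outcome, can be sketched as follows. The function names and the rounding-up rule for the top fraction are hypothetical illustrations, not part of the disclosure.

```python
import math
from statistics import mean, median


def median_threshold(levels):
    """Threshold = median expression level (or score) in a reference population."""
    return median(levels)


def top_fraction_mean(poor_outcome_levels, fraction=0.5):
    """Threshold = mean of the highest `fraction` of levels in poor-outcome subjects.

    Hypothetical rule: the number of subjects kept is rounded up, with at least one.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(poor_outcome_levels, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return mean(ranked[:k])


def above_threshold(level, threshold):
    """A level above the threshold is associated with the (poor) outcome of interest."""
    return level > threshold
```

For example, with poor-outcome levels of 2, 10, 4, 8 and 6, the average of the highest 40% is the mean of 10 and 8, i.e. 9.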
The term “kit control” as used herein means a suitable assay control useful when determining an expression level of a LSC or HSC signature gene or set of genes. For kits for detecting RNA levels, for example by hybridization, the kit control can comprise an oligonucleotide control, useful for example for detecting an internal control such as GAPDH for standardizing the amount of RNA in the sample and determining relative biomarker transcript levels. The kit control can also include RNA from a cell line, which can be used as a ‘baseline’ quality control in an assay, such as an array or PCR based method.
The term “hybridize” as used herein refers to the sequence-specific non-covalent binding interaction with a complementary nucleic acid. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0× sodium chloride/sodium citrate (SSC) at about 45° C., followed by a wash of 2.0×SSC at 50° C. may be employed. With respect to an array, appropriate stringency conditions can be found and have been described for commercial microarrays, such as those manufactured and/or distributed by Agilent Inc, Affymetrix Inc, Roche-Nimblegen Inc. and other entities.
The term “microarray” or “array” as used herein refers to an ordered set of probes fixed to a solid surface that permits analysis such as gene analysis of a set of genes. A DNA microarray refers to an ordered set of DNA fragments fixed to the solid surface. For example, the microarray can be a gene chip. Methods of detecting gene expression and determining gene expression levels using arrays are well known in the art. Such methods are optionally automated.
The term “isolated nucleic acid sequence” as used herein refers to a nucleic acid substantially free of cellular material or culture medium when produced by recombinant DNA techniques, or chemical precursors, or other chemicals when chemically synthesized.
The term “polynucleotide”, “nucleic acid” and/or “oligonucleotide” as used herein refers to a sequence of nucleotide or nucleoside monomers consisting of naturally occurring bases, sugars, and intersugar (backbone) linkages, and is intended to include DNA and RNA, which can be either double stranded or single stranded and can represent the sense or antisense strand.
The term “probe” as used herein refers to a nucleic acid molecule that comprises a sequence of nucleotides that will hybridize specifically to a target nucleic acid sequence, e.g. a coding sequence of a gene listed herein, including in Table 2, 4, 6, 12 and/or 14. For example, the probe comprises at least 10 or more, 15 or more, or 20 or more bases or nucleotides that are complementary to, and hybridize to, contiguous bases and/or nucleotides in the target nucleic acid sequence. The length of the probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence and can for example be 10-20, 21-70, 71-100, 101-500 or more bases or nucleotides in length. For example, the probe can comprise a sequence provided herein, including those listed in any one of Tables 1, 3, 5, 17 or 18 (e.g. comprise any one of SEQ ID NO:s 1-2533). The probes can optionally be fixed to a solid support such as an array chip or a DNA microarray chip.
A person skilled in the art would recognize that “all or part of” a particular probe or primer can be used as long as the portion is sufficient, in the case of a probe, to specifically hybridize to the intended target and, in the case of a primer, to prime amplification of the intended template.
The term “probe set” as used herein refers to a set of probes that hybridize with the mRNA of a specific gene and are identified by a probe set ID number, such as 209993_at, 206385_at and others as listed in Tables 1, 3, 5, 17 or 18. Each probe set comprises one or more probes, for example 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more probes.
The term “primer” as used herein refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors including temperature, the sequences of the primer and the methods used. A primer typically contains 15-25 or more nucleotides or any number in between, although it can contain fewer. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
The term “antibody” as used herein is intended to include monoclonal antibodies, polyclonal antibodies, and chimeric antibodies. The antibody may be from recombinant sources and/or produced in transgenic or non-transgenic animals. The term “antibody fragment” as used herein is intended to include Fab, Fab′, F(ab′)2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, and multimers thereof and bispecific antibody fragments. Antibodies can be fragmented using conventional techniques. For example, F(ab′)2 fragments can be generated by treating the antibody with pepsin. The resulting F(ab′)2 fragment can be treated to reduce disulfide bridges to produce Fab′ fragments. Papain digestion can lead to the formation of Fab fragments. Fab, Fab′ and F(ab′)2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, bispecific antibody fragments and other fragments can also be synthesized by recombinant techniques.
To produce polyclonal antibodies, animals can be injected once or repeatedly with an antigen representing a peptide fragment of the protein product corresponding to the nucleotide sequence of interest, alone or in conjunction with other proteins, potentially in combination with adjuvants designed to increase the immune response of the animal to this antigen or antigens in general. Polyclonal antibodies can then be harvested after variable lengths of time from the animal and subsequently utilized with or without additional purification. Such techniques are well known in the art.
To produce human monoclonal antibodies, antibody producing cells (lymphocytes) can be harvested from a human having cancer and fused with myeloma cells by standard somatic cell fusion procedures, thus immortalizing these cells and yielding hybridoma cells. Such techniques are well known in the art, for example the hybridoma technique originally developed by Kohler and Milstein (Nature 256:495-497 (1975)), as well as other techniques such as the human B-cell hybridoma technique (Kozbor et al., Immunol. Today 4:72 (1983)), the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al., Methods Enzymol. 121:140-67 (1986)), and screening of combinatorial antibody libraries (Huse et al., Science 246:1275 (1989)). Hybridoma cells can be screened immunochemically for production of antibodies specifically reactive with cancer cells and the monoclonal antibodies can be isolated.
Specific antibodies, or antibody fragments, reactive against particular target polypeptide gene product antigens (e.g. Table 2, 4, 6, or 14 polypeptide), can also be generated by screening expression libraries encoding immunoglobulin genes, or portions thereof, expressed in bacteria with cell surface components. For example, complete Fab fragments, VH regions and FV regions can be expressed in bacteria using phage expression libraries (See for example Ward et al., Nature 341:544-546 (1989); Huse et al., Science 246:1275-1281 (1989); and McCafferty et al., Nature 348:552-554 (1990)).
As used herein, “a user interface device” or “user interface” refers to a hardware component or system of components that allows an individual to interact with a computer or other electronic information system, e.g. to input data, and includes without limitation command line interfaces and graphical user interfaces.
The term “similar” in the context of a gene expression level as used herein refers to a subject gene expression level that falls within the range of levels associated with a particular class e.g. prognosis, for example associated with a particular disease outcome, such as likelihood of survival.
The term “most similar” in the context of a reference expression profile refers to the reference expression profile that shows the greatest number of identities and/or degree of changes in common with the subject expression profile.
The terms “therapy”, “treatment”, or “treatment regimen” as used herein refer to an approach aimed at obtaining beneficial or desired results, including clinical results, and include medical procedures and applications including for example chemotherapy, pharmaceutical interventions, surgery, radiotherapy, bone marrow transplant, stem cell transplant and naturopathic interventions, as well as test treatments for treating hematological cancers. Beneficial or desired clinical results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions, diminishment of extent of disease, stabilized (i.e. not worsening) state of disease, preventing spread of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, and remission (whether partial or total), whether detectable or undetectable. “Treatment” or “treatment regimen” can also mean prolonging survival as compared to expected survival if not receiving treatment.
Moreover, a “treatment” or “prevention” regimen of a subject with a therapeutically effective amount of a compound of the present disclosure may consist of a single administration, or alternatively comprise a series of applications.
A “suitable treatment” as used herein refers to a treatment suitable according to the determined prognosis. For example, a suitable treatment for a subject with a poor prognosis can include a more aggressive treatment, for example, in the case of AML, this can include a bone marrow transplant.
As used herein, “screening a new drug candidate” refers to evaluating the ability of a new drug or therapeutic equivalent to target CSCs for example LSCs in a hematological cancer.
As used herein, the term “molecular risk status” refers to the presence or absence of molecular risk factors associated with prognosis. For example, a subject in a “high molecular risk (HMR) group” includes a subject having NPM1wt/FLT3wt or FLT3ITD positive CN AML which is associated with poor prognosis; and a subject in a “low molecular risk (LMR) group” includes a subject with NPM1mut/FLT3wt CN AML.
In understanding the scope of the present disclosure, the term “comprising” and its derivatives, as used herein, are intended to be open ended terms that specify the presence of the stated features, elements, components, groups, integers, and/or steps, but do not exclude the presence of other unstated features, elements, components, groups, integers and/or steps. The foregoing also applies to words having similar meanings such as the terms, “including”, “having” and their derivatives. Finally, terms of degree such as “substantially”, “about” and “approximately” as used herein mean a reasonable amount of deviation of the modified term such that the end result is not significantly changed. These terms of degree should be construed as including a deviation of at least ±5% of the modified term if this deviation would not negate the meaning of the word it modifies.
The recitation of numerical ranges by endpoints herein includes all numbers and fractions subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, and 5). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term “about.” Further, it is to be understood that “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “about” means plus or minus 0.1 to 50%, 5-50%, or 10-40%, preferably 10-20%, more preferably 10% or 15%, of the number to which reference is being made.
It is demonstrated herein that a LSC gene expression profile comprising for example 25 probe sets (Table 1, SEQ ID NO:1-280) corresponding to 23 genes (Table 2), 48 probe sets (Table 5; SEQ ID NO:1-280 and 759-1011) corresponding to 42 genes (Table 6) as well as smaller and larger probe sets (see
It is also demonstrated herein that an HSC gene expression profile comprising 43 probe sets (Table 3; SEQ ID NO:281-758) corresponding to 39 genes (Table 4) was able to distinguish AML patients with a poor prognosis from patients with a good prognosis. It is also demonstrated herein that an HSC gene expression profile comprising 147 probesets (Tables 3 and 17) and 121 genes (Table 14) could also distinguish AML patients with a poor prognosis from patients with a good prognosis. The forty-three HSC signature probesets were identified using an ANOVA test (FDR 0.01) and the 147 signature probesets were identified using a one-way ANOVA analysis with a Tukey HSD post-hoc test and Benjamini-Hochberg multiple testing correction (FDR 0.05). Other gene marker sets and/or probe sets comprising other numbers of genes or probe sets are also predicted to be prognostic.
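The Benjamini-Hochberg correction mentioned above can be illustrated with a short sketch of the standard step-up procedure. It assumes the per-probeset p-values have already been computed (e.g. by the ANOVA); it does not reproduce the ANOVA or Tukey HSD steps, and the function name is hypothetical rather than the software actually used.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return sorted indices of p-values declared significant at the given FDR.

    Standard step-up rule: find the largest rank k such that
    p_(k) <= (k / m) * fdr, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest passing rank (step-up)
    return sorted(order[:cutoff_rank])


# Five hypothetical probeset p-values at FDR 0.05.
print(benjamini_hochberg([0.01, 0.045, 0.025, 0.001, 0.2], fdr=0.05))
```

Because the rule is step-up, a p-value that fails its own rank cutoff is still retained if a larger p-value passes at a higher rank.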
An aspect of the disclosure includes a method for determining prognosis of a subject having a hematological cancer, comprising:
In an embodiment, increased expression of the set of genes compared to a control (e.g. a subject or subjects with good prognosis) is indicative of a poor prognosis. In an embodiment, decreased expression compared to a control is indicative of a good prognosis. In an embodiment, the gene expression levels are correlated with a prognosis by comparing them to one or more reference profiles associated with a prognosis, wherein the prognosis associated with the reference expression profile most similar to the expression levels is the provided prognosis.
In an embodiment, the set of genes includes 2 or more genes described herein (e.g. listed in the Tables and/or detectable by a probe or probeset described herein).
An embodiment includes a method for determining prognosis in a subject having a hematological cancer comprising:
As further described below, the subject can be classified by comparing the subject expression profile to one or more reference profiles associated with a prognosis and identifying the reference profile most similar to the subject expression profile, thereby classifying the subject. In an embodiment, the subject is classified by calculating a subject risk score and comparing the subject risk score to a threshold, wherein a subject risk score greater than the threshold classifies the subject as having a poor prognosis and a subject risk score less than the threshold classifies the subject as having a good prognosis. In an embodiment, the threshold is the median score associated with a population of subjects.
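The reference-profile approach can be sketched as a nearest-profile search. The disclosure does not fix a similarity measure, so this sketch assumes Pearson correlation across the signature genes; the function names and the example profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_by_reference(subject, references):
    """Return the prognosis label whose reference profile is most similar.

    `references` maps a prognosis label to a reference profile over the same
    signature genes, in the same order as `subject` (assumed).
    """
    return max(references, key=lambda label: pearson(subject, references[label]))


refs = {"poor prognosis": [2.0, 3.0, 1.0, 4.0],
        "good prognosis": [4.0, 1.0, 3.0, 2.0]}
print(classify_by_reference([2.1, 2.9, 1.2, 3.8], refs))
```

Correlation compares the pattern of relative expression rather than absolute levels; a distance measure such as Euclidean distance on normalized values is an equally reasonable alternative.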
In an embodiment, the set of genes comprises at least 2 genes. As demonstrated in
Accordingly, an embodiment includes a method for determining prognosis in a subject having a hematological cancer comprising:
a) determining a gene expression level for each gene of a set of genes selected from Tables 2, 6, 12, 4, 14, 13 and/or 19 (e.g. LSC signature genes listed in Tables 2, 6, and/or 12 and/or hematopoietic stem cell (HSC) signature genes listed in Tables 4 and/or 14, and/or CE-HSC/LSC signature genes listed in Tables 13 or 19), to obtain a subject expression profile of a sample from the subject, wherein the set of genes comprises at least 2 genes; and
b) classifying the subject as having a good prognosis or a poor prognosis based on the subject expression profile;
wherein a good prognosis predicts an increased likelihood of survival within a predetermined period after initial diagnosis and poor prognosis predicts a decreased likelihood of survival within the predetermined period after initial diagnosis, compared optionally to a median outcome for the hematological cancer.
A further embodiment includes a method for determining prognosis in a subject having a hematological cancer comprising:
Table 12 comprises a list of the top 80 most predictive probesets and the genes detected by the probesets. Table 2 comprises 25 probesets that detect 23 genes and Table 6 comprises 48 probesets that detect 42 genes. The genes listed in Tables 2 and 6 are also found in Table 12, and the genes listed in Table 2 are also found in Table 6. In an embodiment, the set of genes is selected from Table 6. In a further embodiment, the set of genes comprises the genes listed in Table 6.
Yet another embodiment includes a method for determining prognosis in a subject having a hematological cancer comprising:
Table 4 comprises 43 probesets, which detect 39 genes, and Table 14 comprises 147 probesets that detect 121 genes. Table 20 includes a subset of HSC signature genes that were analyzed by qRT-PCR analysis. The genes listed in Table 20 are also found in Table 14. In an embodiment, the set of genes is selected from Table 20.
A further embodiment includes a method for determining prognosis in a subject having a hematological cancer comprising:
Table 19 comprises a subset of HSC signature genes that are also expressed in LSC. Table 13 comprises a subset of the Table 19 genes. In an embodiment, the set of genes is selected from Table 13.
As mentioned, signatures comprising 2 genes can differentiate AML patients with poor and good survival. In an embodiment, at least one of the set of genes is ceroid-lipofuscinosis, neuronal 5 (CLN5) or neurofibromin 1 (NF1). In an embodiment, CLN5 is detected by one or more probes of probe set ID 214252_s_at. In an embodiment, NF1 is detected by one or more probes of probe set ID 212676_at.
Two genes overlap (RBPMS and FRMD4B) between the HSC and LSC signatures, or between the LSC and CE-HSC/LSC lists. In an embodiment, the set of genes comprises RBPMS and/or FRMD4B.
FIGS. 14a and 14b show an analysis of enrichment of LSC (14A) or HSC (14B) signatures in the expression data for poor cytogenetic risk AML vs good cytogenetic risk AML.
Determination of prognosis, e.g. good prognosis or poor prognosis, involves in an embodiment, classifying a subject with a hematological cancer such as leukemia, based on the similarity of a subject's gene expression profile to a reference expression profile associated with a particular outcome. Accordingly, in an embodiment, the disclosure provides a method for classifying a subject having a hematological cancer as having a good prognosis or a poor prognosis, comprising:
A number of algorithms can be used to assess similarity. For example, a Naïve Bayes probabilistic model can be trained on data from subjects with known outcome. To stratify a new patient by class (prognosis of survival/non-survival), the Naïve Bayes classifier combines this probabilistic model with a decision rule: assign the sample to the class (survival/non-survival) that is most probable. This is known as the maximum a posteriori or MAP decision rule.
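A minimal sketch of a Naïve Bayes classifier with the MAP decision rule follows. The disclosure does not state which likelihood model was used; this sketch assumes a Gaussian likelihood per gene (conditionally independent given the class), class priors estimated from training frequencies, and a small variance floor. Function names and the toy training data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate log prior, per-gene mean and per-gene variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n = len(y)
    model = {}
    for label, rows in by_class.items():
        k = len(rows)
        means = [sum(col) / k for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / k, var_floor)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(k / n), means, variances)
    return model


def predict_map(model, x):
    """MAP rule: pick the class with the largest log prior + log likelihood."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                      for xi, m, v in zip(x, means, variances))
        return log_prior + log_lik
    return max(model, key=log_posterior)


# Toy two-gene expression profiles with known outcome.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
y = ["survival"] * 3 + ["non-survival"] * 3
model = fit_gaussian_nb(X, y)
print(predict_map(model, [1.1, 2.0]))
```

Working in log space avoids numerical underflow when many genes are multiplied together in the likelihood.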
The similarity can also be assessed by determining if the similarity between a subject expression profile and a reference profile is above or below a predetermined threshold. For example, the expression profile can be summed to provide a subject risk score. If the score is above a selected or predetermined threshold, the subject has a poor prognosis and if below the threshold the subject has a good prognosis.
In an embodiment, the subject expression profile is used to calculate a subject risk score, wherein the subject is classified as having a good prognosis if the subject risk score is low and as having a poor prognosis if the subject risk score is high. For example, the expression of 5 or more LSC and/or HSC signature genes could be determined by microarray analysis, wherein the microarray comprises probes and/or probe sets directed to the 5 or more LSC and/or HSC signature genes. The microarray results could be scaled to a standard expression range (for example, as determined using the 160 AML patients described in the Examples). An expression score is calculated from the summed expression levels detected using the probes or probe sets (e.g. one or more of the probes or probe sets listed in Tables 1, 3, 5, 17 and/or 18, or one or more probe sets selected from SEQ ID NOs:1-2533) and compared to a reference score or threshold (e.g. the median expression score of the 160 AML samples from the initial dataset) to determine whether the subject falls within the poor prognosis or the good prognosis category based on the expression profile. In an embodiment, an expression profile is used to calculate a subject risk score, wherein the subject is classified as having a good prognosis if the subject risk score is below, for example, a median risk score or threshold and as having a poor prognosis if, for example, the subject risk score is above the median or threshold. In another embodiment, an expression score or subject risk score is calculated by: a) calculating the log2 expression value of each gene of the LSC or HSC gene signature marker set for the sample; b) centering the log2 expression values of step a) to a zero mean; and c) taking the sum of the centered log2 expression values.
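The three-step score and the median-threshold classification can be sketched as below. Two assumptions are made that the text leaves open: centering in step b) is done per gene using means from a reference cohort (e.g. the initial AML dataset), and a score exactly at the threshold is labelled good prognosis. All names and example values are hypothetical.

```python
import math
from statistics import median


def log2_profile(sample):
    """Step a): log2-transform positive expression values for the signature genes."""
    return [math.log2(v) for v in sample]


def gene_means(cohort_log2):
    """Per-gene mean of log2 values across the reference cohort (assumed centering basis)."""
    n = len(cohort_log2)
    return [sum(col) / n for col in zip(*cohort_log2)]


def risk_score(sample_log2, means):
    """Steps b) and c): center each gene on the cohort mean, then sum."""
    return sum(v - m for v, m in zip(sample_log2, means))


def classify(score, threshold):
    """Above the threshold -> poor prognosis; otherwise good prognosis (tie rule assumed)."""
    return "poor prognosis" if score > threshold else "good prognosis"


# Hypothetical reference cohort: 4 subjects x 2 signature genes.
cohort = [log2_profile(s) for s in [[2, 4], [4, 8], [8, 16], [16, 32]]]
means = gene_means(cohort)
cohort_scores = [risk_score(s, means) for s in cohort]
threshold = median(cohort_scores)

new_patient = log2_profile([32, 64])
print(classify(risk_score(new_patient, means), threshold))
```

Reusing the cohort means and cohort median for each new patient keeps the score on the same scale as the dataset from which the threshold was derived.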
The predetermined period can vary depending on the likelihood of a particular outcome. In another embodiment, the predetermined period is 1 year, 2 years, 3 years, 4 years or 5 years.
The reference profiles and thresholds can be pre-generated, for example the reference expression profiles can be comprised in a database or generated de novo.
In an embodiment, the methods are used to measure treatment response. For example, the subjects in the group used to test the prognostic power of the gene expression signature profiles described herein were therapeutically treated. The expression profiles were obtained prior to treatment and outcome was determined after treatment. Accordingly, the methods can be used to predict treatment response, wherein a subject expression profile associated with poor prognosis is indicative of an increased likelihood of a poor or no treatment response and a subject expression profile associated with a good prognosis is indicative of an increased likelihood of a treatment response compared to, for example, the median response for the leukemia. Therefore, in an aspect, the disclosure includes a method for monitoring a response to a cancer treatment in a subject having a hematological cancer, comprising:
In another aspect, the methods described herein are used to screen for a putative drug candidate for a hematological cancer. In an embodiment the method comprises: contacting a test population of cells with a test substance; determining a gene expression level for each gene of a set of genes selected from leukemia stem cell (LSC) signature genes listed in Tables 2, 6, and/or 12, hematopoietic stem cell (HSC) signature genes listed in Tables 4 and/or 14, and/or CE-HSC/LSC signature genes listed in Table 19, to obtain an expression profile for the test population of cells and comparing to a control population of cells; calculating an expression score for the test population of cells and the control population of cells wherein a decreased expression score in the test population of cells compared to the control population is indicative that the test substance is a putative drug candidate. In an embodiment, the test and control population of cells are hematological cancer cells.
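The screening comparison can be sketched as follows. The text does not define the expression score for cell populations; this sketch assumes the summed log2 expression over the signature genes, which ranks populations the same way as the mean-centered score because centering subtracts the same constant from both. The `min_decrease` margin and all function names are hypothetical.

```python
import math


def expression_score(levels):
    """Summed log2 expression over the signature genes (levels assumed positive)."""
    return sum(math.log2(v) for v in levels)


def is_putative_candidate(test_levels, control_levels, min_decrease=0.0):
    """True if the test (treated) population scores lower than the control population.

    `min_decrease` is a hypothetical margin, in summed log2 units, that the
    decrease must exceed before the test substance is flagged.
    """
    if len(test_levels) != len(control_levels):
        raise ValueError("test and control must cover the same signature genes")
    decrease = expression_score(control_levels) - expression_score(test_levels)
    return decrease > min_decrease


# Each signature gene halved in treated cells: decrease of about 3 log2 units.
print(is_putative_candidate([50.0, 100.0, 200.0], [100.0, 200.0, 400.0], min_decrease=1.0))
```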
In an embodiment, the set of genes comprises 2 or more of the genes listed in Table 2, 6, and/or 12 and the set of genes comprises 2 or more of the genes listed in Table 4 and/or 14. In another embodiment, the set of genes comprises 2 or more of the genes listed in Table 20. In another embodiment, the set of genes comprises 2 or more of the genes listed in Table 13 or Table 19.
In a further embodiment, the set of genes comprises at least 2-5, at least 6-10, at least 11-15, at least 16-20, at least 20-25, at least 26-30, at least 31-35, at least 36-40, at least 41, at least 42, at least 43, at least 41-45, at least 46-50, at least 51-55, at least 56-60, at least 61-65, at least 66-70, at least 71-75, at least 76-80, at least 81-85, at least 86-90, at least 91-95, at least 96-100, at least 101-105, at least 106 to 110, at least 111 to 115, at least 116 to 120 or 121 genes.
In an embodiment, the set of genes comprises the genes listed in Table 2, 4, 6, 12, 13, 14, 19 or 20. In an embodiment, the set of genes comprises the genes listed in Table 19. In another embodiment, the set of genes comprises the genes listed in Table 13.
In an embodiment, the set of genes does not include one or more of ABCB1, BAALC, ERG, MEIS1, and EVI1 (also known as MECOM).
In another embodiment, the gene expression levels are determined using probes and/or probe sets. In an embodiment, the probes and probe sets are selected from SEQ ID NOs: 1 to 2533.
In an embodiment, the gene expression levels are determined using at least 2-5, at least 6-10, at least 11-14, at least 15-19, at least 20-24, or 25 LSC probe sets listed in Table 1; and/or at least 2-5, at least 6-10, at least 11-15, at least 16-20, at least 21-25, at least 26-30, at least 31-35, at least 36-40, at least 41-45, at least 46-50, at least 51-55, at least 56-60, at least 61-65, at least 66-70, at least 71-75, at least 76-80, at least 81-85, at least 86-90, at least 91-95, at least 96-100, at least 101-105, at least 106-110, at least 111-115, at least 116-120, at least 121-125, at least 126-130, at least 131-135, at least 136-140, at least 141-145, or at least 146-147 probe sets. In an embodiment, combinations of probes and probe sets listed in different tables are used to determine the gene expression levels.
Successive addition of the most highly ranked, determined by p-value, probes demonstrated a correlation with overall survival (
In yet another embodiment, a method described herein also comprises obtaining a sample from the subject, e.g. for determining the expression level of the set of genes. The sample, in an embodiment, comprises a blood sample or a bone marrow sample. In an embodiment, the sample comprises fresh tissue, a frozen tissue sample, a cell sample, or a formalin-fixed paraffin-embedded sample. In an embodiment, the sample is submerged in an RNA preservation solution, for example to allow for storage. In an embodiment, the sample is submerged in Trizol®. In an embodiment, the sample is stored as soon as possible at ultralow (for example, below −190° C.) temperatures. Storage conditions are designed to maximally retain mRNA integrity and preserve the original relative abundance of mRNA species, as determined by those skilled in the art. The sample in an embodiment is optionally processed, for example, to obtain an isolated RNA fraction and/or an isolated polypeptide fraction. The sample is, in an embodiment, treated with an RNase inhibitor to prevent RNA degradation.
In another embodiment, the sample is a fractionated blood sample or a fractionated bone marrow sample. In an embodiment, the sample is fractionated to increase the percentage of LSC and/or HSC. In an embodiment, the fraction is predominantly CD34+, for example greater than 90% CD34+. In another embodiment, the fraction is predominantly CD38−, for example greater than 90% CD38−. In a further embodiment, the fraction is predominantly CD34+ and CD38−, for example greater than 90% CD34+ and CD38−.
Where the gene expression level being determined is that of a nucleic acid, the gene expression levels can be determined using a number of methods, for example a microarray chip or PCR, optionally multiplex PCR, northern blotting, or other methods and techniques designed to produce quantitative or relative data for the levels of mRNA species corresponding to specified nucleotide sequences present in a sample. These methods are known in the art. In an embodiment, the gene expression level is determined using a microarray chip and/or PCR, optionally multiplex PCR.
Further, a person skilled in the art would be familiar with the normalizations necessary for each technique.
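As one example of such a normalization for PCR data, the widely used comparative Ct (2^−ΔΔCt) method normalizes a target gene to an internal control such as GAPDH and then to a calibrator sample. This is a sketch of that general technique, not the normalization used in the Examples; it assumes roughly 100% amplification efficiency for both genes, and the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize a target gene Ct to an internal control gene Ct (e.g. GAPDH)."""
    return ct_target - ct_reference


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the sample relative to a control via 2^-ddCt.

    Assumes ~100% efficiency, so each cycle corresponds to a two-fold change.
    """
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


# Target reaches threshold 2 cycles earlier in the sample at equal GAPDH Ct.
print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

Where amplification efficiencies differ between the target and reference genes, an efficiency-corrected model is generally preferred.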
The methods described can utilize probes or probe sets comprising or optionally consisting of a nucleic acid sequence listed in Tables 1, 3, 5, 17 and/or 18. In an embodiment, the gene expression level is determined by detecting mRNA expression using one or more probes and/or one or more probe sets listed in Tables 1, 3, 5, 17 and/or 18.
In an embodiment, the method comprises additionally considering known prognostic factors, such as molecular risk status. For example, the mutational status of FLT3ITD and NPM1 has been associated with risk status in AML subjects, with low molecular risk associated with NPM1mut FLT3ITD− and high molecular risk associated with FLT3ITD+ or NPM1wt FLT3ITD−. It is demonstrated herein that the gene signatures can further stratify subjects within a molecular risk group, for example to identify subjects with poor prognosis.
Accordingly, in an embodiment, the method further comprises determining the molecular risk status of the subject. In an embodiment, the molecular risk status is low molecular risk (LMR) or high molecular risk (HMR) according to NPM1 and/or FLT3ITD status, wherein the subject is identified as LMR if the subject comprises a mutant NPM1 gene and is FLT3ITD negative, and is identified as HMR if the subject is FLT3ITD positive or has a wildtype NPM1 gene and is FLT3ITD negative. In a further embodiment, the subject is LMR and optionally the set of genes comprises genes selected from LSC signature genes. In an embodiment, the subject is HMR and optionally the set of genes comprises genes selected from HSC signature genes.
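The molecular risk grouping defined above can be expressed as a small decision function, with the optional choice of signature for each group. The function name and the mapping dictionary are hypothetical; the rules follow the LMR/HMR definitions given herein.

```python
# Optional signature choice per molecular risk group, as described above.
SIGNATURE_FOR_GROUP = {"LMR": "LSC signature genes", "HMR": "HSC signature genes"}


def molecular_risk_group(npm1_mutated: bool, flt3_itd_positive: bool) -> str:
    """Assign LMR/HMR from NPM1 and FLT3-ITD status."""
    if flt3_itd_positive:
        return "HMR"  # FLT3-ITD positive, regardless of NPM1 status
    if npm1_mutated:
        return "LMR"  # NPM1 mutant and FLT3-ITD negative
    return "HMR"      # NPM1 wild type and FLT3-ITD negative


group = molecular_risk_group(npm1_mutated=True, flt3_itd_positive=False)
print(group, "->", SIGNATURE_FOR_GROUP[group])
```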
In an embodiment, the methods described herein can be used for example to select subjects for a clinical trial.
In an embodiment, the methods described herein can be used to select suitable treatment. For example, subjects with poor prognosis, e.g. a high risk of non-survival, may be advantageously treated with specific therapeutic regimens. More accurate classification can reduce the number of patients identified as high risk. Further, more accurate classification allows treatments to be tailored and aggressive therapies with greater risks or side effects to be reserved for patients with poor outcome. For example, CN-AML patients are considered to be at intermediate risk of poor prognosis. One therapeutic option for treating AML is transplant. Given the intermediate risk, one option available to a patient is transplant, particularly if a related donor is available. However, where only an unrelated donor is available, a transplant may not be recommended, or may carry additional risks, because of complications. An application of the methods and products described herein is to provide a test to aid a medical professional in making such a decision. For example, where a patient has an intermediate risk but is identified by the methods and products described herein as having an increased likelihood of a good outcome, such a patient may be reclassified in a more “favorable” category such that a transplant might not be recommended. Similarly, if the methods and products identify the patient as having an increased likelihood of a poor prognosis, the patient may be reclassified in a more “unfavorable” category, suggesting that a transplant, even from an unrelated donor, might be indicated. Accordingly, a better prognostic prediction could assist in making treatment decisions.
Accordingly, in another aspect, the disclosure includes a method further comprising the step of providing a cancer treatment to a subject consistent with the disease outcome prognosis. In an embodiment, the disclosure provides use of a prognosis determined according to a method described herein for identifying a suitable treatment for treating a subject with a hematological cancer. An embodiment includes a method of treating a subject having a hematological cancer, comprising determining a prognosis of the subject according to a method described herein and providing a suitable cancer treatment to the subject in need thereof according to the prognosis determined.
In another embodiment, the method further comprises providing a cancer treatment for the subject consistent with the molecular risk group and disease outcome prognosis. In an embodiment the cancer treatment is a stem cell transplant.
In an embodiment, the cancer treatment comprises a stem cell transplant. In another embodiment, the cancer treatment comprises a bone marrow transplant, or other standard treatment, such as chemotherapy.
In addition to being able to differentiate AML patients according to prognosis, the HSC signature is expected to be able to differentiate patients with hematological cancers other than AML, particularly other leukemias that, like AML, have altered growth and a differentiation block, and/or hematological cancers that are CSC hematological cancers. Examples include myeloid disorders such as MDS (myelodysplastic syndrome) or MPD (myeloproliferative disease), including CML (chronic myeloid leukemia), which is considered a stem cell disease.
In an embodiment, the hematological cancer is leukemia. In an embodiment, the leukemia is acute myeloid leukemia (AML). In an embodiment, the hematological cancer is cytogenetically normal. In another embodiment, the AML is cytogenetically normal AML (CN-AML). In a further embodiment, the AML is M1, M2, M4, M4Eo, M5, M5a, M5b, or unclassified AML. In yet a further embodiment, the AML is M0, M6, M7 or M8 AML. In another embodiment, the leukemia is ALL, CLL or CML or a subtype thereof. In another embodiment, the hematological cancer is lymphoma. In a further embodiment, the hematological cancer is multiple myeloma.
The methods described herein can be implemented using a computer.
Another aspect of the disclosure includes a computer-implemented method for determining a prognosis of a subject having a hematological cancer comprising: classifying, on a computer, the subject as having a good prognosis or a poor prognosis based on a subject expression profile comprising measurements of expression levels of a set of genes in a sample from the subject, the set of genes selected from genes listed in Tables 2, 4, 6, 12, 13, 14, 19, and/or 20 and/or genes detected by probes listed in Tables 1, 3, 5, 17 and/or 18; wherein a good prognosis predicts an increased likelihood of survival within a predetermined period after initial diagnosis, and wherein a poor prognosis predicts a decreased likelihood of survival within the predetermined period after initial diagnosis.
In an aspect, the disclosure provides a computer-implemented method for determining a prognosis of a subject having a hematological cancer comprising: classifying, on a computer, the subject as having a good prognosis or a poor prognosis based on an expression profile comprising measurements of expression levels of a set of genes selected from LSC signature genes or HSC signature genes in a sample from the subject; wherein a good prognosis predicts an increased likelihood of survival within a predetermined period after initial diagnosis, and wherein a poor prognosis predicts a decreased likelihood of survival within the predetermined period after initial diagnosis. In an embodiment, the set of genes comprises at least one gene of the LSC signature genes or the HSC signature genes.
The results, or the results of any individual step, are optionally displayed or output. Accordingly, in an embodiment, the method further comprises displaying or outputting a result of one of the steps to a user interface device, a computer readable storage medium, a monitor, or a computer that is part of a network.
Another aspect of the disclosure includes a computer product for implementing the methods described herein e.g. for predicting prognosis, selecting patients for a clinical trial, or selecting therapy.
A further aspect of the disclosure provides a non-transitory computer readable storage medium with an executable program stored thereon, wherein the program is for predicting outcome or prognosis in a subject having a hematological cancer, and wherein the program instructs a microprocessor to perform one or more of the steps of any of the methods described herein.
A computer system comprising:
An exemplary system is a computer system having, for example: a central processing unit; a main non-transitory storage unit, for example a hard disk drive, for storing software and data, the storage unit being controlled by a storage controller; a system memory, preferably high-speed random-access memory (RAM), for storing system control programs, data, and application programs (for example, programs for viewing and manipulating data and for evaluating formulae for the purpose of providing a prognosis), comprising programs and data loaded from the non-transitory storage unit, the system memory optionally also including read-only memory (ROM); a user interface comprising one or more input devices (e.g., a keyboard) and a display or other output device; a network interface card for connecting to any wired or wireless communication network (e.g., a wide area network such as the Internet); a communication bus for interconnecting the aforementioned elements of the system; and a power source to power the aforementioned elements. Operation of the computer is controlled primarily by an operating system, which is executed by the central processing unit. The operating system can be stored in system memory. In addition to the operating system, in a typical implementation the system memory includes a file system for controlling access to the various files and data structures used by the methods and computer products disclosed herein. The system can optionally include a coprocessor dedicated to carrying out mathematical operations.
Another aspect includes a computerized control system 10 for carrying out the methods of the disclosure.
In an embodiment, the computerized control system 10 comprises at least one processor and memory configured to provide:
A schematic representation of an embodiment of a computerized control system 10 is provided in
In an embodiment, the set of genes is selected from Tables 2, 4, 6, 12, 13, 14, 19, and/or 20 and/or genes detected by probes listed in Tables 1, 3, 5, 17 and/or 18.
In an embodiment, the subject expression profile is compared to a reference expression profile by comparing a subject risk score to a selected threshold, wherein the subject risk score is calculated by summing the subject expression profile gene expression values, optionally the log2 expression values, of the set of genes.
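The risk-score comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the gene names and threshold are hypothetical placeholders, and the direction of the call (a score above the threshold treated as a poor prognosis) is an assumption, since this paragraph does not specify it.

```python
from typing import Mapping, Sequence


def risk_score(log2_expr: Mapping[str, float], signature: Sequence[str]) -> float:
    """Sum the (optionally log2) expression values of the signature genes for one subject."""
    missing = [gene for gene in signature if gene not in log2_expr]
    if missing:
        raise KeyError(f"signature genes missing from profile: {missing}")
    return sum(log2_expr[gene] for gene in signature)


def classify(score: float, threshold: float) -> str:
    """Assumed direction: a score above the selected threshold is called a poor prognosis."""
    return "poor" if score > threshold else "good"


if __name__ == "__main__":
    # Hypothetical profile and signature; real genes would come from the tables cited above.
    profile = {"GENE_A": 7.2, "GENE_B": 9.1, "GENE_C": 5.4}
    score = risk_score(profile, ["GENE_A", "GENE_B"])
    print(score, classify(score, threshold=15.0))
```

Raising an error for missing genes, rather than silently skipping them, keeps scores from different subjects comparable, since every score is then a sum over the same gene set.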
In an embodiment, the dataset is generated using an array probed with a sample obtained from the subject.
In an embodiment, the computerized control system controls and/or receives data from an imaging module 50. In an embodiment, the imaging module is a microarray scanner, which optionally detects dye fluorescence. In an embodiment, the imaging module is configured to collect the images and spot intensity signals. In an embodiment, the computerized control system 10 further comprises an image data processor for processing the image data.
In an embodiment, the analysis module 30 further determines a prognosis characteristic such as a hazard ratio or risk score.
In an embodiment, the computerized control system 10 further comprises a search module 40 for searching an expression reference database 70 to identify and retrieve reference expression profiles associated with a prognosis.
In an embodiment, the computerized control system 10 further comprises a user interface 60 operable to receive one or more selection criteria, wherein the processor is further operable to configure the analysis module 30 to include the criteria received in the user interface 60. For example, the selection criteria can comprise a selected threshold.
A further aspect comprises a non-transitory computer-readable storage medium comprising an executable program stored thereon, wherein the program instructs a processor to perform the following steps for a plurality of gene expression levels: calculate a subject risk score; and determine a prognosis according to the subject risk score.
In an embodiment, the program further instructs the processor to determine a prognosis characteristic such as a hazard ratio.
In an embodiment, the program further instructs the processor to output a prognosis and/or a prognosis characteristic such as a hazard ratio.
In an embodiment, one or more of the user interface components can be integrated with one another in embodiments such as handheld computers.
In an embodiment, the computer system comprises a computer readable storage medium described herein.
In an embodiment, the computer system is for performing a method described herein.
An aspect provides a composition comprising a set of probes or primers for determining expression of a set of genes. In an embodiment, the composition comprises at least 2 nucleic acid molecules each comprising a polynucleotide probe sequence selected from Tables 1, 3, 5, 17 or 18 (SEQ ID NO:1-2533). In an embodiment, the composition comprises a set of nucleic acid molecules wherein the sequence of each molecule comprises a polynucleotide probe sequence selected from SEQ ID NO:1-2533.
Another aspect includes an array comprising, for each gene in a set of genes, one or more polynucleotide probes complementary and hybridizable to a coding sequence in the gene, wherein the set of genes comprises at least 2 of the genes listed in Tables 2, 4, 6, 12, 13, 14, 19 and/or 20.
In an embodiment, the composition or array comprises at least 3-22, at least 23-44, at least 45-66, at least 67-88, at least 89-110, at least 111-132, at least 133-154, at least 155-176, at least 177-198, at least 199-220, at least 221-242, at least 243-264, at least 265-286, at least 287-308, at least 309-330, at least 331-352, at least 353-374, at least 375-396, at least 397-418, at least 419-440, at least 441-462, at least 463-478 or more nucleic acid molecules each comprising a polynucleotide probe sequence selected from Tables 1, 3, 5, 17 and/or 18 (SEQ ID NOs:1-2533). In yet another embodiment, the composition comprises 2-2533, or any number therebetween, nucleic acid molecules comprising or consisting of a polynucleotide probe sequence listed in Tables 1, 3, 5, 17 and/or 18 (SEQ ID NOs:1-2533).
In yet another embodiment, the composition comprises at least 2 nucleic acid molecules each comprising a polynucleotide probe sequence selected from SEQ ID NO:1-280 and 759-1011.
In yet another embodiment, the composition comprises at least 2 nucleic acid molecules each comprising a polynucleotide probe sequence selected from SEQ ID NO:281-758 and 1012 to 2533.
In another embodiment, the composition or array comprises at least 3-22, at least 23-44, at least 45-66, at least 67-88, at least 89-110, at least 111-132, at least 133-154, at least 155-176, at least 177-198, at least 199-220, at least 221-242, at least 243-264, at least 265-280, at least 281-295, at least 296-310, at least 311-325, at least 326-340, at least 341-355, at least 356-380, at least 381-395, at least 396-410, at least 411-425, at least 426-440, at least 441-455, at least 456-470, at least 471-485, at least 486-500, at least 501-515, at least 516-532 or up to 533 nucleic acid molecules/probes. In an embodiment, the composition or array comprises any number of nucleic acid molecules/probes from 3 to 2533, or more.
In another embodiment, the composition comprises at least 2 nucleic acid molecules each comprising a polynucleotide sequence selected from the probes comprised in the probe set IDs listed in Table 16.
In an embodiment, the set of genes comprises at least 3-5, at least 6-10, at least 11-15, at least 16-20, or at least 21-25 of the genes listed in Table 2; and/or at least 6-10, at least 11-15, at least 16-20, at least 21-25, at least 26-30, at least 31-35, or at least 36-39 of the genes listed in Table 4; and/or at least 6-10, at least 11-15, at least 16-20, at least 21-25, at least 26-30, at least 31-35, at least 36-39, or at least 41-43 of the genes listed in Table 6; and/or at least 6-10, at least 11-15, at least 16-20, at least 21-25, at least 26-30, at least 31-35, at least 36-39, at least 41-45, at least 46-66, or at least 67-80 of the genes listed in Table 12; and/or at least 6-10, at least 11-15, at least 16-20, at least 21-25, at least 26-30, at least 31-35, at least 36-39, at least 41-45, at least 46-66, at least 67-88, at least 89-110, or at least 111-121 of the genes listed in Table 14.
The array can be a microarray designed for evaluation of the relative levels of mRNA species in a sample.
Another aspect of the disclosure provides a kit for determining prognosis in a subject having a hematological cancer comprising:
A further aspect of the disclosure includes a kit for determining prognosis in a subject having a hematological cancer comprising:
In an embodiment, the kit further comprises one or more specimen collectors and/or RNA preservation solution.
In an embodiment, the specimen collector comprises a sterile vial or tube suitable for receiving a biopsy or other sample. In an embodiment, the specimen collector comprises RNA preservation solution. In another embodiment, the RNA preservation solution is added after the sample is received. In another embodiment, the sample is frozen at ultralow temperatures (for example, below -190° C.) as soon as possible after collection.
In an embodiment, the RNA preservation solution comprises one or more inhibitors of RNase. In another embodiment, the RNA preservation solution comprises Trizol® or other reagents designed to improve the stability of RNA.
In an embodiment, the kit comprises at least 3-22, at least 23-44, at least 45-66, at least 67-88, at least 89-110, at least 111-132, at least 133-154, at least 155-176, at least 177-198, at least 199-220, at least 221-242, at least 243-264, at least 265-286, at least 287-308, at least 309-330, at least 331-352, at least 353-374, at least 375-396, at least 397-418, at least 419-440, at least 441-462 or at least 463-473 and for example up to 2533 or any number between 1 and 2533, nucleic acid molecules, each comprising and/or corresponding to a polynucleotide probe sequence listed in Tables 1, 3, 5, 17 and/or 18 (SEQ ID NO:1-2533).
Another aspect of the disclosure provides a kit for determining prognosis in a subject having a hematological cancer comprising:
In an embodiment, the kit comprises a set of antibodies specific for polypeptides corresponding to at least 2, 3, 4, 5, 6, 7, 8, 9 or at least 10 of the genes listed in Tables 2, 4, 6, 12 and/or 14. In another embodiment, the kit comprises a set of antibodies specific for polypeptides corresponding to at least 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45 or more of the genes listed in Tables 2, 4, 6, 12 and/or 14.
In an embodiment, the antibody or probe is labeled. The label is preferably capable of producing, either directly or indirectly, a detectable signal. For example, the label may be radio-opaque or a radioisotope, such as ³H, ¹⁴C, ³²P, ³⁵S, ¹²³I, ¹²⁵I or ¹³¹I; a fluorescent (fluorophore) or chemiluminescent (chromophore) compound, such as fluorescein isothiocyanate, rhodamine or luciferin; an enzyme, such as alkaline phosphatase, beta-galactosidase or horseradish peroxidase; an imaging agent; or a metal ion.
In another embodiment, the detectable signal is detectable indirectly. A person skilled in the art will appreciate that a number of methods can be used to determine the amount of a polypeptide product of a gene described herein, including immunoassays such as flow cytometry, Western blots, ELISA, and immunoprecipitation followed by SDS-PAGE, as well as immunocytochemistry or immunohistochemistry. For example, flow cytometry or other methods for detecting polypeptides can be used to detect surface protein expression levels.
In an embodiment, the kit can comprise one or more probes or one or more antibodies specific for a gene. In another embodiment, the set of probes or antibodies comprises probes or antibodies wherein each probe or antibody detects a different gene listed in Tables 2, 4, 6, 12 or 14.
In an embodiment, the kit is used for a method described herein.
The following non-limiting examples are illustrative of the present disclosure:
Peripheral blood cells were collected from patients with newly diagnosed AML after obtaining informed consent according to procedures approved by the Research Ethics Board of the University Health Network. Individuals were diagnosed according to the standards of the French-American-British (FAB) classification. Cells from sixteen different samples representing 7 AML subtypes were investigated in the studies. Specifically, low-density peripheral blood cells were collected from 16 AML patients representing 7 FAB subtypes (2 M1, 1 M2, 1 M4, 1 M4e, 1 M5, 4 M5a, 1 M5b, 5 unclassified) by density centrifugation over a Ficoll® gradient. Low-density mononuclear cells isolated from individuals with AML were frozen viably in FCS plus 10% (vol/vol) DMSO. For sorting of AML subpopulations, AML blasts were stained with anti-CD34-APC (Becton-Dickinson) and anti-CD38-PE (Becton-Dickinson) and were sorted using either a Dako MoFlo cell sorter or a BD FACSAria (Becton-Dickinson). The purity of each subpopulation exceeded 95%. Fractionated cells were captured in 100% FCS and recovered by centrifugation. As a result, each AML patient sample was sorted into 4 subpopulations based upon CD34 and CD38 antibody staining, and the cells were recovered for functional and gene expression analysis.
Transplantation of Sorted AML Cells into NOD/SCID Mice
NOD/SCID mice (Jackson Laboratory, Bar Harbor, Me.) were bred and maintained in microisolator cages. Twenty-four hours before transplantation, mice were irradiated with 2.75 to 3.45 Gy gamma irradiation from a ¹³⁷Cs source. Sorted AML cells were counted, resuspended in 1-5% FCS in 1× phosphate-buffered saline (PBS), pH 7.4, and injected directly into the right femur of each experimental animal. Six and a half to fifteen weeks post-transplant, mice were euthanized by cervical dislocation, and the hind leg bones were removed and flushed with media to recover engrafted cells. Percent human AML engraftment was assessed by flow cytometry for human CD45+ staining cells (Lapidot et al., 1994).
mRNA Expression Array
mRNA was extracted using the Trizol® RNA preparation as recommended by the manufacturer (Invitrogen), and the RNA was amplified by NuGEN amplification according to the manufacturer's instructions (NuGEN Technologies, Inc.). Probes were labeled and Affymetrix U133A (high-throughput) microarrays were run according to the manufacturer's instructions. Signal was normalized by RMA followed by log2 transformation. The LSC/primitive cell-related gene list was computed using a standard two-group differential expression comparison (Smyth's moderated t-test 18, SCID Leukemia-Initiating Cell (SL-IC) fractions vs non-SL-IC fractions). Each probe set generally consists of eleven oligonucleotide probes complementary to a corresponding gene sequence; these eleven probes are used together to measure the mRNA transcript levels of the gene. Quality control measures were taken. For example, one sample was rejected because its array results, obtained by standard Affymetrix techniques prior to normalization, were an outlier relative to the other samples on a box-and-whisker plot.
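The box-and-whisker outlier check can be sketched as follows. The choice of per-array summary (here, the median raw probe intensity) and the conventional Tukey fences at 1.5 times the interquartile range are assumptions; the text only states that the rejected array was an outlier on a box-whisker plot. The function name and sample IDs are hypothetical.

```python
from statistics import median, quantiles
from typing import Dict, List


def flag_outlier_arrays(raw: Dict[str, List[float]], k: float = 1.5) -> List[str]:
    """Return sample IDs whose median raw intensity lies outside Tukey fences.

    raw maps a sample ID to its un-normalized probe intensities (assumed input).
    """
    summaries = {sid: median(vals) for sid, vals in raw.items()}
    q1, _, q3 = quantiles(summaries.values(), n=4)  # first and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(sid for sid, m in summaries.items() if m < low or m > high)
```

Applying the check to raw (pre-RMA) intensities matches the text's statement that the outlier was identified prior to normalization, when array-level differences have not yet been smoothed out.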
Correlation with Overall Survival.
To assess the prognostic impact of the LSC/primitive cell-related profile, the 25 probe sets most positively correlated with the SL-IC AML populations versus non-SL-IC populations were selected as the 25 LSC probe set signature (genes listed in Table 2; probes listed in Table 1). Publicly available overall survival and expression data were analyzed17. In short, the expression values of each probe set were median-centered to 0 across the 160 bone marrow AML samples. For each AML sample, the expression values of the LSC probe set signature were then summed. This summed value was used to divide the AML group into two equal-sized populations of 80 AML each, based upon above- or below-median expression of the summed value of the 25 LSC probe set signature. The overall survival of the two groups was examined using a Kaplan-Meier plot and the log-rank (Mantel-Cox) test. The correlation of the 43 HSC probe set signature with survival (genes listed in Table 4, probes listed in Table 3) was determined in the same way, except that the 43 HSC probe sets were used instead of the 25 LSC probe sets.
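The scoring and survival comparison described above can be sketched in plain Python: median-center each probe set across samples, sum the signature per sample, split at the median of the summed score, and compare the two groups with a two-group log-rank (Mantel-Cox) test. This is an illustrative sketch, not the analysis code used in the study; the function names are hypothetical, and drawing the Kaplan-Meier curves is not included.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple


def signature_scores(expr: Dict[str, List[float]], probes: Sequence[str]) -> List[float]:
    """Median-center each probe set across samples, then sum the signature per sample."""
    n_samples = len(expr[probes[0]])
    centered = {}
    for p in probes:
        m = median(expr[p])
        centered[p] = [v - m for v in expr[p]]
    return [sum(centered[p][i] for p in probes) for i in range(n_samples)]


def median_split(scores: Sequence[float]) -> List[bool]:
    """Flag samples above the median summed score as the 'high' group."""
    cut = median(scores)
    return [s > cut for s in scores]


def logrank(times: Sequence[float], events: Sequence[bool],
            high: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank (Mantel-Cox) test; returns (chi-square, p-value), 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(1 for i in at_risk if times[i] == t and events[i] and high[i])
        # Observed minus expected deaths in the 'high' group at this event time.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            # Hypergeometric variance of d_high given the risk set.
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With an even number of samples and no tied scores, the strict `>` comparison in `median_split` yields two equal-sized groups, as in the 80/80 division described above.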
The gene expression profiles of sorted AML cell populations enriched for SL-IC (the LSC detected in the xenotransplant assay) were analyzed and compared with those of populations without SL-IC, and an LSC/primitive cell-related profile (the 25 LSC probe set signature) was developed. When this profile was used to examine overall survival in a group of 160 AML patients, it correlated significantly with poor overall survival. Similarly, the 43 HSC probe set signature correlated strongly with poor overall survival, even though the two independently generated stem cell/primitive cell-related lists share only one probe set. Additionally, the AML cells used to generate the 25 LSC probe set signature came from peripheral blood samples and the 43 HSC probe set signature was derived from cord blood, whereas the 160 AML samples were bone marrow samples. This suggests that these two stem cell-related profiles are robust and unique.
Other groups have developed prognostic signatures for CN-AML from gene expression data of bulk AML. The present approach differs in that the gene set is generated from SL-IC activity in sorted cells, a functional readout that is independent of patient outcome. Likewise, the HSC profile is based upon the SCID repopulating cell assay, not overall survival. Nevertheless, these two independent investigations into stem cell regulation show a similar correlation with patient outcome, indicating that a stem cell profile, whether the 43 HSC probe set signature or the 25 LSC probe set signature, is relevant to leukemia.
The LSC signature and HSC signatures can be tested in additional leukemia patient sample sets, including sets of patient samples that contain cytogenetically abnormal AML, in order to further support the prognostic value of the signatures. For example, other blood cancers such as acute lymphoblastic leukemia, lymphomas, CML, and CLL can be tested.
The expression levels of subsets of the LSC signature genes and HSC signature genes, combinations of the genes in the LSC probe set signature and HSC probe set signature as well as shared genes such as the CE-HSC/LSC signature genes will be determined and assessed to identify and/or confirm the prognostic abilities of said gene sets according to the methods described in Example 1.
Similar to Example 1, using the procedures for sorting patient AML samples, transplantation of sorted AML cells into NOD/SCID mice, mRNA expression arrays and correlation with overall survival, a 43-gene signature marker set prognostic of outcome was identified (Table 6). The expression levels of the genes in this LSC gene signature were detected using 48 probe sets (Table 5). The 48-probe-set LSC/primitive cell-related gene list was computed using a standard two-group differential expression comparison (Smyth's moderated t-test 18, SL-IC fractions vs non-SL-IC fractions). Benjamini-Hochberg multiple testing correction was performed to generate a list of 48 probe sets at a false discovery rate of 0.05.
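The Benjamini-Hochberg step can be illustrated with a short sketch. It assumes the per-probe-set p-values have already been obtained (for example, from Smyth's moderated t-test as implemented in the Bioconductor limma package; that test is not re-implemented here) and returns the probe sets passing a false discovery rate of 0.05. The function names and probe IDs are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(probe_ids: Sequence[str], pvals: Sequence[float],
                fdr: float = 0.05) -> List[str]:
    """Probe sets whose adjusted p-value is at or below the FDR threshold."""
    q = benjamini_hochberg(pvals)
    return [pid for pid, qi in zip(probe_ids, q) if qi <= fdr]
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.005 adjust to approximately 0.02, 0.04, 0.04 and 0.02, so all four would pass at an FDR of 0.05.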
Evidence from experimental xenografts shows that some solid tumours and leukemias are organized as cellular hierarchies sustained by cancer stem cells (CSC). Despite the promise of the CSC model, its relevance to human disease remains uncertain, and improvements to prognosis and therapy have yet to be derived from CSC properties. Moreover, there are conflicting reports of whether tumours continue to adhere to a CSC model when enhanced xenograft assays are applied. Here it is demonstrated that 16 primary human acute myeloid leukemia (AML) samples, fractionated into 4 populations and subjected to sensitive in vivo leukemia stem cell (LSC) analysis, follow a CSC model of organization. Each fraction was subjected to gene expression analysis, and a global LSC-specific signature was determined from functionally defined LSC. Similarly, using human cord blood, a hematopoietic stem cell (HSC)-enriched gene signature was established. Bioinformatic analysis identified a core transcriptional program shared by LSC and HSC, revealing the molecular machinery that underlies stemness properties. Both the LSC and HSC signatures, when assessed against a large group of cytogenetically normal AML samples, showed prognostic significance independent of other factors. The data establish that determinants of stemness influence the clinical outcome of AML and, more broadly, provide direct evidence for the clinical relevance of CSC.
The cancer stem cell (CSC) model posits that many cancers are organized hierarchically and sustained by a subpopulation of CSC at the apex that possess self-renewal capacity1. This model has elicited considerable interest within the broader cancer community, especially as data accumulate showing the relative resistance of CSC to therapy2-7. A key implication of the model is that cure should depend upon eradication of CSC; consequently, patient outcome is determined by CSC properties. The CSC paradigm is well supported by two lines of evidence derived from xenotransplant models: primary cancer cells capable of generating a tumour in vivo can be purified and distinguished from those cancer cells that lack this ability, and CSC can be serially transplanted, providing evidence for self-renewal. However, there has been little progress in translating the understanding of CSC biology into improved prognosis or treatment of human disease. Thus, the importance of CSC outside of xenotransplant models is unclear and their relevance to human disease is not firmly established.
The best evidence to substantiate the clinical significance of CSC would be a robust demonstration of improved survival in patients treated with new CSC-targeted therapeutics. In the absence of treatment data, the prognostic relevance of CSC can be established indirectly by correlating patient survival outcomes with CSC-specific biological properties determined using state-of-the-art xenograft models. By extension, the CSC hypothesis predicts that the heterogeneous survival outcomes observed within uniformly treated patient cohorts may reflect variation in CSC properties among patients. Emerging evidence from leukemia samples supports this prediction, as correlative studies have associated characteristics linked to stem cell properties, such as the ability to engraft mice or surface expression of LSC-linked markers, with outcome8, 9. Although these studies are based upon an older xenograft model and investigated only single cohorts, they nevertheless establish the feasibility of this approach.
If CSC properties are relevant to human disease, it follows that the molecular machinery that governs the stem cell state must influence clinical outcome. However, little is currently known of the identity of the molecular regulators that govern CSC-specific properties. Experimental data show that LSC possess stem cell functions common to all stem cells, including self-renewal and the ability to produce differentiated, non-stem cell progeny1. Murine models have been successfully used to identify a small number of genes that regulate LSC function, including MEIS1 and BMI1 10, 11. Gene expression profiling provides an approach to define CSC-specific attributes on a genome-wide basis. Recently, a human breast CSC signature was generated from an expression analysis in which CSC-enriched populations obtained from xenografts and some pleural effusions were compared to normal mammary cells12. The expression of the breast CSC genes correlated with patient outcome for breast and other cancer types, although some have questioned to what degree this correlation derives from cancer-specific rather than CSC-specific properties12-14. Clearly, more focused studies of global gene expression in well-defined CSC and non-CSC populations from primary samples are needed to generate CSC-specific signatures. Such studies should reveal the identity of important stem cell regulators and provide the basis to determine whether CSC-specific signatures correlate with clinical aspects of human disease.
The prospective isolation and subsequent functional and molecular analysis of CSC from a heterogeneous tumour population often depends on the distinctive expression of surface marker proteins. Historically, xenografts into SCID or NOD/SCID mice were used to confirm these early marker-dependent sorting strategies15, 16. However, a series of recent studies using either syngeneic murine cancer models or NOD/SCID mice with impaired residual innate immunity have cast doubt upon the reliability of NOD/SCID mice to accurately capture all cancer stem cell activity17-20. For example, while previous studies observed that LSC can be prospectively isolated only from the CD34+/CD38− cell fraction of acute myeloid leukemia (AML), identical to normal HSC, an improved xenotransplant system has enabled the detection of LSC in previously non-tumourigenic populations15, 16, 18, 19. In a separate example, the use of optimized xenotransplant methods radically altered the apparent detectable frequency of CSC from 1 in 10⁵ tumour cells to 1 in 4 tumour cells, a result that stands in stark contrast to other studies20-22. These studies suggest that some human cancers may not follow the CSC model and strongly demonstrate the requirement for a sensitive xenotransplant model to confirm or refute the existence of a CSC hierarchy in each human cancer. More importantly, sample-to-sample variation in the relationship between cell surface marker expression and CSC activity establishes an important principle: all experiments designed to investigate CSC properties in purified cell fractions must simultaneously assess all cell fractions with well-validated tumour- or leukemia-initiation assays (e.g. when determining an LSC or HSC signature).
Here 16 AML and 3 cord blood primary samples were fractionated and a sensitive xenotransplant assay was utilized to detect and functionally quantify each fraction for cells with LSC or HSC activity, respectively. Leukemia stem cell (LSC) and hematopoietic stem cell (HSC) gene expression signatures were identified based on this functional stem cell characterization of each purified cell fraction and bioinformatic analyses showed that they are closely correlated. Both signatures predict poor overall survival independently of other prognostic factors in patients with cytogenetically normal AML, demonstrating that stem cell gene expression programs determine patient outcome. Overall, the results establish the clinical relevance of LSC defined solely on the basis of functional xenotransplant assays.
Peripheral blood samples were collected from patients with AML after obtaining informed consent according to the procedures approved by the Research Ethics Board of the University Health Network. Low-density mononuclear cells isolated from individuals with AML were frozen viably in FCS plus 10% vol/vol DMSO. Human cord blood cells obtained from full-term deliveries from consenting healthy donors according to the procedures approved by the Research Ethics Board of the University Health Network were processed as described33.
Cells were stained with antibodies to CD34 and CD38 (and, in the case of cord blood, CD36) and sorted on either a MoFlo (Beckman Coulter) or FACSAria (BD Biosciences) cell sorter. AML cells were sorted into CD34+/CD38−, CD34+/CD38+, CD34−/CD38+ and CD34−/CD38− populations. Three independent pooled CB samples from 15-22 donors were used for isolation of HSC subsets and progenitors. Lin− cord blood cells were sorted into CD34+/CD38− (HSC1), CD34+/CD38lo/CD36− (HSC2) and CD34+/CD38+ (Prog) populations. The mature cord blood fraction consists of lineage-positive (lin+) cord blood cells after hemolysis. Representative sorting gates are in
Transplantation of Cells into NOD/scid Mice and Colony Formation Assays
NOD/ShiLtSz-scid (referred to as NOD/scid) mice were bred at the University Health Network/Princess Margaret Hospital. Animal experimentation followed protocols approved by the University Health Network/Princess Margaret Hospital Animal Care Committee. NOD/scid mice 8-13 weeks old were pretreated with 2.75-3.4 Gy irradiation and anti-CD122 antibody before being injected intrafemorally with transduced AML cells at a dose of 200 to 2.87×10^6 sorted cells per mouse, as previously described23. Anti-CD122 antibody was purified from the hybridoma cell line TM-b1 (generously provided by Prof T. Tanaka, Hyogo University of Health Sciences), and 200 µg was injected i.p. following irradiation. Mice were sacrificed at 6.5 to 15 weeks (mean 10 weeks), and bone marrow from the injected right femur and the opposite femur and, in some cases, both tibias, as well as the spleen, was collected for flow cytometry and secondary transplantation. Human engraftment was evaluated by flow cytometry of the injected right femur, the non-injected bones and the spleen. A threshold of 1% human CD45+ cells in bone marrow was used to define positive human engraftment. For each case, sort purity was integrated with the frequency of LSC in the other fractions in order to estimate LSC contamination and eliminate false positive results (LSC+). Mice with greater than 50% CD19+ cells were classified as having normal human engraftment. The mean purity for each fraction was 98.3%. To eliminate false negative results (LSC−), the sensitivity of detection for each fraction was based upon the equivalent number of unsorted cells injected (calculated from the frequency of the sorted population). Each sorted fraction negative for LSC in vivo represented the equivalent of 6.58×10^7 unsorted cells (mean). For each sample, 5×10^6 unsorted AML cells were confirmed to engraft mice. CD33 positivity was used to confirm the AML nature of the engraftment.
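The purity and sensitivity bookkeeping described above can be sketched as follows. This is a minimal illustration rather than the calculation actually used: it assumes that sort impurities carry LSC at the frequency measured in the other fractions, and the Poisson probability of at least one contaminating LSC is an added assumption. All function names and example numbers are hypothetical.

```python
import math


def unsorted_equivalent(cells_injected: float, fraction_frequency: float) -> float:
    """Number of unsorted cells that would contain `cells_injected` cells of a fraction.

    fraction_frequency is the proportion (0-1] of the unsorted sample occupied by the
    sorted fraction; a negative fraction is taken to "cover" this many unsorted cells.
    """
    if not 0 < fraction_frequency <= 1:
        raise ValueError("fraction_frequency must be in (0, 1]")
    return cells_injected / fraction_frequency


def expected_contaminating_lsc(cells_injected: float, purity: float,
                               lsc_per_cell_in_other_fractions: float) -> float:
    """Expected LSC carried in as sort impurities (assumes impurities come from other fractions)."""
    return cells_injected * (1.0 - purity) * lsc_per_cell_in_other_fractions


def prob_false_positive(expected_lsc: float) -> float:
    """Poisson probability of at least one contaminating LSC (illustrative assumption)."""
    return 1.0 - math.exp(-expected_lsc)


def is_engrafted(human_cd45_percent: float, threshold: float = 1.0) -> bool:
    """Engraftment call using the 1% human CD45+ bone marrow threshold from the text."""
    return human_cd45_percent >= threshold


# Hypothetical example: 1e5 cells of a fraction that makes up 2% of the sample,
# sorted at 98.3% purity, with 1 LSC per 1e4 cells in the other fractions.
equiv = unsorted_equivalent(1e5, 0.02)
lam = expected_contaminating_lsc(1e5, 0.983, 1e-4)
print(f"unsorted equivalent: {equiv:.3g}, expected contaminating LSC: {lam:.2f}, "
      f"P(>=1 contaminating LSC): {prob_false_positive(lam):.2f}")
```

A positive fraction whose expected contaminating LSC count is close to or above 1 would be a candidate false positive under this reasoning.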
Secondary transplantation was performed by intrafemoral injection of cells from either the right femur or pooled bone marrow from primary mice into 1-3 secondary mice pretreated with irradiation and anti-CD122 antibody. For validation of cord blood HSC, 3×10^3 to 1×10^5 cells were injected intrafemorally per mouse, and human engraftment was determined by assessment of human CD45, CD19 and CD33 as previously described33. Human CFC assays were done as previously described33.
RNA from cord blood or AML cells was extracted using Trizol (Invitrogen) or RNeasy (Qiagen). Before array analysis, RNA was amplified using either NuGEN (NuGEN Technologies) or in vitro transcription amplification for AML and cord blood, respectively. The in vitro transcription method is an optimized version of the T7 RNA polymerase-based RNA amplification published by Baugh et al78. Human Genome U133A and U133B arrays were used for cord blood and HT HG-U133A arrays for AML samples (Affymetrix). Data were normalized by RMA using either RMA Express ver. 1.0.4 or GeneSpring GX (Agilent). Clustering and heat maps were generated using MeV79, 80. LSC data were clustered using the Pearson correlation metric with average linkage. HSC data were clustered using the Pearson uncentered metric with average linkage. Gene Ontology (GO) annotation was performed using DAVID Bioinformatics Resources (version 6.7)81, 82.
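The two distance metrics named above differ only in whether each profile is mean-centred before correlation. The sketch below re-implements average-linkage clustering in plain Python (it is not MeV) to show that distinction; converting correlation r to distance as 1 − r is an assumption, and the sample profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson_centered(x, y):
    """Standard Pearson correlation: profiles are mean-centred first."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pearson_uncentered(x, y):
    """Uncentred Pearson (cosine similarity): absolute expression levels are kept."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def average_linkage(profiles, similarity):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    n = len(profiles)
    # Pairwise distance matrix, using 1 - r as the distance (assumption).
    dist = [[1.0 - similarity(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = {i: [i] for i in range(n)}
    merges, next_id = [], n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average linkage: mean distance over all cross-cluster member pairs.
            pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
print(average_linkage(samples, pearson_centered))
print(average_linkage(samples, pearson_uncentered))
```

With centring, the third (reversed) profile is maximally distant (r = −1); without centring it still has r ≈ 0.71 to the others, because all values are positive. This is why the uncentred metric weights overall expression level more heavily.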
The LSC-R expression profile was generated by comparing gene expression in LSC-positive fractions with that in fractions without LSC. The HSC-R expression signature was derived from an ANOVA of probes more highly expressed in HSC1 than in all other populations, as well as probes more highly expressed in HSC1 and HSC2 than in the other populations. qRT-PCR confirmation of HSC microarray expression was performed using an ABI PRISM 7900 sequence detection system (Applied Biosystems), with GAPDH used to normalize expression.
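The text states only that GAPDH was used for normalization. Assuming the common comparative Ct (2^−ΔΔCt) method with roughly equal amplification efficiencies, the relative expression of a target gene in one population versus a reference population could be computed as sketched below; the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_gapdh, ct_target_ref, ct_gapdh_ref):
    """Fold change of a target gene vs a reference population by the 2^-ΔΔCt method.

    Assumes ~100% (and equal) PCR efficiency for the target and GAPDH amplicons.
    """
    delta_ct_sample = ct_target - ct_gapdh        # normalise target to GAPDH
    delta_ct_ref = ct_target_ref - ct_gapdh_ref   # same for the reference population
    delta_delta_ct = delta_ct_sample - delta_ct_ref
    return 2.0 ** (-delta_delta_ct)


def mean_ct(replicates):
    """Average technical replicate Ct values before normalisation."""
    return sum(replicates) / len(replicates)


# Hypothetical Ct values: a target gene in HSC1 vs lineage-positive (mature) cells.
fold = relative_expression(mean_ct([24.1, 23.9]), mean_ct([18.0, 18.0]),
                           mean_ct([27.0, 27.0]), mean_ct([18.0, 18.0]))
print(f"HSC1 / mature fold change: {fold:.1f}")
```

Taking log2 of the returned value gives a quantity on the same scale as microarray log2 fold changes, which is convenient when checking agreement between the two platforms.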
Gene set enrichment analysis was performed using GSEA v2.0, with probes ranked by signal-to-noise ratio and statistical significance determined by 1000 gene set permutations34, 35. Gene set permutation, rather than phenotype permutation, was used so that the HSC results (<7 replicates) and LSC results (>7 replicates) could be compared directly. Multiple probe sets per gene were collapsed using their median. For the GSEA analysis of the 110-patient AML cohort by the LSC-R signature, an LSC-R gene set generated with an FDR cutoff of 0.1 was used in order to have >100 probes . . . .
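The probe collapsing and ranking steps above can be sketched as follows. The signal-to-noise metric is (μA − μB)/(σA + σB); the floor on σ at 0.2·|μ| mirrors, to our understanding, the adjustment applied by the GSEA software, and the use of the sample standard deviation is an assumption. The permutation-based significance testing is not reproduced, and the probe and gene names are hypothetical.

```python
import statistics
from collections import defaultdict


def collapse_probes_median(probe_values, probe_to_gene):
    """Collapse probe sets to genes by taking, per sample, the median across probe sets.

    probe_values: {probe_id: [expression per sample]}; probes without a gene are dropped.
    """
    rows_by_gene = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            rows_by_gene[gene].append(values)
    return {gene: [statistics.median(col) for col in zip(*rows)]
            for gene, rows in rows_by_gene.items()}


def _floored_sd(values, fraction=0.2):
    # Sample SD (needs >= 2 values), floored at fraction*|mean| (or `fraction` when the
    # mean is 0) so near-constant genes do not get huge scores. Floor assumed to match GSEA.
    mu = statistics.mean(values)
    floor = fraction * abs(mu) if mu != 0 else fraction
    return max(statistics.stdev(values), floor)


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b); positive when higher in group_a."""
    return ((statistics.mean(group_a) - statistics.mean(group_b))
            / (_floored_sd(group_a) + _floored_sd(group_b)))


def rank_genes(gene_values, idx_a, idx_b):
    """Rank genes by S2N, e.g. idx_a = LSC-positive fractions, idx_b = LSC-negative ones."""
    scores = {
        gene: signal_to_noise([v[i] for i in idx_a], [v[i] for i in idx_b])
        for gene, v in gene_values.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The ranked list produced this way is the input over which GSEA computes its running enrichment score for each gene set.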
Differentially expressed genes were mapped to known and interologous protein-protein interactions (PPIs) in I2D (Interolog Interaction Database) v1.72 (http://ophid.utoronto.ca/i2d)36, 37, with additional updated PPIs (February 2010) from BioGrid (http://www.thebiogrid.org)83, DIP (http://dip.doe-mbi.ucla.edu)84, HPRD (v8; http://www.hprd.org)85, IntAct (www.ebi.ac.uk/intact/)86 and MINT (mint.bio.uniroma2.it/mint/)87. Experimental PPI networks were generated by querying I2D with the target genes/proteins to obtain their immediate interacting proteins and the interactions among them. Network visualization was performed using NAViGaTOR ver. 2.1.15 (http://ophid.utoronto.ca/navigator)37, 88.
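Extracting an experimental network of this kind (seed proteins, their direct partners, and the interactions among all of them) can be sketched with a plain edge list. This does not query I2D; the edge list, the protein names and the choice of an induced subgraph are assumptions for illustration.

```python
from collections import defaultdict


def first_neighbour_network(edges, seeds):
    """Seeds plus their direct interactors, with every listed interaction among them.

    edges: iterable of (protein_a, protein_b) pairs, treated as undirected.
    Returns a set of frozenset({a, b}) edges (the induced subgraph).
    """
    edges = [(a, b) for a, b in edges if a != b]  # ignore self-interactions
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seeds = set(seeds)
    nodes = set(seeds)
    for seed in seeds:
        nodes |= adjacency.get(seed, set())
    return {frozenset((a, b)) for a, b in edges if a in nodes and b in nodes}


# Hypothetical PPI edge list and seed gene product.
ppi = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("E", "F")]
print(sorted(tuple(sorted(e)) for e in first_neighbour_network(ppi, ["A"])))
```

Here B and C are pulled in as direct partners of the seed A, and the B-C interaction is kept because both endpoints are in the network, while C-D is dropped because D is two steps from the seed.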
Correlation with Clinical Outcome
All patients in the 160-patient AML cohort received intensive double-induction and consolidation therapy55, 89. Of these, 156 patients were enrolled in the AMLCG-1999 trial55, 89. Of the 163 samples originally available, 3 were excluded because they were peripheral blood samples or MDS RAEB. Characterization and gene expression profiling of these cohorts are described in Metzeler et al. (GEO accession GSE12417)55. The log2 expression values for each sample were centered to zero mean. The sum of the log2 expression values of the HSC-R or LSC-R probe sets was used as the risk score for each patient. The 160 patients were split into high- and low-risk groups above and below the median risk score. These risk groups were assessed for prognostic value for overall survival and event-free survival in univariate Cox analysis (log-rank test) and in multivariate Cox analysis (Wald test). Similarly, the sum of log2 expression of the LSC-R or HSC-R FDR 0.05 signature was used to rank the 110-patient AML cohort (subdivided by cytogenetic risk; GEO accession GSE6891 matrix1), and a chi-squared test was applied to the top quartile of samples (highest expression sum). The “phenotypically determined stem cell signature” (
Frequency of LSC was determined by limiting dilution analysis and interpreted with the L-Calc software (StemSoft Software Inc). In cases without negative results, a lower-bound estimate of frequency was obtained using ELDA (WEHI Bioinformatics Division)90. The HSC-R signature was generated by one-way ANOVA with the Tukey HSD post-hoc test and Benjamini-Hochberg multiple testing correction (FDR 0.05) (GeneSpring GX software, Agilent). The LSC-R signature was generated using Smyth's moderated t-test with Benjamini-Hochberg multiple testing correction to compare fractions positive for LSC against fractions without LSC91. Fisher's exact test was used to determine the correlation between LSC-R or HSC-R and complete remission.
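Limiting dilution data are conventionally analysed with a single-hit Poisson model, in which the probability that a mouse injected with d cells fails to engraft is exp(−f·d). The sketch below finds the maximum-likelihood frequency f by bisection; it illustrates the model only, is not the L-Calc or ELDA implementation, and does not compute the confidence intervals those tools report. The dilution series in the example is hypothetical.

```python
import math


def lda_frequency(doses, n_mice, n_engrafted, lo=1e-12, hi=1.0, iters=200):
    """Maximum-likelihood stem cell frequency (per cell) under the single-hit Poisson model.

    doses[i] cells were injected into n_mice[i] mice, of which n_engrafted[i] engrafted.
    P(no engraftment at dose d) = exp(-f * d).
    """
    negatives = sum(n - r for n, r in zip(n_mice, n_engrafted))
    if sum(n_engrafted) == 0:
        return 0.0  # no positive mice: the MLE is zero
    if negatives == 0:
        # All mice positive: the likelihood keeps rising with f, so only a one-sided
        # bound is meaningful (the situation the text handles with ELDA).
        raise ValueError("no negative mice; MLE is unbounded")

    def score(f):
        # Derivative of the log-likelihood with respect to f; decreasing in f.
        total = 0.0
        for d, n, r in zip(doses, n_mice, n_engrafted):
            p_pos = -math.expm1(-f * d)  # 1 - exp(-f d), computed accurately
            total += -(n - r) * d + (r * d * math.exp(-f * d) / p_pos if r else 0.0)
        return total

    # Bisection on log f between lo and hi, keeping the root bracketed.
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if score(math.exp(mid)) > 0:
            log_lo = mid
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))


# Hypothetical dilution series: cells injected, mice per dose, mice engrafted (>=1% CD45+).
f = lda_frequency([1e3, 1e4, 1e5], [5, 5, 5], [0, 2, 5])
print(f"estimated LSC frequency: 1 in {1 / f:,.0f}")
```

For a single dose at which half the mice engraft, the estimate reduces to ln 2 / d, which is a quick sanity check on the model.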
AML LSC have Heterogeneous Surface Marker Profiles and Frequency
As an initial step to investigate the molecular regulation of LSC, primary human AML patient samples were fractionated into LSC-enriched and LSC-depleted populations to enable further analysis. A xenotransplant model, including the pretreatment of NOD/scid mice with an anti-CD122 antibody (to deplete residual natural killer and macrophage cell activity) and intrafemoral injection of cells, was previously shown to increase the sensitivity of engraftment and detection of stem cells18, 23, 24. Thus, 16 primary human AML samples were sorted into 4 cell populations each based upon surface expression of CD34 and CD38, followed by functional validation in this optimized xenotransplant assay (
LSC were detectable in each of the four CD34/CD38 AML fractions as determined by human engraftment (≥1% human cells, 8+ weeks after injection) (
To gain insight into the molecular regulation of LSC, each of the functionally validated fractions derived from all 16 primary human AML samples was subjected to global gene expression analysis (
LSC and HSC both possess canonical stem cell functions, such as self-renewal, and undergo maturation processes that result in progeny lacking stem cell function1. However, it is not known whether human LSC use molecular mechanisms also employed by HSC or are governed through unique pathways. If gene expression programs are shared between LSC and HSC, there is a high likelihood that some govern common stem cell functions, and such a comparison provides the first step in their identification. To determine the gene expression profile of HSC, gene expression in human cord blood CD34+/CD38− (HSC1), CD34+/CD38lo/CD36− (HSC2) and CD34+/CD38+ (progenitor) cells, as well as lineage-positive (mature) cells, was examined (
The LSC-R and HSC-R gene expression profiles were examined for common expression patterns. Gene Set Enrichment Analysis (GSEA), a threshold-free method of comparing gene expression between independent datasets, was used to compare the expression profiles and found enrichment of the HSC-R gene signature in the LSC-R profile (p<0.001) (
To identify core pathways in which these genes might act, a stem cell protein-protein interaction network was generated from the CE-HSC/LSC genes using the I2D protein interaction database36, 37, consisting of direct protein-protein interactions as well as proteins that link CE-HSC/LSC proteins. The full network is available in NAViGaTOR (version 2.0)37 XML file format at http://www.cs.utoronto.ca/~juris/data/NatMed10/. Further, a gene list and a protein network representing genes more highly expressed in, and common to, normal lineage-committed progenitors were generated. The CE-HSC/LSC protein interaction network shows significant enrichment of multiple pathways distinct from the progenitor network, including Notch and Jak-STAT signaling, which are implicated in stem cell regulation, thereby supporting the stem cell nature of the HSC- and LSC-related gene profiles38-44. To gain further insight into the gene expression programs preferentially active in LSC, this data was compared with previously generated human and murine gene sets derived from stem, progenitor and mature cell populations as well as embryonic stem cells (ESC)25, 28, 45-51. In a comparison of gene expression between LSC and non-LSC fractions by GSEA, LSC-R gene expression positively correlated with pre-existing primitive cell gene sets, such as HSC genes and genes shared between HSC and lineage-committed progenitor cells, and negatively correlated with gene sets derived from more differentiated cells, such as late lineage-committed progenitor and mature blood cells (FDR q≤0.05; see Example 9 for further description)25, 28, 45. As well, the normal common lineage-committed progenitor-related gene list negatively correlated with genes more highly expressed in LSC fractions than in non-LSC fractions (p<0.001) (
To investigate whether there is a correlation between these LSC-R and HSC-R gene signatures and clinical outcomes in AML patients, a pre-existing set of AML gene expression profiles was interrogated53-55. As discussed later, this approach assumes that, since a hallmark of AML is altered growth and blocked differentiation, some components of stem cell gene expression programs will persist in leukemic blasts. In their study, Valk et al. examined global gene expression in leukemic blasts from 285 AML patients and identified 16 distinct groups by unsupervised cluster analysis53. In general, clustering was driven by the presence of gross chromosomal alterations and known point mutations. When the genes that define each cluster were examined in the LSC-R and HSC-R profiles, significant enrichment for a number of clusters was found. Generally, the LSC-R and HSC-R profiles produced similar results in the enrichment of the clusters and correlated positively with clusters characterized by FLT3-ITD or EVI1 over-expression, molecular markers that indicate a poor prognosis53, 56-58. They correlated negatively with clusters with a good prognosis, including those defined by karyotypes such as t(15;17) and inv(16), although a cluster with 11q23 (MLL) abnormalities also fell into this group53. Recently, 110 of these AML samples were stratified into ‘poor’ or ‘good’ prognostic risk groups based upon cytogenetic alterations, and new gene expression data were generated54. Higher expression of the LSC-R or HSC-R signatures identified patients in the poor prognostic risk group in this data set (p=0.0125 and p=0.001, respectively). Further, enrichment analysis identified subsets of LSC-R and HSC-R genes that correlate with poor cytogenetic risk groups (
To validate the clinical relevance of stem cell gene expression in leukemia, a second cohort of 160 cytogenetically normal (CN) AML patients, for whom gene expression and outcome data were available, was examined55. CN AML represents approximately 45% of all AML cases and is an intermediate risk category57, 58. The LSC-R or HSC-R gene signature was used to divide these patients into 2 equal groups based upon the median expression of the respective signature in bulk AML bone marrow cells. There was a significant negative correlation between the rate of complete remission and high expression of the LSC-R signature (p=0.0054, n=158), while the negative correlation with the HSC-R signature approached significance (p=0.073, n=158). Both signatures were significantly negatively correlated with overall survival (LSC p=5.2×10^−6, HR=2.4 (95% CI 1.6-3.6); HSC p=1.8×10^−5, HR=2.3 (95% CI 1.6-3.4)) (
CN AML lacks gross genomic changes, which makes it difficult to identify prognostic biomarkers. However, there has been much effort to use the mutational status of specific genes to determine prognosis57-61. Recently, FLT3ITD status and NPM1 mutational status have been combined to designate low molecular risk (LMR; NPM1mut FLT3ITD−) and high molecular risk (HMR; FLT3ITD+ or NPM1wt FLT3ITD−) groups57, 60, 61. Patients with LMR AML, who account for approximately 35% of CN-AML, generally have a favorable prognosis and are offered standard treatment; however, their outcomes remain heterogeneous57, 60, 61. Multivariate analysis was used to demonstrate that the LSC-R and HSC-R signatures could predict outcome independently of known molecular prognostic factors such as molecular risk status and CEBPA (
To determine the robustness of the clinical correlation, the prognostic value of the LSC-R signature was examined in an additive analysis (
This data provides human HSC- and LSC-specific gene expression signatures derived from multiple sorted cell fractions in which both HSC and LSC content was contemporaneously assayed by in vivo repopulation. LSC and HSC share a core transcriptional program that, taken together, reveals components of the molecular machinery that governs stemness. Since both signatures have strong prognostic significance in predicting AML patient outcome, the data establishes that determinants of stemness influence clinical outcome. These findings have two important implications for the role of stem cells in cancer. First, the firm linkage between LSC and HSC signatures, and the ability of these signatures to predict survival, a key clinical endpoint in cancer, provide strong evidence that LSC defined on the basis of functional stem cell properties are distinct and clinically relevant cells present in the leukemic clone. Although the validity of the CSC model continues to be contested for many tumour types, this data supports the contention that LSC are discrete cell types and are neither artifacts of experimental xenograft models nor clinically unimportant17, 20, 63-66. Second, the approach taken in AML provides a paradigm for assessing both the identity and clinical relevance of LSC in other leukemias and of CSC in solid tumours. A well-validated and sensitive xenograft assay is essential, since only functionally validated populations showed clinical relevance, while signatures derived from phenotypically defined populations did not. Furthermore, the finding that LSC are clinically relevant predicts that therapies targeting LSC should improve survival outcomes and that xenograft models based on primary AML engraftment should be used for preclinical evaluation of new cancer drugs.
The identification of shared transcriptional profiles in LSC and HSC strongly predicts that these components of the molecular machinery play a role in the establishment and maintenance of the stem cell state. Indeed, biological studies have clearly established that LSC and HSC share a number of properties, including quiescence, niche dependence and self-renewal1. Although this study was not designed to determine the mechanism whereby these genes govern stem cell properties, it can be inferred that many have an important role. Genes such as EVI-1, MEIS1, HOXB3 and ERG, as well as the pathways identified from network analysis, are well known as critical regulators of normal murine and/or human HSC function67-70. Moreover, many genes such as EVI-1, ERG, FLT3 and BAALC are also associated with poor prognosis in AML58, 71. As each is present in the shared stem cell gene profile, it is speculated that their value as highly significant prognostic indicators derives from their role in governing stem cell function. Collectively, the identification of so many (eighteen) known stem cell and leukemia genes within the transcriptional profile provides confidence that many of the remaining genes not previously associated with the stem cell state are indeed functionally relevant in human LSC and HSC. The shared stem cell profile also adds to the discussion and controversy regarding the cell of origin for AML and whether LSC derive from the transformation of HSC or of committed progenitors1, 16, 72-75. GSEA showed that LSC were enriched only for HSC programs, and not for progenitor or embryonic cell programs, pointing to their close relationship.
The prognostic value found in the LSC and HSC signatures is of significant clinical importance in a disease like AML, where a large proportion of patients are cytogenetically normal. In these patients, gross genomic changes (e.g. chromosomal translocations) cannot be used to guide therapy, but the mutational status of a small number of genes is now widely used to stratify LMR patients toward less aggressive treatment than HMR patients57, 60, 61. It is particularly noteworthy that the LSC signature clearly identified a large subset (45%) of patients in the LMR group that had poor long-term survival. Such patients might benefit from more aggressive therapy. It is somewhat counterintuitive that an LSC/HSC signature should be present in the leukemia blasts (i.e. non-LSC) of a patient with poor outcome. It is possible that higher expression of a signature simply reflects a higher proportional content of LSC, as suggested previously12, and that such cells are harder to eradicate, shortening patient survival. However, even in the peripheral blood of AML patients with the highest frequency of LSC, only 1 in 500 to 1000 cells is an LSC, making it highly unlikely that their gene expression was detected. Alternatively, it is well known that as normal HSC maturation occurs there is an essential substitution of stem cell functions (including self-renewal, quiescence, DNA damage response and apoptosis) by differentiation programs. In AML, differentiation is perturbed and abnormal but also highly variable between genetic and morphological subtypes76. Additionally, human and murine studies have clearly shown that the self-renewal capacity of LSC is abnormal, resulting in massive LSC expansion compared to normal HSC1, 64, 77. It is speculated that there is similar variation in the uncoupling of stem cell functions and maturation programs.
This data argues that when this dissociation is poor, the stem cell programs will persist in bulk leukemia blasts, while in other samples there is a more rigid demarcation between the LSC and non-LSC, similar to normal HSC development. The blasts in the former case lack actual LSC function because any individual blast possesses only a limited repertoire of the full program, but since RNA is collected from a large number of cells, the full program is uncovered. If this explanation is correct, the greater retention of residual stem cell properties in all cells of the leukemic clone reflects an LSC whose stem cell properties are more deregulated, resulting in disease progression, treatment failure and shortened survival. More broadly, this data points to the importance of developing LSC biomarkers to contribute to personalized cancer therapy and the need to identify therapeutic targets present in all leukemic cells in the clone, including the LSC.
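The dilution argument in the preceding discussion (only 1 in 500 to 1000 cells being an LSC) can be made concrete with a simple two-component mixture. Assuming, purely for illustration, that a signature gene is expressed 10-fold higher in LSC than in blasts and that LSC make up 1 in 500 cells (f = 0.002):

```latex
\begin{aligned}
E_{\text{bulk}} &= f\,E_{\text{LSC}} + (1-f)\,E_{\text{blast}} \\
\frac{E_{\text{bulk}}}{E_{\text{blast}}} &= 0.002 \times 10 + 0.998 = 1.018 \\
\text{LSC share of bulk signal} &= \frac{0.02}{1.018} \approx 2.0\% \\
\log_2(1.018) &\approx 0.026
\end{aligned}
```

Under these assumptions, contaminating LSC would shift bulk log2 expression by only about 0.03, a change that would be very difficult to detect on a microarray, which is consistent with the argument that the signal in blasts is not simply carried by the LSC themselves.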
The relationship of the LSC-R and HSC-R gene profiles to previously published human LSC-associated gene expression data was examined. Four previous studies assessed LSC global gene expression. These involved either a comparison of LSC with HSC (AML vs normal CD34+/CD38− cells)55, 56 or of LSC with more differentiated AML cells in small patient cohorts (AML CD34+/CD38− vs CD34+/CD38+ cells)57, 58. In one of the latter studies, the LSC nature of each fraction was not functionally validated58, and, as shown here and by others, the use of CD34 and CD38 to identify stem cell fractions without concomitant functional analysis can mislabel the stem cell nature of sorted cell fractions.
First, among the studies that compared LSC-enriched populations with non-stem-cell-enriched AML cells, no correlation was found with the LSC list generated by Gal et al. from phenotypically defined populations (AML CD34+/CD38− vs CD34+/CD38+ cells)58.
The LSC-R and HSC-R gene expression data were then compared with the gene sets identified in the two studies that contrasted the gene expression of LSC-enriched populations (AML CD34+/CD38− cells) with that of HSC-enriched populations (normal CD34+/CD38− cells)55, 56. While a comparison of LSC against HSC may identify genes deregulated in LSC, it does not account for the expression of leukemia-associated genes that is independent of the stem cell nature of the populations. When these gene sets were applied to the LSC-R and HSC-R data, the results from both studies were the same: genes more highly expressed in LSC vs HSC were negatively correlated with the LSC-R and HSC-R stem cell-related expression data, while genes with lower expression in LSC vs HSC were positively correlated with the LSC-R and HSC-R stem cell-related expression (
Overall, these analyses establish that CSC gene expression studies must functionally validate each stem cell population in a sensitive xenograft model. Further, they highlight the requirement to compare CSC populations against non-CSC cancer populations, rather than against normal populations, when the goal of the study is to provide insight into the entire stem cell-related gene expression program present in CSC.
The HSC-R genes enriched in GSEA analysis of the LSC expression profile (CE-HSC/LSC) represent a group of stem cell related genes that are active in both stem cell populations compared to their respective non-stem cell fractions (
ABCB1 (ATP-binding cassette, sub-family B (MDR/TAP), member 1; MDR1) acts as a drug transport pump and imparts a multidrug resistant phenotype to cancer cells1, 2. Further, the high expression of ABCB1 in stem cells provides a mechanism for the high efflux of dyes, which can be used to isolate a ‘side population’ of cells that are enriched for stem cells3, 4. Additionally, ABCB1 expression negatively correlates with treatment response in leukemia5.
ALCAM (activated leukocyte cell adhesion molecule; CD166) is a cell surface molecule identified as a marker for the enrichment of colon cancer stem cells6. ALCAM has been implicated in cancer; for example, increased expression of ALCAM is a prognostic marker for poor outcome in pancreatic cancer7, 8.
BAALC (Brain and acute leukemia gene, cytoplasmic) was identified in an attempt to isolate genes differentially expressed in AML with trisomy 8 (+8) compared to cytogenetically normal AML9. High expression of BAALC correlates with poor outcome in leukemia10, 11. BAALC is preferentially expressed in CD34+ primitive cells, and its expression is down-regulated upon cell differentiation12.
BCL11A (B-cell CLL/lymphoma 11A (zinc finger protein)) is implicated in leukemogenesis as a target of chromosomal translocations of the immunoglobulin heavy chain locus in B-cell non-Hodgkin lymphomas13.
DAPK1 (Death-associated protein kinase 1) is a serine/threonine kinase gene involved in regulating apoptosis14. Decreased expression of DAPK1 has been implicated in both inherited and sporadic chronic lymphocytic leukemia15.
ERG (Ets-related gene), a transcription factor required for normal adult HSC function, is rearranged in human myeloid leukemia and Ewing's sarcoma16-18. Additionally, over-expression of ERG is observed in leukemia and associated with poor patient outcome in AML with normal karyotype10, 19, 20.
EVI1 (Ecotropic viral integration site 1) is a nuclear transcription factor implicated in the regulation of adult HSC proliferation and maintenance21. Excision of EVI1 in mice results in a decrease in HSC frequency, while over-expression results in greater self-renewal. Additionally, EVI1 plays a role in leukemogenesis22. It is a target of translocation events in human leukemia; for example, t(3;21)(q26;q22) generates the fusion protein RUNX1-EVI1. High expression of EVI1 is associated with poor patient outcome22, 23.
FLT3 (Fms-like tyrosine kinase 3; Stem cell tyrosine kinase 1, STK1; Flk-2) is a receptor tyrosine kinase expressed in primitive hematopoietic cells that has been implicated in the regulation of HSC16, 24-26. Mutation of FLT3 is a strong prognostic indicator in CN-AML associated with poor outcome27-29.
HLA-DRB4 (major histocompatibility complex, class II, DR beta 4) has been linked to increased frequency of leukemia. For example, it is a marker for increased susceptibility for childhood ALL in males30.
HLF (Hepatic leukemia factor), a leucine zipper gene, is involved in gene fusions in human leukemia as well as acting as a positive regulator of human HSC31, 32.
HOXA5 (homeobox A5), like HOXB2, HOXB3 and MEIS1, is a homeobox gene, and it is hypermethylated in leukemia33. The hypermethylation of HOXA5 is correlated with progression of CML to blast crisis34.
HOXB2 (homeobox B2) is a member of the HOX gene family. Increased HOXB2 expression is associated with NPM1 mutant CN AML, supporting a correlation between altered HOX expression and NPM1 mutation35.
HOXB3 (homeobox B3) is expressed in a putative HSC cell population of CD34+ cells36 and has been shown to regulate the proliferative capacity of murine HSC when mutated along with HOXB437. Furthermore, HOXB3 can induce AML in mice when expressed along with MEIS138.
INPP4B (inositol polyphosphate-4-phosphatase, type II, 105 kDa) has been implicated as a tumour suppressor gene, supported by the observation of common loss of heterozygosity of the INPP4B locus correlating with lower overall patient survival39.
MEIS1 (Myeloid ecotropic viral integration site 1 homolog, Meis homeobox 1) is a homeobox gene that is highly expressed in MLL rearranged leukemias40, 41. It has been shown to transform hematopoietic cells when co-expressed with genes such as HOXB3, HOXA9 and NUP98-HOXD13 and acts to regulate LSC frequency in a murine MLL leukemia model38, 42-44. Further, it has recently been shown to regulate HSC metabolism through Hif-1alpha45.
MYST3 (MYST histone acetyltransferase (monocytic leukemia) 3; MOZ) is a target of the t(8;16)(p11;p13) translocation commonly observed in M4/M5 AML46. It is a transcriptional activator and has histone acetyltransferase activity46. As well, homozygous knockout of Myst3 resulted in HSC defects, indicating that it is required for HSC function47.
SPTBN1 (spectrin, beta, non-erythrocytic 1) is a cytoskeletal protein identified as a fusion partner of FLT3 in atypical chronic myeloid leukemia48.
YES1 (v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1) is a member of the SRC family of kinases and, like SRC, is ubiquitously expressed. YES1 expression was shown to be enriched in murine HSC, ESC and NSC49. YES1 is implicated in maintaining mouse embryonic stem cells in an undifferentiated state50. Furthermore, YES1 was found to be amplified in gastric cancer51.
Prior studies have generated normal human and murine hematopoietic gene signatures for populations enriched for stem, progenitor and mature cells. The overlap between the stem cell expression profiles shown here and 3 pre-existing stem cell expression sets available in the Molecular Signatures Database (MSigDB)52-54 was examined using GSEA. First, a human stem cell gene set developed by Georgantas et al. (2004) compared only CD34+ cells, split into 2 populations: a stem cell-enriched fraction (CD34+/CD38− cells from bone marrow, cord blood and mobilized peripheral blood) and a progenitor-enriched fraction (CD34+/[CD38/Lin]+)52. This gene set (“HEMATOP_STEM_ALL_UP”) was enriched in both the HSC-R and LSC-R expression profiles (FDR q<0.05), supporting the stem cell nature of the expression signatures described herein.
Next, a murine gene set representing genes more highly expressed in an HSC population than in a multipotent progenitor (MPP) population (Rhlo/Sca-1+/c-kit+/lin−/lo vs Rhhi/Sca-1+/c-kit+/lin−/lo) was examined53. The MPP in this case represents a progenitor population that can generate both lymphoid and myeloid cells but cannot reconstitute beyond 4 weeks. This HSC vs MPP list (“PARK_HSC_VS_MPP_UP”) was enriched in our LSC-R and HSC-R expression profiles (FDR q=0.03 and 0.04, respectively). This further supports the normal hematopoietic gene expression data and indicates that AML LSC preferentially express an HSC program, not an MPP program, compared to non-LSC populations.
Finally, the 24 murine gene sets generated by Ivanova et al. 2002 available in MSigDB were examined54. These were generated by examining gene expression in murine stem cell, lineage committed progenitor and mature blood cells from both adult bone marrow and fetal liver and comparing multiple combinations of populations. In the case of adult bone marrow, both long-term and short-term HSCs were isolated (LT HSC and ST HSC, respectively). In general, the LSC-R and HSC-R profiles were enriched for gene sets from primitive cell populations and were negatively correlated with those derived from differentiated populations (“late progenitor” list and “mature” cell list). As expected, the HSC-R expression data correlated with the combined LT and ST HSC gene list (“HSC” FDR q=0.01) and weakly with the LT HSC list alone (FDR q=0.09). However, the HSC-R did not significantly correlate with the ST HSC gene set (FDR q=0.44). Since a ST HSC has not yet been isolated in the human system, this suggests two possible explanations, among others: that the ST HSC does not exist in humans or that the ST HSC gene expression program is unique and undetectable in our sorted population that contains all forms of human HSC. Examining the human LSC-R profile, there is enrichment of the genes in common to primitive cells (“HSC and progenitors”), a weak correlation with the murine LT HSC set (FDR q=0.14) but no correlation with the shared LT and ST stem cell (“HSC”) set (FDR q=0.45). This implies that LSC may preferentially express the gene programs expressed in murine primitive cells as well as, potentially, a subset of the programs specific for LT HSC, although these analyses may suffer from interspecies differences.
Overall, these analyses support the conclusion that HSC-related gene programs and not progenitor or mature gene programs are expressed in AML LSC compared to leukemic blast cells.
The FLT3ITD mutation is a strong prognostic indicator of poor outcome in cytogenetically normal AML27-29. Multivariate analysis demonstrated that the LSC-R and HSC-R signatures could predict outcome independently of known molecular prognostic factors such as FLT3ITD status, NPM1 mutation and CEBPA (
The expression values and clinical outcome data for a group of cytogenetically normal AML samples, such as the 160 cytogenetically normal AML samples used in the primary study, will be used as a test group in an analysis to determine the optimal expression threshold for stratifying new patients into poor or good prognostic groups in the clinic.
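The disclosure does not state how the optimal threshold will be chosen. One common approach, assumed here purely for illustration, is to scan candidate cutpoints of a per-patient signature score and keep the cutpoint that maximizes the two-group log-rank statistic (a "maximally selected" cutpoint). The function names `logrank_chi2` and `best_threshold` and the `min_group_frac` parameter (which stops either group from becoming very small) are hypothetical and not taken from the source.

```python
def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic.

    times:    follow-up time for each patient
    events:   1 if the event (e.g. death) was observed, 0 if censored
    in_group: True for patients in the group of interest (e.g. high score)
    """
    o_minus_e = 0.0  # observed minus expected events in the group of interest
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        died = [i for i in at_risk if times[i] == t and events[i]]
        d = len(died)
        d1 = sum(1 for i in died if in_group[i])
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def best_threshold(scores, times, events, min_group_frac=0.2):
    """Return (cutpoint, statistic) maximizing the log-rank statistic.

    Patients with score > cutpoint form the "high" group. Cutpoints that
    leave either group below min_group_frac of the cohort are skipped.
    """
    n = len(scores)
    best = None
    for c in sorted(set(scores)):
        high = [s > c for s in scores]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_group_frac * n:
            continue
        stat = logrank_chi2(times, events, high)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best
```

Because the cutpoint is chosen to maximize the statistic, its nominal chi-square p-value is optimistic; the selected threshold would normally need a multiplicity correction or confirmation in an independent cohort before clinical use.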
Individuals who present with, or are suspected of having, a hematological cancer will provide a blood sample. The white blood cell fraction will be tested for the expression of two or more genes listed in Tables 2, 4, 6, 12 and/or 14, or, for example, two or more CE-HSC/LSC genes such as those listed in Tables 13 and 19. The expression values will be scaled (e.g., normalized) to a standard (e.g., using experimental controls) and then compared to a threshold value to predict a poor or good prognosis.
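The scaling-and-threshold step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed method: each gene is scaled as a z-score against a hypothetical control mean and standard deviation, the per-patient score is the average z-score across the signature genes, and a score above the threshold is assumed to indicate a poor prognosis. The gene names, the `reference` values and the `high_is_poor` direction are all hypothetical.

```python
from statistics import mean


def signature_score(expression, reference):
    """Average z-score of the signature genes for one patient sample.

    expression: dict gene -> measured expression value for the patient
    reference:  dict gene -> (control_mean, control_sd); hypothetical
                values derived from experimental controls
    """
    if len(reference) < 2:
        raise ValueError("signature must contain two or more genes")
    z_scores = []
    for gene, (mu, sd) in reference.items():
        if gene not in expression:
            raise KeyError(f"missing expression value for {gene}")
        if sd <= 0:
            raise ValueError(f"non-positive control SD for {gene}")
        z_scores.append((expression[gene] - mu) / sd)  # scale to the standard
    return mean(z_scores)


def classify(score, threshold, high_is_poor=True):
    """Map a signature score to a 'poor' or 'good' prognosis call.

    high_is_poor is an assumption about the direction of the signature.
    """
    poor = score > threshold if high_is_poor else score < threshold
    return "poor" if poor else "good"


# Usage with hypothetical genes and control values
reference = {"GENE_A": (5.0, 1.0), "GENE_B": (8.0, 2.0)}
patient = {"GENE_A": 7.0, "GENE_B": 10.0}
score = signature_score(patient, reference)  # z-scores 2.0 and 1.0 -> 1.5
call = classify(score, threshold=0.0)
```

In practice the threshold would come from the cohort analysis described above rather than the placeholder value of 0.0 used here.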
A prognostic analysis is conducted as was done in
While the present disclosure has been described with reference to what are presently considered to be the preferred examples, it is to be understood that the disclosure is not limited to the disclosed examples. To the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
All publications, patents and patent applications are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety.
This is a Patent Cooperation Treaty Application which claims the benefit under 35 U.S.C. 119 based on the priority of corresponding U.S. Provisional Patent Application No. 61/266,704 filed Dec. 4, 2009, which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CA10/02048 | 12/3/2010 | WO | 00 | 6/1/2012 |
Number | Date | Country |
---|---|---|
61266704 | Dec 2009 | US |