1. Field of the Invention
The present invention generally relates to the field of cancer research. More specifically, the present invention relates to the integration of information of somatic cell DNA copy number abnormalities and gene expression profiling to identify genomic signatures specific for high-risk multiple myeloma useful for predicting clinical outcome and survival.
2. Description of the Related Art
Genomic instability is a hallmark of cancer. With the recent advances in comparative genomic hybridization (CGH) (Pinkel and Albertson, 2005a), a deeper understanding of the relationship between somatic cell DNA copy number abnormalities (CNAs) and disease biology has emerged (Pinkel and Albertson, 2005b; Feuk et al, 2006; Sharp et al, 2006; Lupski et al, 2005). Remarkably, DNA copy number abnormalities have recently been discovered in germline DNA within the human population, suggesting that inheritance of such copy number abnormalities may predispose to disease (Sebat et al, 2004; Redon et al, 2006; Tuzun et al, 2005; Iafrate et al, 2004).
Multiple myeloma (MM) is a neoplasm of terminally differentiated B-cells (plasma cells) that home to and expand in the bone marrow, causing a constellation of disease manifestations including osteolytic bone destruction, hypercalcemia, immunosuppression, anemia, and end organ damage (Barlogie et al, 2005). Multiple myeloma is the second most frequently occurring hematological cancer in the United States after non-Hodgkin's lymphoma (Barlogie et al, 2005), with an estimated 19,000 new cases diagnosed in 2007, and approximately 50,000 patients currently living with the disease. Despite significant improvement in patient outcome as a result of the optimal integration of new drugs and therapeutic strategies in the clinical management of the disease, many patients with multiple myeloma relapse and succumb to the disease (Kumar and Anderson, 2005). Importantly, a subset of high-risk disease, defined by gene expression profiles, does not benefit from current therapeutic interventions (Shaughnessy et al, 2007; Zhan et al, 2008). A complete definition of high-risk disease will provide a better means of patient stratification and clinical trial design and also provide the framework for novel therapeutic design.
Unlike in most hematological malignancies, the multiple myeloma genome is often characterized by complex chromosomal abnormalities including structural and numerical rearrangements that are reminiscent of epithelial tumors (Kuehl and Bergsagel, 2002). Errors in normal recombination mechanisms active in B-cells to create a functional immunoglobulin gene result in chromosomal translocations between the immunoglobulin loci and oncogenes on other chromosomes. These rearrangements likely represent initiating oncogenic events, which lead to constitutive expression of resident oncogenes that come under the influence of powerful immunoglobulin enhancer elements. In multiple myeloma, recurrent translocations involving the CCND1, CCND3, MAF, MAFB and FGFR3/MMSET genes account for approximately 40% of tumors (Kuehl and Bergsagel, 2002), and also define molecular subtypes of disease (Zhan et al, 2006). Hyperdiploidy, typically associated with gains of chromosomes 3, 5, 7, 9, 11, 15, and 19, arising through unknown mechanisms, defines another 60% of multiple myeloma disease. Additional copy number alterations, including loss of chromosomes 1p and 13, and gains of 1q21, are also characteristic of multiple myeloma plasma cells, and are important factors affecting disease pathogenesis and prognosis (Fonseca et al, 2004; Liebisch and Dohner, 2006). Gains of the long arm of chromosome 1 (1q) are one of the most common genetic abnormalities in myeloma (Avet-Loiseau et al, 1997). Tandem duplications and jumping segmental duplications of the chromosome 1q band, resulting from decondensation of pericentromeric heterochromatin, are frequently associated with disease progression (Sawyer et al, 1998; Le Baccon et al, 2001; Sawyer et al, 2005).
Using array comparative genomic hybridization on DNA isolated from plasma cells derived from patients with smoldering myeloma, Rosinol and colleagues showed that the risk of conversion to overt disease was linked to gains of 1q21 and loss of chromosome 13 (Rosinol et al, 2005). These findings were confirmed by using interphase fluorescence in situ hybridization (FISH) analysis. Additionally, it was demonstrated that gains of 1q21 acquired in symptomatic myeloma were linked to inferior survival and were further amplified at disease relapse (Hanamura et al, 2006). The recognition that many of these abnormalities can be observed in the benign plasma cell dyscrasia, monoclonal gammopathy of undetermined significance (MGUS), suggests that additional genomic changes are required for the development of overt symptomatic disease requiring therapy.
It is speculated that copy number abnormalities might represent important events in disease progression. Ploidy changes in multiple myeloma have been primarily observed through either low resolution approaches, such as metaphase G-banding karyotyping, which might miss submicroscopic changes and is unable to accurately define DNA breakpoints, or locus specific studies such as interphase or metaphase fluorescence in situ hybridization (FISH), which focuses on a few pre-defined, small, specific regions on chromosomes. Array-based comparative genomic hybridization is a recently developed technique that provides the potential to simultaneously investigate with high-resolution copy number abnormalities across the whole genome (Barrett et al, 2004; Pollack et al, 1999; Pinkel et al, 1998). With the power of this emerging technique, researchers have confirmed known abnormalities and also found novel genomic aberrations in a variety of cancers. Among those novel aberrations discovered, some are benign while the others are related to disease initiation or progression. These two groups of lesions, so called ‘drivers’ and ‘passengers’, need to be differentiated before being used to search for mechanisms underlying disease pathobiology and/or in clinical diagnosis and prognosis (Lee et al, 2007).
The direct effect of DNA copy number on cellular phenotype is to interfere with gene expression by either altering gene dosage, disrupting gene sequences, or perturbing cis-elements in promoter or enhancer regions (Feuk et al, 2006; Phillips et al, 2001; Platzer et al, 2002; Pollack et al, 2002; Hyman et al, 2002; Orsetti et al, 2004; Stallings, 2007; Auer et al, 2007; Gao et al, 2007). Copy number abnormalities have been shown to contribute to approximately 17% of gene expression variation in the normal human population and have little overlap with the contribution of single nucleotide polymorphisms (SNPs) (Stranger et al, 2007). Additionally, more than half of highly amplified genes were demonstrated to exhibit moderately or highly elevated gene expression in breast cancer (Pollack et al, 2002). Thus, considering the high number of copy number abnormalities in multiple myeloma cells, it is likely that copy number abnormalities play a pivotal role in disease initiation and progression.
Cigudosa et al (1998), Gutiérrez et al (2004), and Avet-Loiseau et al (1997) first applied traditional comparative genomic hybridization approaches (Houldsworth and Chaganti, 1994), and expanded our knowledge about the nature of chromosome instability in multiple myeloma. Walker et al (2006) applied a single nucleotide polymorphism (SNP)-based mapping array to investigate DNA copy number and loss of heterozygosity (LOH) in this disease. We previously used interphase fluorescence in situ hybridization analysis on more than 400 cases of newly diagnosed disease to show that gains of 1q, while not seen in monoclonal gammopathy of undetermined significance, were associated with increased risk of progression to overt multiple myeloma when present in smoldering multiple myeloma, and with a poor outcome following autologous stem cell transplantation when present in newly diagnosed symptomatic disease (Hanamura et al, 2006). Importantly, longitudinal studies on this cohort revealed that the percentage of cells with 1q gains could increase over time within a given patient, suggesting this event was related to disease progression and clonal evolution. Using array comparative genomic hybridization on a small cohort of 67 cases, we used non-negative matrix factorization techniques to identify two subtypes of hyperdiploid disease, one with evidence of 1q gains, and showed that this form of hyperdiploid disease was associated with shorter event-free survival (Carrasco et al, 2006). Consistent with these data, we recently reported on the use of gene expression profiling to identify a gene expression signature of high-risk disease dominated by elevated expression of genes mapping to chromosome 1q and reduced expression of genes mapping to 1p (Shaughnessy et al, 2007).
We also investigated potential mechanisms of genome instability in multiple myeloma cells. The results of the study revealed that copy number alterations in chromosome 1q and 1p were highly correlated with gene expression changes, and that these changes also strongly correlated with risk of death from disease progression, a gene expression-based proliferation index, and a recently described gene expression-based high-risk index. Importantly, we also found that copy number gains and increased expression of AGO2, a gene mapping to 8q24 and coding for a protein exclusively functioning as a master regulator of microRNA expression and maturation, were also significantly correlated with outcome.
Thus, the prior art is deficient in copy number abnormalities and expression profiling of genes to identify distinct and prognostically relevant genomic signatures linked to survival for multiple myeloma that contribute to disease progression and can be used to identify high-risk disease and guide therapeutic intervention. The prior art is also deficient in identification of DNA deletions or additions on chromosomes 1 and 8, which are correlated with gene expression patterns that can be used to identify patients experiencing a relapse after being subjected to therapy. The present invention fulfills this long-standing need and desire in the art.
The present invention is directed to a method of detecting copy number abnormalities and gene expression profiling to identify genomic signatures linked to survival for a disease. Such a method comprises isolating plasma cells from individuals who suffer from a disease and from individuals who do not suffer from the same disease and nucleic acid is extracted from their plasma cells. The nucleic acid is hybridized to a comparative genomic DNA array and to a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of an altered expression of disease candidate genes are indicative of the specific genomic signatures linked to survival for a disease.
The present invention is directed to a method of detecting a high-risk index and increased risk of death from the disease progression of multiple myeloma. Such a method comprises isolating plasma cells from individuals who suffer from the disease and from individuals who do not suffer from multiple myeloma and nucleic acid is extracted from their plasma cells. The nucleic acid is hybridized to a comparative genomic DNA array and to a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of an altered expression of disease candidate genes and copy number abnormalities are indicative of a high-risk index and increased risk of death from the disease progression of multiple myeloma.
The present invention is also directed to a method of detecting copy number abnormalities and gene expression alterations at chromosomal location 8q24 and increased expression of the gene Argonaute 2 (AGO2). Such a method comprises isolating plasma cells from individuals who suffer from multiple myeloma and from individuals who do not suffer from multiple myeloma and nucleic acid is extracted from their plasma cells. The nucleic acid is hybridized to a comparative genomic DNA array and to a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of an altered expression of the gene Argonaute 2 and copy number abnormalities involving gains at 8q24 are linked to a high-risk index and increased risk of death from multiple myeloma.
The present invention is directed to a method of detecting high risk in disease progression of multiple myeloma. Such a method comprises isolating plasma cells from individuals who suffer from the disease and from individuals who do not suffer from multiple myeloma and nucleic acid is extracted from their plasma cells. The nucleic acid is hybridized to a comparative genomic DNA array and to a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of an altered expression of disease candidate genes and copy number abnormalities involving loss of chromosome 1p DNA, loss of 1p gene expression, or loss of 1p protein expression are indicative of high risk in disease progression of multiple myeloma.
The present invention is directed to a method of detecting high risk in disease progression of multiple myeloma. Such a method comprises isolating plasma cells from individuals who suffer from the disease and from individuals who do not suffer from multiple myeloma and nucleic acid is extracted from their plasma cells. The nucleic acid is hybridized to a comparative genomic DNA array and to a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of an altered expression of disease candidate genes and copy number abnormalities involving gain of chromosome 1q DNA, gain of 1q gene expression, or gain of 1q protein expression are indicative of high risk in disease progression of multiple myeloma.
The present invention is directed to a method of detecting diagnostic, predictive, or therapeutic markers of a disease. Such a method comprises isolating plasma cells from individuals who suffer from a disease and from individuals who do not suffer from the same disease and nucleic acid is extracted from their plasma cells. The nucleic acid of the plasma cells is hybridized to a comparative genomic DNA array and to a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of an altered expression of disease candidate genes and copy number abnormalities involving loss of chromosome 1p DNA, loss of 1p gene expression, loss of 1p protein expression, gain of chromosome 1q DNA, gain of 1q gene expression, gain of 1q protein expression, gain of chromosome 8 DNA, gain of 8q gene expression, or gain of 8q protein expression are indicative of detection of diagnostic, predictive, or therapeutic markers of a disease.
The present invention is also directed to a method of detecting copy number abnormalities and gene expression alterations to identify genomic signatures linked to survival for a disease. Such a method comprises isolating plasma cells from individuals who suffer from a disease and from individuals who do not suffer from a disease and nucleic acid is extracted from their plasma cells. The nucleic acid is analyzed to determine copy number abnormalities, expression levels of genes, and chromosomal regions in the plasma cells. The data is analyzed using bioinformatics and computational methodology and the results of copy number abnormalities and gene expression alterations identify genomic signatures linked to survival for a disease.
The present invention is also directed to a kit for the identification of genomic signatures linked to survival specific for a disease. Such a kit comprises an array comparative genomic hybridization DNA microarray and a gene expression DNA microarray to determine copy number abnormalities and expression levels of genes in the plasma cells, and written instructions for extracting nucleic acid from the plasma cells of an individual and hybridizing the nucleic acid to the DNA microarray.
The present invention contemplates developing and validating a quantitative RT-PCR-based assay that combines these staging/risk-associated genes with molecular subtype/etiology-linked genes identified in the unsupervised molecular classification. Assessment of the expression levels of these genes may provide a simple and powerful molecular-based prognostic test that would eliminate the need for testing so many of the standard variables currently used with limited prognostic implications that are also devoid of drug-able targets. Use of a PCR-based methodology would not only dramatically reduce time and effort expended in fluorescence in-situ hybridization-based analyses but also markedly reduce the quantity of tissue required for analysis. If these gene signatures are unique to myeloma tumor cells, such a test may be useful after treatment to assess minimal residual disease, possibly using peripheral blood as a sample source.
Important implications follow from these observations. First, as varied gene expression patterns often represent distinct underlying biological states of normal (Shaffer et al, 2001) and transformed tissues (Shaffer et al, 2001; Ferrando et al, 2002; Ross et al, 2004), it seems likely that the high-risk signature is related to a biological phenotype of drug resistance and/or rapid relapse in multiple myeloma. Accordingly, this myeloma phenotype deserves further study in order to better characterize the most relevant pathways and identify therapeutic opportunities. The relatively large gene expression datasets employed here provide one avenue to more fully define these tumor types. Second, while some hurdles remain in routine clinical implementation of high-risk stratification, this work highlights that a specific subset of myeloma patients continues to receive minimal benefit from current therapies. A practical method to identify such patients should notably improve patient care. For patients predicted to have a favorable outcome, efforts to minimize toxicity of standard therapy might be indicated, while those predicted to have poor outcome, regardless of the current therapy utilized may be considered for early administration of experimental regimens. The present invention contemplates determining if this tumor gene expression profiling (GEP) and array comparative genomic hybridization model of high-risk could be implemented clinically and if it would be relevant for other front-line regimens, including those that test novel combinations of proteasome inhibitors and/or IMIDs with standard anti-myeloma agents and high dose therapy.
In one embodiment of the present invention, there is provided a method of high-resolution genome-wide comparative genomic hybridization and gene expression profiling to identify genomic signatures linked to survival specific for a disease, comprising: isolating plasma cells from individuals suspected of having multiple myeloma and from individuals not suspected of having multiple myeloma within a population, sorting said plasma cells for CD138-positive population, extracting nucleic acid from said sorted plasma cells, hybridizing the nucleic acid to DNA microarrays for comparative genomic hybridization to determine copy number abnormalities, and hybridizing said nucleic acid to a DNA microarray to determine expression levels of genes in the plasma cells, and applying bioinformatics and computational methodologies to the data generated by said hybridizations, wherein the data results in identification of specific genomic signatures that are linked to survival for said disease.
Such a method may further comprise performing data analysis, within-array normalization, between-array normalization, segmentation, identification of atom regions, multivariate survival analysis, correlation analysis of gene expression level and DNA copy number, sequence analysis, and gene ontology (GO) analysis.
Additionally, the genes may map to chromosomes 1, 2, 3, 5, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, and 22, and may map to the p or q regions of these chromosomes. Examples of such genes may include, but are not limited to, those that are selected from the group consisting of AGL, AHCTF1, ALG14, ANKRD12, ANKRD15, APH1A, ARHGAP30, ARHGEF2, ARNT, ARPC5, ASAH1, ASPM, ATP8B2, B4GALT3, BCAS2, BLCAP, BOP1, C13orf1, C1orf107, C1orf112, C1orf19, C1orf2, C1orf21, C1orf56, C20orf43, C20orf67, C8orf30A, C8orf40, CACYBP, CAPN2, CCT3, CD48, CD55, CDC42BPA, CDC42SE1, CENPF, CENPL, CEP170, CEPT1, CHD1L, CKS1B, CLCC1, CLK2, CNOT7, COG3, COG6, CREB3L4, CSPP1, CTSK, CYC1, DAP3, DARS2, DBNDD2, DDR2, DEDD, DENND2D, DHRS12, DIS3, DNAJC15, EDEM3, EIF2C2, ELAVL1, ELF1, ELK4, ELL2, ENSA, ENY2, EXOSC4, EYA1, FAF1, FAIM3, FAM20B, FAM49B, FBXL6, FDPS, FLAD1, FLJ10769, FNDC3A, FOXO1, GLRX, GNAI3, GON4L, GPATCH4, GPR89B, HBXIP, IARS2, IL6R, ILF2, ISG20L2, IVNS1ABP, KBTBD6, KBTBD7, KCTD3, KIAA0133, KIAA0406, KIAA0460, KIAA0859, KIAA1219, KIF14, KIF21B, KIFAP3, KLHDC9, KLHL20, LPGAT1, LRIG2, LY6E, LY9, MANBAL, MAPBPIP, MEIS2, MET, MPHOSPH8, MRPL9, MRPS14, MRPS21, MRPS31, MSTO1, MTMR11, MYST3, NDUFS2, NEK2, NIT1, NME7, NOS1AP, NUCKS1, NUF2, NVL, OPN3, PBX1, PCM1, PEX19, PHF20L1, PI4KB, PIGM, PLEC1, PMVK, POGK, POLR3C, PPM2C, PPOX, PRCC, PSMB4, PSMD4, PTDSS1, PUF60, PYCR2, RAB3GAP2, RALBP1, RASSF5, RBM8A, RCBTB1, RCOR3, RGS5, RIPK5, RNPEP, RRP15, RTF1, RWDD3, S100A10, SCAMP3, SCNM1, SDCCAG8, SDHC, SETDB1, SETDB2, SF3B4, SHC1, SNRPE, SP1, SPEF2, SPG7, SS18, STX6, SUGT1, TAGLN2, TARBP1, TARS2, TBCE, THEM4, TIMM17A, TIPRL, TMEM183A, TMEM9, TNKS, TOMM40L, TPM3, TPR, TRAF31P3, TRIM13, TRIM33, TSC22D1, UBAP2L, UBE2T, UCHL5, UCK2, UTP14C, VPS28, VPS36, VPS37A, VPS72, WBP4, WDR47, WDSOF1, YOD1, YWHAB, YWHAZ, ZFP41, ZMYM2, ZNF364, and ZNF687.
Furthermore, the method described herein may predict clinical outcome and survival of an individual, may be effective in selecting treatment for an individual suffering from a disease, may predict post-treatment relapse risk and survival of an individual, may correlate molecular classification of a disease with the genomic signature defining the risk groups, or a combination thereof. The molecular classification may be CD1 and may correlate with high-risk multiple myeloma genomic signature. The CD1 classification may comprise increased expression of MMSET, MAF/MAFB, PROLIFERATION signatures, or a combination thereof. Alternatively, the molecular classification may be CD2 and may correlate with low-risk multiple myeloma genomic signature. The CD2 classification may comprise HYPERDIPLOIDY, LOW BONE DISEASE, CCND1/CCND3 translocations, CD20 expression, or a combination thereof. Additionally, type of disease whose genomic signature is identified using such a method may include but is not limited to symptomatic multiple myeloma, or multiple myeloma.
In another embodiment of the present invention, there is provided a kit for the identification of genomic signatures linked to survival specific for a disease, comprising: DNA microarrays and written instructions for extracting nucleic acid from the plasma cells of an individual, and hybridizing the nucleic acid to DNA microarrays. The DNA microarrays in such a kit may comprise nucleic acid probes complementary to mRNA of genes mapping to chromosomes 1, 2, 3, 5, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, and 22, and may map to the p or q regions of these chromosomes. Examples of the genes may include but are not limited to those selected from the group consisting of AGL, AHCTF1, ALG14, ANKRD12, ANKRD15, APH1A, ARHGAP30, ARHGEF2, ARNT, ARPC5, ASAH1, ASPM, ATP8B2, B4GALT3, BCAS2, BLCAP, BOP1, C13orf1, C1orf107, C1orf112, C1orf19, C1orf2, C1orf21, C1orf56, C20orf43, C20orf67, C8orf30A, C8orf40, CACYBP, CAPN2, CCT3, CD48, CD55, CDC42BPA, CDC42SE1, CENPF, CENPL, CEP170, CEPT1, CHD1L, CKS1B, CLCC1, CLK2, CNOT7, COG3, COG6, CREB3L4, CSPP1, CTSK, CYC1, DAP3, DARS2, DBNDD2, DDR2, DEDD, DENND2D, DHRS12, DIS3, DNAJC15, EDEM3, EIF2C2, ELAVL1, ELF1, ELK4, ELL2, ENSA, ENY2, EXOSC4, EYA1, FAF1, FAIM3, FAM20B, FAM49B, FBXL6, FDPS, FLAD1, FLJ10769, FNDC3A, FOXO1, GLRX, GNAI3, GON4L, GPATCH4, GPR89B, HBXIP, IARS2, IL6R, ILF2, ISG20L2, IVNS1ABP, KBTBD6, KBTBD7, KCTD3, KIAA0133, KIAA0406, KIAA0460, KIAA0859, KIAA1219, KIF14, KIF21B, KIFAP3, KLHDC9, KLHL20, LPGAT1, LRIG2, LY6E, LY9, MANBAL, MAPBPIP, MEIS2, MET, MPHOSPH8, MRPL9, MRPS14, MRPS21, MRPS31, MSTO1, MTMR11, MYST3, NDUFS2, NEK2, NIT1, NME7, NOS1AP, NUCKS1, NUF2, NVL, OPN3, PBX1, PCM1, PEX19, PHF20L1, PI4KB, PIGM, PLEC1, PMVK, POGK, POLR3C, PPM2C, PPOX, PRCC, PSMB4, PSMD4, PTDSS1, PUF60, PYCR2, RAB3GAP2, RALBP1, RASSF5, RBM8A, RCBTB1, RCOR3, RGS5, RIPK5, RNPEP, RRP15, RTF1, RWDD3, S100A10, SCAMP3, SCNM1, SDCCAG8, SDHC, SETDB1, SETDB2, SF3B4, SHC1, SNRPE, SP1, SPEF2, SPG7, SS18, STX6, SUGT1, TAGLN2, TARBP1, TARS2, TBCE, THEM4, TIMM17A, TIPRL, TMEM183A, TMEM9, TNKS, TOMM40L, TPM3, TPR, TRAF31P3, TRIM13, TRIM33, TSC22D1, UBAP2L, UBE2T, UCHL5, UCK2, UTP14C, VPS28, VPS36, VPS37A, VPS72, WBP4, WDR47, WDSOF1, YOD1, YWHAB, YWHAZ, ZFP41, ZMYM2, ZNF364, and ZNF687.
Additionally, the disease for which the kit is used may include but is not limited to asymptomatic multiple myeloma, symptomatic multiple myeloma, multiple myeloma, recurrent multiple myeloma or a combination thereof.
As used herein, the term, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising”, the words “a” or “an” may mean one or more than one. As used herein “another” or “other” may mean at least a second or more of the same or different claim element or components thereof.
The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. One skilled in the art will appreciate readily that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.
Bone marrow aspirates were obtained from 92 newly diagnosed multiple myeloma patients who were subsequently treated on National Institutes of Health-sponsored clinical trials. The treatment protocol utilized induction regimens followed by melphalan-based tandem peripheral blood stem cell autotransplants, consolidation chemotherapy, and maintenance treatment (Barlogie et al, 2006). Patients provided samples under Institutional Review Board-approved informed consent and records are kept on file. Multiple myeloma plasma cells (PC) were isolated from heparinized bone marrow aspirates using CD138-based immunomagnetic bead selection using the Miltenyi AUTOMACS™ device (Miltenyi, Bergisch Gladbach, Germany) as previously described (Zhan et al, 2002).
High molecular weight genomic DNA was isolated from aliquots of CD138-enriched plasma cells using the QIAMP® DNA Mini Kit (Qiagen Sciences, Germantown, Md.). Tumor and gender-matched reference genomic DNA (Promega, Madison, Wis.) was hybridized to Agilent 244K arrays using the manufacturer's instructions (Agilent, Santa Clara, Calif.).
Copy number changes in multiple myeloma plasma cells were detected using triple color interphase fluorescent in situ hybridization (FISH) analyses of chromosome loci as described (Shaughnessy et al, 2000). Bacterial artificial chromosomes (BAC) clones specific for 13q14 (D13S31), 1q21 (CKS1B), 1p13 (AHCYL1) and 11q13 (CCND1) were obtained from BACPAC Resources Center (Oakland, Calif.) and labeled with Spectrum Red- or Spectrum Green-conjugated nucleotides via nick translation (Vysis, Downers Grove, Ill.).
RNA purification, cDNA synthesis, cRNA preparation, and hybridization to the Human Genome U95AV2 and U133PLUS2.0 GENECHIP® microarrays (Affymetrix, Santa Clara, Calif.) were performed as previously described (Shaughnessy et al, 2007; Zhan et al, 2006; Zhan et al, 2007).
Array comparative genomic hybridization (aCGH) data were normalized by a modified Lowess algorithm (Yang et al, 2002). Statistically altered regions were identified using the circular binary segmentation (CBS) algorithm (Venkatraman and Olshen, 2007). An ‘atom region (AR)’ was defined by applying Pearson's correlation coefficient between the signals from adjacent probes. Given that genomic instability is a dynamic process, we defined the strength of a DNA breakpoint as being related to the proportion of cases within the cohort, and the percentage of tumor cells within a given case, having that breakpoint. The significance of a breakpoint was defined as R = 1 − correlation coefficient. We considered strong breakpoints (high percentage of cases and high percentage of cells within those cases having a breakpoint) to have R ≥ 0.4. The RMA package (Irizarry et al, 2003) in R was used to perform summarization and normalization of Affymetrix GENECHIP® U133PLUS2.0 expression data. Significant association with outcome was determined using the log-rank test for survival. Hazard ratios were calculated using the Cox proportional hazards model. A multivariate survival analysis was applied to select the independent features most significantly associated with outcome. All statistical analyses were performed using the statistics software R (Version 2.6.2), which is freely available from http://www.r-project.org, and R packages developed by the BioConductor project, which are freely available from http://www.bioconductor.org. A detailed description of the methods of data analysis is presented in Examples 6-13. We also utilized two additional public gene expression microarray datasets to further validate our findings. The two datasets represent 340 newly diagnosed multiple myeloma patients enrolled in Total Therapy 2 and 206 newly diagnosed multiple myeloma patients in the Total Therapy 3 trial, respectively (Shaughnessy et al, 2007).
The datasets can be downloaded from NIH GEO using accession number GSE2658. The array comparative genomic hybridization data and gene expression data generated on the 92 cases described here can be downloaded from the Donna D. and Donald M. Lambert Laboratory of Myeloma Genetics website at http://myeloma.uams.edu/lambertlab/software.asp, ftp://ftp.mirt.uams.edu/download/data/aCGH.
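The breakpoint significance described in the data analysis above (R = 1 − correlation coefficient between adjacent probes, with R ≥ 0.4 treated as a strong breakpoint) can be illustrated with a minimal Python sketch. The sketch assumes that the correlation is computed across the cases of the cohort and that an atom region is a maximal run of consecutive probes not separated by a strong breakpoint; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant probe is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def breakpoint_strengths(signals):
    """signals[p][c] is the signal of probe p (in genomic order) in case c.

    Returns R = 1 - r for every pair of adjacent probes.
    """
    return [1.0 - pearson(signals[p], signals[p + 1])
            for p in range(len(signals) - 1)]


def atom_regions(signals, threshold=0.4):
    """Group consecutive probe indices, starting a new group at each strong breakpoint.

    Assumes at least one probe is supplied.
    """
    regions, current = [], [0]
    for p, r in enumerate(breakpoint_strengths(signals)):
        if r >= threshold:  # strong breakpoint between probe p and probe p + 1
            regions.append(current)
            current = []
        current.append(p + 1)
    regions.append(current)
    return regions
```

Under these assumptions, probes whose signals rise and fall together across the cohort fall into the same atom region, while a pair of neighboring probes with weakly correlated signals marks a breakpoint.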
The purpose of within-array normalization is to eliminate systematic bias introduced by the use of different fluorophores and different concentrations of DNA samples in a two-channel microarray platform. We applied the Loess algorithm to normalize raw array comparative genomic hybridization data (Pinkel and Albertson, 2005a), which calculates an estimated log-ratio of the Cy5 channel to the Cy3 channel. The log-ratio indicates the extent of the difference in DNA concentration between test and reference DNAs. Although in our experience the Loess normalization method is robust in most cases, we did find substantially biased signals after Loess normalization. This might be because there are many genomic alterations in myeloma plasma cells and the alterations are significantly asymmetric (many more DNA gains than DNA losses). We therefore introduced a heuristic process to account for this issue after obtaining the Loess normalized signals.
First, we characterized each chromosome by two features, the median and the median absolute deviation (MAD) of the signals within it. We used the median and median absolute deviation instead of the mean and variance to increase robustness. The median absolute deviation is defined as MAD(s)=median(|si−median(s)|), where si represents the signal of probe i.
Second, we excluded chromosomes 3, 5, 7, 9 and 11, which typically exhibit whole-chromosome gains, and the two sex chromosomes. We then applied K-means clustering on those two features to classify all other chromosomes into four subgroups: gain, loss, normal and outlier. Since most of the chromosomes submitted to K-means should not exhibit gains or losses, the largest group was regarded as the normal chromosomes.
Third, the median and median absolute deviation of all signals in the normal chromosomes were calculated. Subtracting this median from all signals on an array yielded the within-array normalized signals.
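The three steps above can be illustrated with a minimal Python sketch. It is not the implementation used in the study (the analyses were performed in R). The function names, the dictionary layout of the input, the plain Lloyd's K-means and its deterministic choice of starting centers are all assumptions made for illustration.

```python
import statistics

def mad(values):
    """Median absolute deviation: median(|s_i - median(s)|)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's K-means on 2-D points; returns one cluster label per point."""
    if k < 2 or len(points) < k:
        raise ValueError("need at least k points and k >= 2")
    # Assumption: start from k points spread evenly through the sorted input.
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def normalize_within_array(signals_by_chrom,
                           excluded=("3", "5", "7", "9", "11", "X", "Y"), k=4):
    """Shift all signals so that the 'normal' chromosomes have median 0."""
    chroms = [c for c in signals_by_chrom if c not in excluded]
    # Step 1: one (median, MAD) feature pair per chromosome.
    features = [(statistics.median(signals_by_chrom[c]), mad(signals_by_chrom[c]))
                for c in chroms]
    # Step 2: cluster into k groups; the largest group is taken as normal.
    labels = kmeans_labels(features, k)
    largest = max(set(labels), key=labels.count)
    normal = [c for c, lab in zip(chroms, labels) if lab == largest]
    # Step 3: subtract the median of the normal chromosomes from every signal.
    shift = statistics.median([s for c in normal for s in signals_by_chrom[c]])
    normalized = {c: [s - shift for s in sig] for c, sig in signals_by_chrom.items()}
    return normalized, normal
```

The excluded chromosomes are left out of the clustering only; their signals are still shifted by the same median as the rest of the array.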
We frequently observed substantial scale differences between microarrays. The differences may come from changes in the photomultiplier tube settings of the scanner or for other reasons not determined (Pinkel and Albertson, 2005a). With this in mind it is necessary to normalize signals between arrays. We therefore transformed the data to guarantee that every array is on the same scale. The calculation used was:
$$s_i^{\mathrm{scaled}} = \frac{s_i - \mathrm{median}(s)}{\mathrm{MAD}(s)}$$
where si represents the within-array normalized signal of probe i.
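As an illustration of this between-array scaling, the following Python sketch centers one array on its median and divides by its median absolute deviation. The function name and the error raised for a zero MAD are assumptions, not part of the described method.

```python
import statistics

def scale_array(signals):
    """Return (s_i - median(s)) / MAD(s) for every signal on one array."""
    med = statistics.median(signals)
    mad = statistics.median(abs(s - med) for s in signals)
    if mad == 0:
        # Assumption: a constant array cannot be scaled, so fail loudly.
        raise ValueError("MAD is zero; signals cannot be scaled")
    return [(s - med) / mad for s in signals]
```

Two arrays that differ only by a multiplicative scale factor, for example `[1, 2, 3, 4, 5]` and `[10, 20, 30, 40, 50]`, map to the same scaled values.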
Segmentation served two purposes: identifying breakpoints and denoising the signal by averaging the signals within a constant region. We applied a circular binary segmentation (CBS) algorithm developed by Olshen and Venkatraman (Pinkel and Albertson, 2005b) to segment whole chromosomes into contiguous segments such that all DNA within a single segment had the same content. In brief, the algorithm cuts a given DNA segment (the whole chromosome in the first step) into two or three sub-segments (the algorithm automatically decides between two and three) and checks whether a middle segment exists that has a different mean value from that of the two flanking segments. If so, the cut points that maximize the difference are determined, and the procedure is applied recursively to identify all breakpoints.
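The recursive idea can be illustrated with a deliberately simplified Python sketch. It is not the circular binary segmentation algorithm. It only ever splits a segment in two, never into three, and it accepts a split when the reduction in squared error exceeds a fixed threshold (`min_gain`, a hypothetical parameter) rather than using a statistical test. It does show the two purposes named above, locating breakpoints and denoising by averaging within each segment.

```python
def best_split(values):
    """Return (index, gain) for the two-way split that most reduces squared error."""
    n = len(values)
    total = sum(values)
    best_index, best_gain = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        right_sum = total - left_sum
        # Between-segment sum of squares for a split before position i.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain

def segment(values, min_gain, start=0):
    """Recursively split; return half-open (start, end) index pairs."""
    index, gain = best_split(values)
    if index is None or gain < min_gain:
        return [(start, start + len(values))]
    return (segment(values[:index], min_gain, start)
            + segment(values[index:], min_gain, start + index))

def smooth(values, min_gain):
    """Replace every signal by the mean of the segment it falls in."""
    out = []
    for a, b in segment(values, min_gain):
        mean = sum(values[a:b]) / (b - a)
        out.extend([mean] * (b - a))
    return out
```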
An ‘atom region’ (AR) is a contiguous stretch of DNA flanked by genomic breakpoints in plasma cells from all myeloma cases. The following procedure was used for defining ARs. We calculated the Pearson's correlation coefficient (cc) between the signals of each probe and those of its adjacent probe, and set the correlation coefficient of the first point of each chromosome to 0. (For robustness, the top and bottom 1% were excluded from the cc calculation.) Points with a correlation coefficient smaller than a given cut-off were designated “0 points”; points with a correlation coefficient greater than the cut-off were designated “1 points”. Each “0 point” and the uninterrupted run of “1 points” that follows it were merged into an atom region.
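The merging rule can be sketched in Python as follows. This is an illustrative sketch only. The input layout (one list of per-sample signals for each probe, ordered along a chromosome) and the function names are assumptions. The default cut-off of 0.99 is the strict value reported elsewhere in this description. The exclusion of the top and bottom 1% of values is omitted for brevity.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant probe is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def atom_regions(probe_signals, cutoff=0.99):
    """Group consecutive probe indices of one chromosome into atom regions."""
    regions = []
    for i, signals in enumerate(probe_signals):
        # The first probe of a chromosome gets cc = 0 and so always opens a region.
        cc = 0.0 if i == 0 else pearson(probe_signals[i - 1], signals)
        if cc < cutoff:
            regions.append([i])    # "0 point": starts a new atom region
        else:
            regions[-1].append(i)  # "1 point": joins the current atom region
    return regions
```

Each boundary between two returned regions corresponds to a candidate breakpoint, whose significance under the definition given earlier would be R=1−cc at that boundary.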
The concept of the atom region has both technical and biological advantages. A technical advantage is that it reduces dimensionality, from 244 k probes to ~40 k or fewer atom regions, which facilitates analyses. Atom regions differ from minimal common regions in that a minimal common region is defined at the level of the individual, while an atom region is defined at the population level. As such, the atom region is more appropriate for studying properties within populations, e.g. the distribution of copy number changes of a region across samples and its correlation with other regions. The atom region also helps to define recurrent breakpoints more precisely. It is common in array comparative genomic hybridization data that signals from two different probes overlap. Because of this noise, breakpoints are often hard to define precisely. The current method determines which atom region a probe belongs to by simultaneously considering the signals of adjacent probes in the whole population, thus dramatically boosting the ability to identify joint probes with high confidence. From a biological perspective, the atom region might be a natural structural element of the chromosome. Understanding atom regions in multiple myeloma and other cancers may help explain why many breakpoints in cancer cells appear to be so consistent; whether atom regions in cancer are similar to haplotype blocks in the germline; the concept of fragile sites; and the mechanism and evolution of genome instability.
A Cox proportional hazards regression model was fitted to the data. The procedure was as follows: Step 1. All one-variable models were fitted. The one variable with the highest significance (smallest P value) was selected if the P value of its coefficient was <0.25. Step 2. A stepwise program searched through the remaining independent variables for the best N-variable model by adding each variable, one at a time, to the previous (N−1)-variable model. The variable with the highest adjusted significance was selected if the adjusted P value of its coefficient was <0.25. Step 3. We then went back and checked all variables in the model. If any variable had an adjusted P value>0.1, the variable was removed. Step 4. We repeated steps 2 and 3 until no more variables could be added.
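The control flow of Steps 1 to 4 can be sketched in Python as shown below. The Cox model fit itself is abstracted behind a caller-supplied function, `fit_pvalues`, which is hypothetical: it is assumed to take a list of variable names and return the adjusted P value of each coefficient in the corresponding model. In this sketch a variable removed in Step 3 is not offered again, which is an assumption made to guarantee termination.

```python
def stepwise_select(candidates, fit_pvalues, p_enter=0.25, p_remove=0.1):
    """Forward selection with a backward check, following Steps 1 to 4."""
    selected = []
    remaining = list(candidates)
    while remaining:
        # Steps 1 and 2: add each remaining variable in turn to the current model
        # and record the P value of the newly added coefficient.
        trials = [(fit_pvalues(selected + [v])[v], v) for v in remaining]
        best_p, best_var = min(trials)
        if best_p >= p_enter:
            break  # Step 4: no further variable qualifies for entry
        selected.append(best_var)
        remaining.remove(best_var)
        # Step 3: re-check every variable in the enlarged model.
        pvals = fit_pvalues(selected)
        selected = [v for v in selected if pvals[v] <= p_remove]
    return selected
```

A toy `fit_pvalues` that returns fixed P values regardless of the other variables in the model is enough to exercise the entry and removal thresholds.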
For each gene, the Pearson's correlation coefficient between its expression levels and DNA copy numbers of its corresponding genome locus was calculated.
To determine the level of significance of the correlations, the sample labels of 92 patients were randomly shuffled, and then a new correlation coefficient was calculated for each gene. Repeating the shuffling 1000 times, 1000 different correlation coefficients were acquired for each gene, and then the level of significance was determined at the 95th percentile of the 1000 random correlation coefficients.
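The re-sampling procedure for a single gene can be sketched in Python as follows. The function names, the fixed random seed and the nearest-rank rule used to read off the 95th percentile are assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def permutation_threshold(expression, copy_number, n_perm=1000, percentile=95, seed=0):
    """Correlation value at the given percentile of the label-shuffled null."""
    rng = random.Random(seed)  # assumption: fixed seed for reproducibility
    shuffled = list(copy_number)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between samples
        null.append(pearson(expression, shuffled))
    null.sort()
    rank = math.ceil(percentile * n_perm / 100)  # nearest-rank percentile
    return null[min(n_perm, rank) - 1]
```

A gene would then be called significantly correlated when its observed coefficient exceeds the threshold returned for that gene.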
All analyses were based on human genome sequence National Center for Biotechnology Information (NCBI) build 35 (hg17). The positions of human microRNAs were taken from miRBase (http://microrna.sanger.ac.uk/sequences/). The positions of fragile sites were taken from NCBI gene database (http://www.ncbi.nlm.nih.gov/sites/entrez). The positions of segmental duplications, centromeres and telomeres were taken from University of California at Santa Cruz (UCSC) genome browser. The web tool, LiftOver (http://genome.ucsc.edu/cgi-bin/hgLiftOver), was used to convert genome coordinates from other assemblies (e.g. hg18) to hg17 when necessary.
Gene ontology classifies genes into different categories according to their attributes, such as functions, the processes in which they are involved and their locations within cells. The categories are described using a controlled vocabulary. Gene ontology annotations for human genes were downloaded from the NCBI gene database (ftp://ftp.ncbi.nih.gov/gene/DATA). The extent of association between gene sets and gene ontology terms was calculated using Fisher's exact test.
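For one gene set and one gene ontology term, the test can be sketched in Python as below. The one-sided (enrichment) form is an assumption, since the sidedness of the test is not stated here, and the function and parameter names are hypothetical. `math.comb` requires Python 3.8 or later.

```python
from math import comb

def fisher_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher's exact test: P(at least hits_in_set annotated genes in the set).

    hits_in_set        -- genes in the gene set annotated with the term
    set_size           -- total genes in the gene set
    hits_in_background -- genes in the whole background annotated with the term
    background_size    -- total genes in the background (includes the gene set)
    """
    upper = min(set_size, hits_in_background)
    total = comb(background_size, set_size)
    # Sum the hypergeometric probabilities of the observed count and all larger counts.
    favorable = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return favorable / total
```

For example, with 3 annotated genes in a set of 4 drawn from a background of 8 genes of which 4 are annotated, the function returns 17/70.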
While oligonucleotide-based array comparative genomic hybridization offers high resolution, it often suffers from high noise (Ylstra et al, 2006). Inappropriate means of adjusting for noise in raw array comparative genomic hybridization data often lead to incorrect overall results. To increase signal-to-noise ratios, we applied a pre-processing procedure including supervised normalization and automatic segmentation algorithms. A Lowess normalization method (Yang et al, 2002) was first used to normalize the two-color intensities and to calculate the log-ratio of the multiple myeloma cell DNA signal to the normal reference DNA signal within each array. Since so many DNA regions are amplified in so many multiple myeloma samples, Lowess often under-estimated the overall signals. We therefore introduced a second step of supervised normalization to overcome this issue. In this step, K-means clustering was applied to identify the normal chromosomal regions with minimal alterations. The signals in these “normal” regions were scaled to a distribution with a mean of 0 and a variance of 1 (see Example 6 for details). After normalization and before moving forward, we performed fluorescent in situ hybridization experiments to validate the pre-processed array comparative genomic hybridization signals, which were fundamental to all the subsequent analyses and inferences. We selected 50 cases to investigate three chromosomal regions, 1q21, 11q13 and 13q14, which frequently undergo copy number changes in multiple myeloma. By comparing the pre-processed array comparative genomic hybridization signals to the fluorescent in situ hybridization results, we confirmed that the two are consistent, with a correlation coefficient of 0.76±0.08. 
Finally, we applied a circular binary segmentation (CBS) algorithm (Venkatraman and Olshen, 2007) to segment whole chromosomes into contiguous segments such that all DNA probes within a single segment have the same signal. The segmentation step further reduced the noise in the signals by averaging signals within a constant region.
The pre-processed signals contain redundant information, and the exact break point between two contiguous segments is hard to define precisely because of frequent overlap in the distribution of signals in the two segments. With this in mind, we introduced the concept of an ‘atom region’ (AR) in chromosomes. An atom region is a contiguous region of DNA that is always lost or gained together in the tumor samples. We applied a simple Pearson's correlation-based method to identify atom regions (see Example 9). In brief, for any two adjacent array comparative genomic hybridization probes, if the correlation coefficient of their pre-processed signals across samples is greater than a given cutoff value (we used a strict cutoff of 0.99), the two are grouped together into an atom region. This method defined 18,506 atom regions across the entire multiple myeloma genome. Of note, the atom regions defined here were based solely on statistical analysis. Many of them might arise from noise in the data rather than from a biologically true break point. Nevertheless, we preferred to perform the following analyses on these atom regions, since they contain the most complete information and remain flexible whenever a less strict cutoff is required.
We first evaluated the overall copy number abnormalities in multiple myeloma cells from 92 patients (
Using global gene expression profiling, we have previously shown that multiple myeloma can be divided into seven distinct molecular classes of disease (Zhan et al, 2006; Bergsagel et al, 2001). Four of the classes are associated with known recurrent IGH-mediated translocations. The t(4; 14), activating FGFR3 and MMSET/WHSC1, makes up the MS subtype. The t(11; 14) and t(6; 14), activating the CCND1 or CCND3 genes, respectively, make up the CD-1 subtype, or the CD-2 subtype when CD20 is also expressed. The t(14; 16) and t(14; 20), activating MAF or MAFB, respectively, make up the MF subtype. A group associated with elevated expression of genes mapping to chromosomes 3, 5, 7, 9, 11, 15, and 19 and lacking translocation spikes makes up the hyperdiploid (HY) subtype. A novel disease class with low bone disease, no recognizable genomic features and a unique gene expression signature makes up the low bone disease (LB) subtype. A group characterized by elevated expression of proliferation genes, comprising cases from each of the other subtypes, was also identified and called the PR subtype (Zhan et al, 2006; Bergsagel et al, 2001). Evaluation of copy number abnormalities across the seven validated molecular classes revealed expected and unexpected findings (refer to
To identify disease-related copy number abnormalities, or so-called driver copy number abnormalities, we integrated array comparative genomic hybridization data and clinical information and applied survival analysis to every atom region. A total of 2,929 atom regions, involving approximately 416 Mb of DNA sequence, were significantly associated with outcome (P<0.01) (
Copy number abnormality regions that appear clinically irrelevant may be considered passenger mutations reflecting a general genomic instability in multiple myeloma, or may correspond to benign copy number variations (CNVs) within the human population (Zhang et al, 2006). The term “copy number variation” is used here to distinguish copy number alterations defined within the general human population from copy number abnormalities detected in multiple myeloma patients. Ideally, germline genomic DNA corresponding to each tumor sample would be used as the reference DNA. In lieu of such, we compared the multiple myeloma-defined atom regions to known copy number variations in the normal human population (Zhang et al, 2006). Results revealed that 7443 multiple myeloma atom regions have corresponding copy number variations in the normal population. We then compared the multiple myeloma atom regions overlapping normal copy number variations (CNV-ARs) to those not overlapping them (non-CNV-ARs); the latter were more likely to be associated with outcome (p=0.012, one-sided Kolmogorov-Smirnov test) (
We next investigated whether the size of copy number abnormalities resulting in gains and losses was associated with prognosis. According to class designations associated with poor outcome (class 1, increased copy number; class 2, loss of copy number), the ratios of DNA length in class 1 and class 2 copy number abnormalities were 206 Mb:171 Mb, 101 Mb:31 Mb and 5 Mb:0 Mb, respectively, when applying different significance levels of 0.01, 0.001 and 5.4E-07. These results indicate that class 1 copy number abnormalities were larger than class 2 copy number abnormalities, generally suggesting that increases in copy number appear to be more relevant to poor outcome than loss of DNA.
Clinical outcomes could be distinguished on the basis of gene expression profiling-derived proliferation index and risk index values. When examined in the context of copy number abnormalities, loss of 1p and gains of 1q were most significantly correlated with both a high proliferation index and a high-risk index. Thus, the top 100 copy number abnormalities positively and negatively correlated with the risk index were located in 1p and 1q (
We next evaluated the relationship between the position of copy number abnormality breakpoints and known chromosome-structural features such as segmental duplications, centromeres, and telomeres. The results revealed that copy number abnormality breakpoints were most significantly associated with segmental duplications and centromeres (Table 1). In contrast to “weak breakpoints”, those seen in a high percentage of cases and, within cases, in a high percentage of tumor cells (“strong breakpoints”), were not found in telomeric regions. We take these data to suggest that breakpoints near telomeres tend to not confer a selective proliferative advantage. We further investigated the correlation between known fragile sites, another potential link to chromosome instability, and copy number abnormality breakpoints. Since most fragile sites are not precisely mapped in the genome, we compared the distribution of copy number abnormality breakpoints in every chromosome cytoband. The results of application of the Kolmogorov-Smirnov test strongly suggested that fragile sites and copy number abnormality breakpoints in multiple myeloma are not associated (Table 1).
Although the majority of copy number abnormality breakpoints were found in intergenic regions (Table 1), strong breakpoints (those found in a significant number of cases and within a significant number of cells within a case) within genes were identified and might point to important disease-related genes. A list of recurrent breakpoints and corresponding genes in which strong breakpoints were identified is provided (Table 2). Given that plasma cells are late stage B-cells that have undergone chromosomal rearrangements in both heavy and light chain immunoglobulin genes, it is noteworthy that our method of identifying gene centric breakpoints revealed hits in the IGH, IGK and IGL loci (Table 2). The ability to identify expected breakpoints in the immunoglobulin loci provides strong evidence that recurrent breakpoints in genes outside the immunoglobulin loci may point to important candidate disease genes. Actual determination of their relevance will require further studies.
Breakpoint | Chromosome | Start | End | Relationship | Gene |
---|---|---|---|---|---|
break28 | 2 | 88968794 | 89003124 | 3′_overlap_with_5′ | IGK@ |
break28 | 2 | 88968794 | 89003124 | 3′_overlap_with_5′ | IGKC |
break28 | 2 | 88968794 | 89003124 | 3′_overlap_with_5′ | IGKV1-5 |
break28 | 2 | 88968794 | 89003124 | 3′_overlap_with_5′ | IGKV2-24 |
break29 | 2 | 89159181 | 89162648 | belong_to | IGK@ |
break29 | 2 | 89159181 | 89162648 | belong_to | IGKC |
break29 | 2 | 89159181 | 89162648 | belong_to | IGKV1-5 |
break29 | 2 | 89159181 | 89162648 | belong_to | IGKV2-24 |
break127 | 14 | 105280523 | 105286479 | belong_to | IGH@ |
break127 | 14 | 105280523 | 105286479 | belong_to | IGHA1 |
break127 | 14 | 105280523 | 105286479 | belong_to | IGHG1 |
break128 | 14 | 105330913 | 105343150 | belong_to | IGH@ |
break128 | 14 | 105330913 | 105343150 | belong_to | IGHA1 |
break128 | 14 | 105330913 | 105343150 | belong_to | IGHG1 |
break129 | 14 | 105630089 | 105643293 | belong_to | IGHA1 |
break129 | 14 | 105630089 | 105643293 | belong_to | IGHG1 |
break151 | 22 | 21563415 | 21570383 | 5′_overlap_with_3′ | IGL@ |
break151 | 22 | 21563415 | 21570383 | belong_to | IGL@ |
break151 | 22 | 21563415 | 21570383 | 5′_overlap_with_3′ | IGLJ3 |
break151 | 22 | 21563415 | 21570383 | 5′_overlap_with_3′ | IGLV3-25 |
break151 | 22 | 21563415 | 21570383 | belong_to | IGLV3-25 |
break151 | 22 | 21563415 | 21570383 | belong_to | IGLV4-3 |
We investigated break points with significance>0.4 (correlation coefficient<0.6) for their location within genes. Bold breakpoints and genes indicate immunoglobulin genes on chromosomes 2, 14, and 22.
Since we cannot determine the exact position of a break point due to the limited resolution of the array comparative genomic hybridization platform, we use the gap between the two adjacent probes in which a break point is located to represent the break point. Relationship definitions are as follows: “belong_to” means a break point-associated region is within a gene; “contain” means a break point-associated region contains an entire gene; “5′_overlap_with_3′” means the 5′ end of a break point-associated region overlaps with the 3′ end of a gene; “3′_overlap_with_5′” means the 3′ end of a break point-associated region overlaps with the 5′ end of a gene.
MicroRNAs (miRNAs) are a novel class of small non-coding RNAs that play important roles in development and differentiation by regulating gene expression through repression of mRNA translation or promoting the degradation of mRNA. Emerging evidence has revealed that deregulated expression of miRNAs is implicated in tumorigenesis. Importantly, for purposes of the current study, recent studies have demonstrated that miRNAs reside in the genome affected by copy number abnormalities (Calin and Croce, 2006; Calin and Croce, 2007).
To investigate copy number abnormalities that might target miRNAs, we first determined the chromosomal distribution of miRNAs across the entire human genome. It is interesting to note that more miRNAs are located on odd-numbered chromosomes (N=268), which typically exhibit trisomies in hyperdiploid multiple myeloma, than on even-numbered chromosomes (N=179) (Table 3). We next investigated whether miRNAs are enriched in regions exhibiting copy number abnormalities in multiple myeloma (Table 4). These data revealed that miRNAs are indeed enriched in copy number abnormalities exhibiting gains and losses, and that miRNAs are also enriched in copy number abnormalities significantly associated with outcome (Table 5). These data suggest that miRNAs might be targets of copy number abnormalities in multiple myeloma.
By combining copy number abnormalities, gene expression data, and survival information, we next investigated disease progression-related regions/genes. A stepwise multivariate survival analysis was performed to identify 14 atom regions from 587 atom regions with an optimal log-rank P-value<0.0001 (Table 6). For each atom region/gene, we selected an optimal cut-off value to separate the 92 cases into two groups, performed log-rank tests and employed Cox proportional hazards models to compare differences in survival time between the two groups. The optimal cut-off value was selected by walking along all value points and identifying the value that gave the smallest P-value in a log-rank test. We recognized that while the optimized P-value used here minimized false negatives, the number of false positives would be greatly increased. However, this tradeoff was deemed acceptable since false positives would be filtered out when the copy number abnormality data were integrated with the gene expression results. Potential candidate genes were defined by the following criteria: 1) gene expression had to be associated with outcome (P<0.01); 2) the copy number of its locus had to be associated with outcome (P<0.01); and 3) the correlation coefficient of the gene expression and the copy number of its genomic locus had to be greater than 0.3, a threshold determined by a re-sampling procedure on sample labels (see Examples 5-13). Using these criteria we discovered a list of 210 genes (Table 7). According to Gene Ontology analysis, these genes are enriched in those whose protein products are involved in rRNA processing, RNA splicing, the epidermal growth factor receptor signaling pathway, the ubiquitin-dependent proteasomal-mediated protein catabolic process, mRNA transport, phospholipid biosynthesis, protein targeting to mitochondria, and the cell cycle (P<0.01). Remarkably, 122 of the 210 genes are located in the 1q region, providing further support for a central role of 1q21 gains in multiple myeloma pathogenesis. 
In addition, we found 21 genes located on chromosome 13, 17 of which are located in band 13q14. This analysis identified survival-related copy number abnormalities, and copy number-sensitive genes residing within them, that represent candidate disease genes in multiple myeloma.
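The cut-off search described above can be sketched in Python as follows. This is an illustrative sketch, not the analysis code used in the study. The function names, the convention that cases with a value at or above the cut-off form the "high" group, and the minimum group size are assumptions. The P value is taken from the chi-square distribution with one degree of freedom and is not corrected for the multiple cut-offs tried.

```python
import math

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for death and 0 for censoring."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)                  # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)     # at risk, group a
        d = sum(1 for tt, e, _ in data if tt == t and e)            # events at time t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.

def optimal_cutoff(values, times, events, min_group=2):
    """Walk along all observed values; return (smallest P value, cut-off)."""
    best_p, best_cut = 1.0, None
    for cut in sorted(set(values)):
        low = [i for i, v in enumerate(values) if v < cut]
        high = [i for i, v in enumerate(values) if v >= cut]
        if len(low) < min_group or len(high) < min_group:
            continue
        p = logrank_p([times[i] for i in low], [events[i] for i in low],
                      [times[i] for i in high], [events[i] for i in high])
        if p < best_p:
            best_p, best_cut = p, cut
    return best_p, best_cut
```

Because the smallest P value over many cut-offs is reported, the result is optimistic by construction, which is the increase in false positives acknowledged in the text above.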
One of the 210 candidate genes, EIF2C2/AGO2, is of high interest since its protein product binds to miRNAs and is thereby linked to repression of mRNA translation and/or mRNA degradation (Liu et al, 2004), and has an additional function in regulating the production of mature miRNAs (O'Carroll et al, 2007; Diederichs and Haber, 2007). Importantly, recent studies have revealed that EIF2C2/AGO2 plays an essential role in B-cell differentiation (O'Carroll et al, 2007; Martinez et al, 2007). EIF2C2/AGO2 is represented by five probes on our Agilent 244K array comparative genomic hybridization platform, all of which are located in the same atom region. While EIF2C2/AGO2 also has six probes on the Affymetrix U133PLUS2.0 GENECHIP®, only one probe, 225827_at, maps exactly to exons of EIF2C2/AGO2 according to the National Center for Biotechnology Information gene database, and this probe was used to evaluate expression of EIF2C2/AGO2. The correlation coefficient of DNA copy number and expression level of EIF2C2/AGO2 was 0.304. The optimized P-value of a log-rank test was 0.00035 and 0.00068 for array comparative genomic hybridization and gene expression data, respectively (
The following references are cited herein:
This is a continuation-in-part of U.S. Ser. No. 11/983,113, filed Nov. 7, 2007, which claims benefit of provisional patent application 60/857,456, filed Nov. 7, 2006, now abandoned.
This invention was created, in part, using funds from the federal government under National Cancer Institute grant CA55819 and CA97513. Consequently, the U.S. government has certain rights in this invention.
Number | Date | Country | |
---|---|---|---|
60857456 | Nov 2006 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 11983113 | Nov 2007 | US |
Child | 12148985 | US |