The present invention generally relates to the field of cancer research. More specifically, the present invention relates to the gene expression profiling of cytogenetic abnormalities.
Multiple myeloma (MM) is an invariantly fatal tumor of terminally differentiated plasma cells (PCs) that home to and expand in the bone marrow. Monoclonal gammopathy of undetermined significance (MGUS) and multiple myeloma are the most frequent forms of monoclonal gammopathies. Monoclonal gammopathy of undetermined significance is the most common plasma cell dyscrasia, with an incidence of up to 10% of the population over age 75. The molecular basis of monoclonal gammopathy of undetermined significance and multiple myeloma is not very well understood, and it is not easy to differentiate these two disorders. Diagnosis of multiple myeloma or monoclonal gammopathy of undetermined significance is identical in ⅔ of cases using classification systems that are based on a combination of clinical criteria such as the amount of bone marrow plasmocytosis, the concentration of monoclonal immunoglobulin in urine or serum, and the presence of bone lesions. Especially in early phases of multiple myeloma, differential diagnosis is associated with a certain degree of uncertainty.
Furthermore, in the diagnosis of multiple myeloma, the clinician must exclude other disorders in which a plasma cell reaction may occur. These other disorders include rheumatoid arthritis, connective tissue disorders, and metastatic carcinoma where the patient may have osteolytic lesions associated with bone metastases. Therefore, given that multiple myeloma is thought to have an extended latency and clinical features are recognized many years after development of the malignancy, new molecular diagnostic techniques are needed for differential diagnosis of multiple myeloma, e.g., monoclonal gammopathy of undetermined significance versus multiple myeloma, or recognition of various subtypes of multiple myeloma.
Multiple myeloma initially resides in the bone marrow, but typically transforms into an aggressive disease with increased proliferation (resulting in a higher frequency of abnormal metaphase karyotypes), elevated lactate dehydrogenase (LDH) and extramedullary manifestations (Barlogie B. et al., 2001). Although aneuploidy is observed in more than 90% of cases, cytogenetic abnormalities in this typically hypoproliferative tumor are informative in only about 30% of cases and are typically complex, involving on average seven different chromosomes.
Given this genetic chaos, it has been difficult to establish correlations between genetic abnormalities and clinical outcomes. Only recently has chromosome 13 deletion been identified as a distinct clinical entity with a grave prognosis. However, even with the most comprehensive analysis of laboratory parameters, such as β2-microglobulin (β2M), C-reactive protein (CRP), plasma cell labeling index (PCLI), metaphase karyotyping, and fluorescence in situ hybridization (FISH), the clinical course of patients afflicted with multiple myeloma can only be approximated, because no more than 20% of the clinical heterogeneity can be accounted for. Thus, there are distinct clinical subgroups of multiple myeloma and modern molecular tests may identify these entities. Overall, the progress in understanding the biology and genetics of multiple myeloma has been slow.
The prior art is deficient in correlating gene expression profiling methods to determining cytogenetic abnormalities in a subject, including methods that do not rely on fluorescent in situ hybridization (FISH), which is the current standard in the art for detecting chromosomal abnormalities. The present invention fulfills this need in the art.
The present invention provides, inter alia, methods and systems for predicting cytogenetic abnormalities (e.g., chromosomal abnormalities) associated with a cancer in a subject. These methods and systems substitute for FISH (fluorescent in situ hybridization), which is the current standard technique in the art for detecting chromosomal abnormalities. Therefore, while in some embodiments the methods provided by the invention may further provide for detecting a chromosomal abnormality by FISH (e.g. by initial diagnosis before confirmation and/or further testing by the methods and systems provided by the invention or by follow-on testing, following testing by the methods and systems provided by the invention), in certain embodiments, the methods and systems provided by the invention are performed or used without FISH. In a preferred embodiment, the methods and systems provided by the invention are performed or used without FISH.
The methods provided by the invention comprise, in certain embodiments, importing gene expression values obtained from a global gene expression profile of mRNA from cells associated with the cancer into a cytogenetic abnormalities model and predicting, with the model, genes expressing cytogenetic abnormalities in the subject.
The present invention also provides methods for predicting cytogenetic abnormalities in a subject having or at risk for multiple myeloma. The method comprises importing gene expression values obtained from a global gene expression profile of mRNA from plasma cells obtained from the subject into a cytogenetic abnormalities model of a set of reference values of copy number-sensitive genes that correlate to cytogenetic abnormalities associated with multiple myeloma. Using the reference model, genes exhibiting cytogenetic abnormalities in the subject are predicted.
The present invention further provides methods for predicting cytogenetic abnormalities in a subject having or at risk for multiple myeloma. The methods comprise performing global gene expression profiling on mRNA extracted from plasma cells from the subject. Gene expression values obtained from the profile based on copy number-sensitive genes are averaged to reference values correlating to cytogenetic abnormalities associated with (the cancer found in) multiple myeloma. The correlative values of cytogenetic abnormalities comprise a cytogenetic abnormalities model and, thereby, cytogenetic abnormalities in the subject are predicted.
The present invention further still provides computer-readable media tangibly (e.g., non-transiently) storing a virtual model of cytogenetic abnormalities associated with multiple myeloma and implementable in a computer system having a memory, a processor and at least one network connection. The virtual model comprises a list of genes shown in Table 1 identified from global expression profiling of plasma cell mRNA obtained from control multiple myeloma patients; a set of reference values in Table 2 that are averages of the expression values based on copy number-sensitive genes that correlate to cytogenetic abnormalities associated with multiple myeloma; and a statistical function to average the gene expression values. The computer-readable medium also tangibly stores program instructions to implement the virtual model in the computer system.
The present invention further still provides methods for predicting cytogenetic abnormalities in a subject having multiple myeloma. The method comprises applying the virtual cytogenetic abnormalities model, comprising the list of genes in Table 1, the reference values in Table 2, the statistical averaging function, and the program instructions of the computer readable medium as described supra in a computer system to average the gene expression values obtained from global expression profiling of mRNA from plasma cells of a subject having multiple myeloma to reference values correlating to cytogenetic abnormalities in multiple myeloma, thereby predicting cytogenetic abnormalities in the subject.
Other and further aspects, features, and advantages of the present invention will be apparent from the following description of the presently preferred embodiments of the invention. These embodiments are given for the purpose of disclosure.
The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
So that the manner in which the above-recited features, advantages and objects of the invention, as well as others which will become clear, are attained and can be understood in detail, more particular descriptions and certain embodiments of the invention briefly summarized above are illustrated in the appended drawings. These drawings form a part of the specification. It is to be noted, however, that the appended drawings illustrate preferred embodiments of the invention and therefore are not to be considered limiting in their scope.
A description of example embodiments of the invention follows.
As used herein, the following terms and phrases shall have the meanings set forth below. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art.
As used herein, the term, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising”, the words “a” or “an” may mean one or more than one. As used herein “another” or “other” may mean at least a second or more of the same or different claim element or components thereof. The terms “comprise” and “comprising” are used in the inclusive, open sense, meaning that additional elements may be included.
As used herein, the term “or” in the claims refers to “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or”.
As used herein, the term “about” refers to a numeric value, including, for example, whole numbers, fractions, and percentages, whether or not explicitly indicated. The term “about” generally refers to a range of numerical values (e.g., +/−5-10% of the recited value) that one of ordinary skill in the art would consider equivalent to the recited value (e.g., having the same function or result). In some instances, the term “about” may include numerical values that are rounded to the nearest significant figure.
Threshold values “substantially similar” to those in Table 2 are within 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1%—in either direction—of the values in Table 2.
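By way of illustration only, the "substantially similar" test may be sketched as follows. The function name and the default tolerance are assumptions chosen for the example and do not appear in the specification. The tolerance may be any of the recited percentages.

```python
def is_substantially_similar(candidate: float, reference: float,
                             tolerance_pct: float = 20.0) -> bool:
    """Return True if `candidate` lies within `tolerance_pct` percent of
    `reference`, in either direction (hypothetical helper)."""
    # The allowed deviation is a percentage of the magnitude of the reference.
    allowed = abs(reference) * tolerance_pct / 100.0
    return abs(candidate - reference) <= allowed
```

For instance, under a 20% tolerance a candidate threshold of 1.05 would be treated as substantially similar to a reference threshold of 1.0, whereas 1.3 would not.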
“GEP-17,” “GEP-70,” and “GEP-80” are gene expression profiles that are diagnostic and/or prognostic of multiple myeloma and are described more fully in, for example, U.S. Patent Application Publication No. US 2008/0187930, which is incorporated by reference in its entirety, including Table 1 (which provides the GEP-70 signature) and Table 7 (which provides the GEP-17 signature) as well as U.S. Patent Application Publication No. US 2012/0015906, which is incorporated by reference in its entirety, including Table 2. These gene expression profiles may, in certain embodiments, be used in the methods provided by the invention to further characterize a subject, e.g., by diagnosing or further prognosing the subject, in addition to the virtual karyotyping provided by the invention. Additional gene expression profiles for use in this way in the methods provided by the invention include, for example, the 15 gene signature described in U.S. Pat. No. 7,371,736, which is incorporated by reference in its entirety, including Example 12, which describes the 15 gene signature in greater detail.
As used herein, the terms “subject”, “individual” or “patient” refer to a mammal, preferably a human, who has, is suspected of having or is at risk for having a pathophysiological condition, for example, but not limited to, multiple myeloma.
As noted above, the invention provides methods and systems for detecting, e.g., chromosomal abnormalities—without FISH, the current state of the art—by virtual karyotyping. These methods and systems utilize the gene expression levels of a set of the copy number sensitive genes of Table 1 located in a chromosomal region suspected of containing a cytogenetic abnormality selected from a gain of chr1q, chr3, chr5, chr7, chr9, chr11, chr15, chr19, or chr21; amplification of chr1q21; or loss of chr1p, chr6q, or chr13q. Thus, for example, to detect a gain of chr1q, a set of the genes listed in Table 1 that are located in region 1q are tested and/or evaluated for their gene expression levels in accordance with the methods provided by the invention. In particular embodiments, at least 10, 20, 30, 40, 50, 60, 70, 80, 90, or 95% of the genes in Table 1 for a given chromosomal region suspected of containing a cytogenetic abnormality are tested and/or evaluated. In other particular embodiments, the expression level of all of the genes in Table 1 for a given chromosomal region suspected of containing a cytogenetic abnormality are tested and/or evaluated.
In other embodiments, expression level of one or more of the genes in Table 9 for a given chromosomal region suspected of containing a cytogenetic abnormality are tested and/or evaluated. Table 9 is a subset of the genes in Table 1, more specifically, the top 10 copy number sensitive genes for the indicated region, ranked according to the correlation between gene expression levels and aCGH. In more particular embodiments, the expression level of at least 2, 3, 4, 5, 6, 7, 8, 9, or all 10 of the genes in Table 9 for a given chromosomal region are tested and/or evaluated. In other particular embodiments, the expression level of the top (by rank of the correlation coefficient in Table 9) 1, 2, 3, 4, or 5 genes in Table 9 for a given chromosomal region are tested and/or evaluated, e.g., the expression level of the top 1 or 2 genes in Table 9 for a given chromosomal region are tested and/or evaluated.
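As a non-limiting illustration, selecting the top-ranked genes for a region by correlation coefficient may be sketched as follows. The gene identifiers and correlation values are placeholders invented for the example, not entries of Table 9, and the function name is hypothetical.

```python
from typing import Dict, List


def top_ranked_genes(correlations: Dict[str, float], k: int) -> List[str]:
    """Return the k genes with the highest expression/aCGH correlation
    coefficient for one chromosomal region (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort gene identifiers by descending correlation coefficient.
    ranked = sorted(correlations, key=lambda gene: correlations[gene],
                    reverse=True)
    return ranked[:k]


# Placeholder data for a single region; values are illustrative only.
example_region = {"GENE_A": 0.71, "GENE_B": 0.64, "GENE_C": 0.82, "GENE_D": 0.40}
top_two = top_ranked_genes(example_region, 2)
```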
Of course, the methods provided by the invention allow for simultaneous testing for multiple cytogenetic abnormalities in parallel, e.g., one or more cytogenetic abnormalities selected from a gain of chr1q, chr3, chr5, chr7, chr9, chr11, chr15, chr19, or chr21; amplification of chr1q21; or loss of chr1p, chr6q, or chr13q—e.g., the subject can be assayed for the presence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or all 13 cytogenetic abnormalities in parallel. In certain embodiments, the uneven chromosomes are evaluated for the presence of cytogenetic abnormalities by the methods provided by the invention in parallel. In other embodiments, chr1p, chr1q, and chr6q are evaluated for the presence of cytogenetic abnormalities according to the methods provided by the invention in parallel. In still other embodiments, the uneven chromosomes and chr1p, chr1q, and chr6q are evaluated for the presence of cytogenetic abnormalities according to the methods provided by the invention in parallel.
In one embodiment of the present invention there is provided a method for predicting cytogenetic abnormalities associated with a cancer in a subject, comprising importing gene expression values obtained from a global gene expression profile of mRNA from cells associated with the cancer into a cytogenetic abnormalities model; and predicting, with the model, genes expressing cytogenetic abnormalities in the subject.
In this embodiment, the predicting step may comprise averaging the imported gene expression values based on copy number-sensitive genes to reference values correlating to cytogenetic abnormalities associated with the cancer. Further in this embodiment, the cytogenetic abnormalities model may be a virtual model tangibly stored on a computer-readable medium.
In one aspect of this embodiment, the cancer is multiple myeloma and the cytogenetic abnormalities model comprises a set of copy number-sensitive gene reference values correlating to cytogenetic abnormalities in Table 2. Particularly, in this aspect, the set of copy number-sensitive genes comprises the genes in Table 1. Furthermore, the reference values may distinguish among DNA amplification, DNA deletion and DNA with normal copy number.
In another embodiment of the present invention, there is provided a method for predicting cytogenetic abnormalities in a subject having or at risk for multiple myeloma, comprising importing gene expression values obtained from a global gene expression profile of mRNA from plasma cells obtained from the subject into a cytogenetic abnormalities model of a set of reference values of copy number-sensitive genes correlating to cytogenetic abnormalities associated with multiple myeloma; and predicting, with the reference model, genes exhibiting cytogenetic abnormalities in the subject.
In this embodiment, the copy number-sensitive genes comprise the genes in Table 1. Also, the reference values may comprise the values in Table 2. In addition, the cytogenetic abnormalities predicted by the model may be determinative of a prognosis of the subject having multiple myeloma or may be diagnostic of multiple myeloma in the subject. Furthermore, the reference values and the DNA amplification, deletion or normality represented by the same and the virtual cytogenetic abnormalities model are as described supra.
In yet another embodiment of the present invention, there is provided a method for predicting cytogenetic abnormalities in a subject having or at risk for multiple myeloma, comprising obtaining plasma cells from the subject; performing global gene expression profiling on mRNA extracted from the cells; averaging the gene expression values obtained from the profile based on copy number-sensitive genes to reference values correlating to cytogenetic abnormalities associated with (the cancer found in) multiple myeloma, said correlative values of cytogenetic abnormalities comprising a cytogenetic abnormalities model, thereby predicting cytogenetic abnormalities in the subject.
In this embodiment the copy number-sensitive genes in Table 1, the prognosis and/or diagnosis of multiple myeloma by the cytogenetic abnormalities model, the reference values in Table 2 and the DNA amplification, deletion or normality represented by the same and the virtual reference model are as described supra.
In yet another embodiment of the present invention, there is provided a computer-readable medium tangibly storing a virtual model of cytogenetic abnormalities associated with multiple myeloma and implementable in a computer system having a memory, a processor and at least one network connection, said virtual model comprising a list of genes shown in Table 1 identified from global expression profiling of plasma cell mRNA obtained from control multiple myeloma patients; a set of reference values in Table 2 that are averages of the expression values based on copy number-sensitive genes that correlate to cytogenetic abnormalities associated with multiple myeloma; a statistical function to average the gene expression values; and program instructions to implement the virtual model in the computer system.
In this embodiment, the program instructions may be adapted to receive inputted gene expression values obtained from global expression profiling of mRNA from plasma cells of a subject having multiple myeloma; average the received gene expression values based on copy number-sensitive genes; and output a value predictive of cytogenetic abnormalities in the subject.
In yet another embodiment of the present invention there is provided a method for predicting cytogenetic abnormalities in a subject having multiple myeloma, comprising applying the virtual model and program instructions of the computer readable medium of claim 21 in a computer system to average the gene expression values obtained from global expression profiling of mRNA from plasma cells of a subject having multiple myeloma to reference values correlating to cytogenetic abnormalities in multiple myeloma, thereby predicting cytogenetic abnormalities in the subject.
Multiple myeloma, a neoplasm of plasma cells, is characterized by complex chromosomal abnormalities, including structural and numerical rearrangements. The cytogenetic abnormalities that are a hallmark of multiple myeloma and other cancers are commonly used as clinical parameters for determining disease stage and guiding therapy decisions for patients. Traditional cytogenetic techniques, including fluorescence in situ hybridization (FISH) and karyotyping, and the recently developed array-based comparative genomic hybridization (aCGH), are widely used to detect chromosomal aberrations and gene copy-number changes. These methods, however, are expensive or time-consuming, or both.
Thus, the present invention provides a virtual cytogenetic abnormalities (vCA) model or cytogenetic abnormalities reference model that uses gene expression profiling to predict cytogenetic abnormalities. The model has accuracy up to about 0.99. The rationale for the model is that disease-associated alterations of genomic regions should in some way alter (“drive”) expression levels of target genes within the regions or nearby; otherwise, the genomic alterations would be just “passengers” without a real contribution to the disease. Therefore, the driving alterations should be predictable via the alteration of expression levels of the genomic region's target genes. Thus, global gene expression profiling can be a one-stop data source for information on molecular diagnosis and/or prognosis, particularly yielding information from the level of specific genes to whole chromosomes for making a molecular diagnosis and/or determination of prognosis in multiple myeloma, as well as potentially other malignancies. Proper analysis of gene expression profiling data can reveal all the information provided by conventional cytogenetic techniques.
The reference model of cytogenetic abnormalities may be a virtual model provided in a computer comprising a computer system or other electronic device having one or more wired or wireless network connections, a memory to store the model and a processor to execute instructions enabling the reference model on the computer or other electronic device. Such computers and electronic devices are well-known and standard in the art. A computer storage medium may tangibly store the virtual reference model and instructions to implement the virtual model in the computer system. As such, the virtual reference model and instructions may comprise a computer program product tangibly stored in a memory on a computer or other computer storage device as are known in the art.
Particularly, the virtual cytogenetic abnormalities model may comprise a list of genes identified from global gene expression profiling of mRNA obtained from a biological sample, for example, from plasma cells (e.g., CD138-enriched plasma cells) in the case of multiple myeloma, obtained from control subjects having the cancer of interest. For example, Table 1 provides a list of genes from a subject having multiple myeloma. The model also comprises a set of reference values that are averages of the expression values based on copy number-sensitive genes obtained from global expression profiling of the biological sample that correlate to cytogenetic abnormalities associated with the cancer. For example, Table 2 provides these correlative values derived from Table 1. The virtual model also may comprise a statistical function, such as a function to average gene expression values inputted into the model, and the program instructions to implement the virtual model in the computer system.
While the examples provided herein utilize multiple myeloma cells, one of ordinary skill in the art can see that the methods and reference models provided herein are readily adapted to any pathophysiological condition associated with cytogenetic abnormalities during progression and/or remission of the condition. Global gene expression profiling (GEP), whole transcriptome shotgun sequencing (RNA-seq), fluorescent in situ hybridization (FISH), DNA isolation and array-based comparative genomic hybridization (aCGH) or high-throughput DNA sequencing, combined with the statistical analysis techniques provided herein, are well-suited to identify copy number-sensitive genes that are associated with a pathophysiological condition, such as, but not limited to, a cancer. For example, the reference model described herein can be configured for any cancer.
The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. One skilled in the art will appreciate readily that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.
Bone marrow aspirates were obtained from patients newly diagnosed with multiple myeloma, who were subsequently treated on NIH-sponsored clinical trials. Patients provided samples under Institutional Review Board—approved informed consent, and records are kept on file. Myeloma plasma cells were isolated from heparinized bone marrow aspirates with an autoMACS device (Miltenyi Biotec, Inc., Auburn, Calif.) using CD138-based immunomagnetic bead selection, as previously described (Zhan, 2002).
DNA Isolation and Array-Based Comparative Genomic Hybridization (aCGH)
High-molecular-weight genomic DNA was isolated from aliquots of CD138-enriched plasma cells with the use of the QIAamp DNA mini kit (Qiagen, Valencia, Calif.). Tumor DNA and sex-matched reference genomic DNA (Promega Corp., Madison, Wis.) were hybridized to the Agilent 244K aCGH array according to the manufacturer's instructions (Agilent Technologies, Inc., Santa Clara, Calif.).
Bone marrow aspirates from patients with multiple myeloma were processed to remove erythrocytes. Copy-number changes in myeloma plasma cells were detected by triple-color interphase FISH analysis of chromosome loci, as described (Shaughnessy, 2000). Bacterial artificial chromosome (BAC) clones specific for 1q21 (CKS1B), 1p13 (AHCYL1), 13q14 (D13S31), and 13q34 (D13S285) were obtained from BACPAC Resources Center (Oakland, Calif.) and labeled with Spectrum Red- or Spectrum Green-conjugated nucleotides via nick translation (Vysis, Downers Grove, Ill.). At least 100 myeloma cells stained with immunoglobulin (Ig) light-chain antibody (kappa or lambda) conjugated with 7-amino-4-methylcoumarin-3-acetic acid (AMCA) were counted for copies of each probe. The threshold of significant abnormality (gain or loss) of each probe was set at ≥20%, as previously described (Shaughnessy et al. Blood, 15 Aug. 2000).
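For illustration, the per-probe scoring rule described above (at least 100 cells counted, abnormality called when 20% or more of the cells show a gain or a loss) may be sketched as follows. The sketch assumes that two signals per cell represent the normal copy number for the probe. That assumption, and all function and parameter names, are introduced here for the example only.

```python
from typing import Sequence


def fish_probe_abnormal(copies_per_cell: Sequence[int], kind: str,
                        min_cells: int = 100,
                        threshold_pct: int = 20) -> bool:
    """Call a gain or loss for one FISH probe (hypothetical helper).

    copies_per_cell: number of probe signals counted in each scored cell.
    kind: "gain" (more than 2 signals) or "loss" (fewer than 2 signals);
          two signals per cell is ASSUMED to be normal.
    """
    total = len(copies_per_cell)
    if total < min_cells:
        raise ValueError("at least %d cells must be counted" % min_cells)
    if kind == "gain":
        abnormal = sum(1 for c in copies_per_cell if c > 2)
    elif kind == "loss":
        abnormal = sum(1 for c in copies_per_cell if c < 2)
    else:
        raise ValueError("kind must be 'gain' or 'loss'")
    # Integer comparison avoids floating-point rounding at the 20% boundary.
    return abnormal * 100 >= threshold_pct * total
```

Under these assumptions, a sample in which 20 of 100 scored cells carry three signals of a probe would be called a gain for that probe, whereas 19 of 100 would not.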
Bone marrow was processed for chromosome studies by standard techniques. A direct harvest, a 24-hour unsynchronized culture, and a 48-hour synchronized culture were employed on most specimens. The 24-hour culture employed the addition of ethidium bromide (10 μg/mL) to the culture 2 hours prior to harvest, with an additional 1 hour in Colcemid solution (0.05 μg/mL). The 48-hour synchronized cultures employed a 17-hour exposure of cells to 10⁻⁷ M methotrexate. Cells were washed with unsupplemented medium and then released with 10⁻⁵ M thymidine. Colcemid (0.05 μg/mL) was added 5 hours later for 1 hour. For the purpose of cytogenetic examination, an effort was made to examine at least 20 metaphases, with the application of Giemsa banding techniques. The presence of cytogenetic abnormalities required the detection of at least two abnormal metaphases in cases of hyperdiploidy and translocations, whereas at least three metaphases with clonal abnormalities were required in cases of whole and partial chromosome deletions.
RNA purification, cDNA synthesis, cRNA preparation, and hybridization to the Human Genome U133Plus 2.0 GeneChip microarray (Affymetrix, Santa Clara, Calif.) were performed as previously described (Zhan, 2006; Shaughnessy, 2007; Zhan, 2007).
A modified Lowess algorithm was used to normalize aCGH data (Yang, 2002). Statistically altered regions were identified with the use of a circular binary segmentation algorithm (Yang, 2002). The MAS5 algorithm was used to summarize and normalize Affymetrix U133Plus2.0 expression data. All statistical analyses were performed with the statistics software R (version 2.6.2; available free of charge at www.r-project.org) and R packages developed by the BioConductor project (available free of charge at www.bioconductor.org).
DNA copy number-sensitive genes were determined by the following procedures. First, Pearson's correlation coefficients (PCC) between gene expression levels and the copy numbers of the corresponding DNA loci were calculated. Second, the column labels of both the gene expression levels and the DNA loci copy numbers were permuted, and the random correlation coefficients were calculated for each gene based on the permuted matrices. Third, the cutoff value of Pearson's correlation coefficient was then set at 0.35 so that the false-discovery rate (FDR) was approximately 0.05, as only 56 genes had random correlation coefficients >0.35, compared with 1,114 genes based on the original matrix (FDR=56/1114). The other gene expression data of newly diagnosed MM samples can be downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) Website (www.ncbi.nlm.nih.gov/geo/); the accession number for the data sets is GSE2658 (Shaughnessy, 2007).
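The three-step procedure above may be illustrated by the following minimal sketch, which is not the code actually used (the analyses described herein were performed in R). It assumes that expression and copy-number data are held as per-gene lists of equal length, one value per sample, and that the sample labels of the two matrices are permuted independently of each other. All function names are hypothetical.

```python
import random
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / sqrt(sxx * syy)


def count_above(expr: Dict[str, List[float]], copy_num: Dict[str, List[float]],
                cutoff: float) -> int:
    """Number of genes whose expression/copy-number PCC exceeds the cutoff."""
    return sum(1 for g in expr if pearson(expr[g], copy_num[g]) > cutoff)


def permutation_fdr(expr: Dict[str, List[float]],
                    copy_num: Dict[str, List[float]],
                    cutoff: float, seed: int = 0) -> float:
    """Estimate the FDR at a PCC cutoff from one permutation of sample labels."""
    rng = random.Random(seed)
    n_samples = len(next(iter(expr.values())))
    # Permute the sample (column) order of each matrix independently.
    order_expr = list(range(n_samples))
    order_copy = list(range(n_samples))
    rng.shuffle(order_expr)
    rng.shuffle(order_copy)
    perm_expr = {g: [v[i] for i in order_expr] for g, v in expr.items()}
    perm_copy = {g: [v[i] for i in order_copy] for g, v in copy_num.items()}
    observed = count_above(expr, copy_num, cutoff)
    by_chance = count_above(perm_expr, perm_copy, cutoff)
    return by_chance / observed if observed else float("nan")


# Arithmetic reported in the text: 56 chance hits against 1,114 observed hits.
reported_fdr = 56 / 1114
```

In practice, the cutoff would be raised or lowered until the estimated rate reaches the desired level. The sketch evaluates a single cutoff only.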
Genome-wide gene expression profiles and DNA copy numbers (CNs) were obtained in purified plasma cell samples from 92 newly diagnosed MM patients, using the Affymetrix GeneChip and the Agilent aCGH platforms, respectively. DNA copy number-sensitive genes were determined by Pearson's correlation coefficient (PCC) of gene expression levels and the copy numbers of the corresponding DNA loci. Applying the criterion of PCC >0.35, which kept the false-discovery rate at approximately 5%, 1,114 copy number-sensitive genes were identified (Table 1).
On the basis of these copy number-sensitive genes, a vCA model was developed for predicting cytogenetic abnormalities in multiple myeloma patients by means of gene expression profiling. The model focuses particularly on chromosomes 3, 5, 7, 9, 11, 13, 15, 19, and 21, as well as the 1p, 1q, and 6q segments, which are the most commonly altered chromosome regions in myeloma plasma cells.
The reference cytogenetic abnormalities (rCA) of a given chromosome region were determined by the mean values of signals of aCGH probes located in that region. The cutoff value was set at 0.45 for amplification and −0.45 for deletion, as there were only 1% greater than 0.45 on the basis of the absolute signals of probes located in chromosomes 2, 4, 10, and 12, which are the most stable chromosomes in myeloma cells. The values of rCA could be used to distinguish among amplification, deletion, and normal. Reference values for different genomic regions are shown in Table 2.
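By way of example only, the rCA call for one region may be sketched as below. The sketch assumes that the aCGH probe signals are signed values centered near zero for a normal copy number, and that values exactly at a cutoff are treated as normal. Neither point is stated in the text, and the function name is hypothetical.

```python
from typing import Sequence


def reference_call(probe_signals: Sequence[float],
                   amp_cutoff: float = 0.45,
                   del_cutoff: float = -0.45) -> str:
    """Classify one chromosome region from its aCGH probe signals.

    The region's score is the mean probe signal; the cutoffs of 0.45 and
    -0.45 are those recited in the text. Treating a score exactly at a
    cutoff as "normal" is an ASSUMPTION of this sketch.
    """
    if not probe_signals:
        raise ValueError("no probes in region")
    mean_signal = sum(probe_signals) / len(probe_signals)
    if mean_signal > amp_cutoff:
        return "amplification"
    if mean_signal < del_cutoff:
        return "deletion"
    return "normal"
```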
The predicted cytogenetic abnormalities (pCA) of a given chromosome region were determined by the following procedures. First, the mean expression levels of copy number-sensitive genes within the region were calculated. Then, by training the model in a gene expression profiling data set with 92 multiple myeloma samples, the cutoff value of the mean expression levels of copy number-sensitive genes for each chromosome region was set in order to obtain pCA that were most consistent with rCA in terms of the Matthews correlation coefficient, a measure of the quality of binary (two-class) classifications.
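The two steps above may be sketched as follows for a region in which a gain is sought. This is a minimal sketch under stated assumptions: candidate cutoffs are taken from the observed per-sample means, a sample is predicted abnormal when its mean is at or above the cutoff (for a loss the comparison would be reversed), and ties are resolved in favor of the lowest cutoff. All names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def region_mean(sample_expr: Dict[str, float], region_genes: Sequence[str]) -> float:
    """Mean expression of the copy number-sensitive genes of one region."""
    return sum(sample_expr[g] for g in region_genes) / len(region_genes)


def matthews_cc(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Matthews correlation coefficient of a binary classification."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def train_cutoff(mean_levels: Sequence[float],
                 reference_abnormal: Sequence[bool]) -> Tuple[float, float]:
    """Pick the cutoff on mean expression that best reproduces the rCA calls.

    Returns (cutoff, best MCC). A sample is predicted abnormal (gain) when
    its mean expression is >= cutoff; this direction is an ASSUMPTION.
    """
    best_cutoff, best_score = float("nan"), -2.0
    for candidate in sorted(set(mean_levels)):
        predicted = [m >= candidate for m in mean_levels]
        score = matthews_cc(predicted, reference_abnormal)
        if score > best_score:  # strict '>' keeps the lowest tied cutoff
            best_cutoff, best_score = candidate, score
    return best_cutoff, best_score
```

A cutoff trained in this way for each region can then be applied to the region mean of a new sample to yield its pCA.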
The mean prediction accuracy was 0.88 (0.59-0.99; Table 3 and Table 4) when the model was applied to the training data set. To check for overfitting in the vCA model, the model was applied to an independent data set of 23 multiple myeloma samples for which both gene expression profiling and aCGH data were available. The mean prediction accuracy was 0.89 (0.74-1.00; Table 3 and Table 5), which indicated that overfitting was negligible if present at all.
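For illustration, the per-region accuracy and the mean and range across regions reported above may be computed as in the following sketch. It assumes accuracy is the fraction of samples for which the predicted and reference calls agree. The function names are hypothetical and no values from Tables 3 to 5 are reproduced.

```python
from typing import Dict, Sequence, Tuple


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of samples whose predicted call equals the reference call."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(1 for p, r in zip(predicted, reference) if p == r)
    return agree / len(predicted)


def summarize(per_region: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (mean, minimum, maximum) accuracy across chromosome regions."""
    values = list(per_region.values())
    return sum(values) / len(values), min(values), max(values)
```

Comparing the summary obtained on the training set with that obtained on an independent set is one simple way to check for overfitting, as described above.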
The model was validated with a FISH data set compiled from 262 independent MM samples for which both FISH records and GEP data were available. All 262 MM samples had been tested with 1p (AHCYL1) and 1q (CKS1B) probes. Of these samples, 195 had also been tested with chromosome 13 probes (D13S31 and D13S285). The cutoff value was set at 2.5 for amplification of 1q and at 1.5 for deletion of 1p and chr13, according to the distribution of the FISH signals (
In a further validation of the vCA model, a set of cytogenetic data generated by conventional karyotyping was compiled, comprising 533 independent multiple myeloma samples for which both karyotype records and GEP data were available. The vCA model was applied to the GEP data to determine the pCA for the 533 samples. The pCA results matched the karyotype reports with a mean prediction accuracy of 0.65 (0.36-0.77; Table 3 and Table 7), which was lower than the consistency of pCA vs. aCGH and of pCA vs. FISH.
This lower concordance may be due to the fact that karyotyping can only detect cytogenetic information for cells at metaphase, thus missing a considerable amount of information regarding the CN of DNA in a tumor cell population. If this is true, FISH reports would also be expected to match karyotype records poorly. To test this hypothesis, the FISH and karyotype data were compared for the 262 samples for which both records were available. Indeed, the prediction accuracies between FISH and karyotype records were 0.83, 0.76 and 0.60 for chr1p13, chr1q21 and chr13, respectively (Table 8), which are comparable to the prediction accuracies between pCA and karyotype (0.75, 0.72 and 0.64 for chr1p13, chr1q21 and chr13, respectively; Table 7).
Any patents or publications mentioned in this specification are indicative of the levels of those skilled in the art to which the invention pertains. These patents and publications are incorporated by reference herein to the same extent as if each individual publication was incorporated by reference specifically and individually.
One skilled in the art will appreciate that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.
It should be understood that for all numerical bounds describing some parameter in this application, such as “about,” “at least,” “less than,” and “more than,” the description also necessarily encompasses any range bounded by the recited values. Accordingly, for example, the description “at least 1, 2, 3, 4, or 5” also describes, inter alia, the ranges 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, and 4-5, et cetera.
For all patents, applications, or other references cited herein, such as non-patent literature and reference sequence information, it should be understood that each is incorporated by reference in its entirety for all purposes as well as for the proposition that is recited. Where any conflict exists between a document incorporated by reference and the present application, this application will control.
Headings used in this application are for convenience only and do not affect the interpretation of this application.
While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/520,793, filed on Jun. 15, 2011. The entire teachings of the above application are incorporated by reference.
This invention was made with government support under grant CA055819 awarded by the National Cancer Institute. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61520793 | Jun 2011 | US