Cancer, a group of diseases characterized by uncontrolled growth and spread of malignant cells, is a significant cause of human mortality and morbidity world-wide, and a national economic burden in the United States.
Like that of all living cells, the behavior of cancer cells is controlled by the expression of a large number of different genes. Genes that are differentially expressed between cancer cells and normal cells, or between two different types of cancer cells, collectively constitute a gene expression profile that can be used to detect the presence of a cancer in an individual, classify tumor subtypes and/or predict a patient's clinical outcome. In addition, the products of these genes (e.g., mRNA, protein) provide potential targets for therapy.
The successful treatment of cancer depends, in part, on early detection and diagnosis of the cancer in an individual. Accordingly, there is a need for the identification of gene expression profiles that can be relied upon for the accurate detection and diagnosis of various types of cancers at early stages. In addition, there is a further need for a gene expression profile that includes genes that are common to many different types of cancers and, thus, can be used to screen a large patient population for the presence of a cancer. There is also a need for more efficient methods of identifying useful gene expression profiles for cancer.
The present invention encompasses, in one embodiment, a method of diagnosing whether a subject has a cancer. The method comprises detecting in a sample from the subject the level of expression of a subset of genes that are overexpressed in the cancer. According to the invention, the genes in the subset are selected from the group of genes known in the art as MELK, PLVAP, TOP2A, NEK2, CDKN3, PRC1, ESM1, PTTG1, TTK, CENPF, RDBP, CCHCR1, DEPDC1, TP5313, CCNB2, CAD, CDC2, HMMR, STMN1, HCAP-G, MDK, RAD54B, ASPM, HMGA1, SNRPC, IGF2BP3, SERPINH1, COL4A1, LARP1, LRRC1, FOXM1, CDC20, UBE2M, DNAJC6, FEN1, ASNS, CHEK1, KIF2C, AURKB, NPEPPS, KIF4A, E2F8, EZH2, ZNF193, ILF3, EHMT2, SF3A2, NPAS2, PSME3, INPPL1, BIRC5, SULT1C1, NSUN5B, HN1 and NUSAP1. Increased levels of expression of the subset of genes in the sample from the subject, relative to a control, indicate that the subject has a cancer.
In another embodiment, the invention relates to a method of providing a prognosis for a subject that has a cancer, comprising detecting the level of expression of one or more genes selected from the group consisting of PRC1, CENPF, RDBP, CCNB2 and RAD54B in a sample from the subject, and comparing the level of expression of the gene in the sample to a control. An increased level of expression of PRC1, CENPF, RDBP, CCNB2 and/or RAD54B in the sample from the subject, relative to the control, indicates a poor prognosis (e.g., an increased risk of metastasis). In a particular embodiment, the cancer is hepatocellular carcinoma, nasopharyngeal cancer or breast cancer.
In a further embodiment, the invention relates to a method of providing a prognosis for a subject that has a cancer, comprising detecting the level of expression of one or more genes selected from the group consisting of CDC2, CCHCR1, and HMGA1 in a sample from the subject, and comparing the level of expression of that gene in the sample to a control. An increased level of expression of CDC2, CCHCR1, and/or HMGA1 in the sample from the subject, relative to the control, indicates a poor prognosis (e.g., shorter survival). In a particular embodiment, the cancer is hepatocellular carcinoma, nasopharyngeal cancer or breast cancer.
The present invention also provides, in one embodiment, a kit for diagnosing whether a subject has a cancer, comprising a collection of probes capable of detecting the level of expression of at least about twenty genes selected from the group consisting of the genes known in the art as MELK, PLVAP, TOP2A, NEK2, CDKN3, PRC1, ESM1, PTTG1, TTK, CENPF, RDBP, CCHCR1, DEPDC1, TP5313, CCNB2, CAD, CDC2, HMMR, STMN1, HCAP-G, MDK, RAD54B, ASPM, HMGA1, SNRPC, IGF2BP3, SERPINH1, COL4A1, LARP1, LRRC1, FOXM1, CDC20, UBE2M, DNAJC6, FEN1, ASNS, CHEK1, KIF2C, AURKB, NPEPPS, KIF4A, E2F8, EZH2, ZNF193, ILF3, EHMT2, SF3A2, NPAS2, PSME3, INPPL1, BIRC5, SULT1C1, NSUN5B, HN1 and NUSAP1. In a particular embodiment, the probes are nucleic acid probes that hybridize to RNA (e.g., mRNA) products of these genes. In another embodiment, the probes are antibodies that bind to proteins encoded by these genes.
The invention also provides, in another embodiment, a kit for determining a prognosis (e.g., risk of metastasis) for a subject that has a cancer, comprising a probe that is capable of detecting the level of expression of one or more genes selected from the group consisting of PRC1, CENPF, RDBP, CCNB2 and RAD54B.
In yet another embodiment, the invention further provides a kit for determining a prognosis (e.g., survival) for a subject that has a cancer, comprising a probe that is capable of detecting the level of expression of one or more genes selected from the group consisting of PRC1, CDC2, CCHCR1, and HMGA1.
In another embodiment, the invention relates to a method of determining a gene expression profile for a cancer. The method comprises detecting the expression of genes in both cancerous and non-cancerous samples from the same individual (i.e., subject) and identifying genes that are differentially expressed between the cancerous and non-cancerous samples. According to the method, a gene that is differentially expressed between the cancerous sample and the non-cancerous sample is included in a gene expression profile for the cancer.
In an additional embodiment, the invention relates to a method of diagnosing whether a subject has a cancer. The method comprises detecting in a sample from the subject the level of expression of a subset of genes that are underexpressed in the cancer. According to the invention, the genes in the subset are selected from the group of genes known in the art as NAT2, CD5L, CXCL14, VIPR1, CCL14/15, FCN3, CRHBP, GPD1, KCNN2, HGFAC, FOSB, LCAT, MARCO, CYP1A2, FCN2, and DPT. Decreased levels of expression, or an absence of expression, of the subset of genes in the sample from the subject, relative to a control, indicate that the subject has a cancer.
In a further embodiment, the invention provides a kit for diagnosing whether a subject has a cancer, comprising a collection of probes capable of detecting the level of expression of at least about five genes selected from the group consisting of the genes known in the art as NAT2, CD5L, CXCL14, VIPR1, CCL14/15, FCN3, CRHBP, GPD1, KCNN2, HGFAC, FOSB, LCAT, MARCO, CYP1A2, FCN2, and DPT. In a particular embodiment, the probes are nucleic acid probes that hybridize to RNA (e.g., mRNA) products of these genes. In another embodiment, the probes are antibodies that bind to proteins encoded by these genes.
The diagnostic and prognostic methods and the kits for cancer that are provided by the present invention are based, in part, on the discovery of a universal gene expression profile, or common neoplastic signature, that is capable of distinguishing tissue samples of many different types and subtypes of cancer from corresponding normal tissue samples, and predicting clinical survival outcomes for multiple types of cancers. Unlike many gene expression profiles for cancer that have been reported previously (Whitfield M L, et al. Nature Review Cancer 6:99-106 (2006); Rhodes D R, et al. Proc. Nat. Acad. Sci. USA 101:9309-9314 (2004); see
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
As used herein, “gene expression” refers to the translation of information encoded in a gene into a gene product (e.g., RNA, protein). Expressed genes include genes that are transcribed into RNA (e.g., mRNA) that is subsequently translated into protein, as well as genes that are transcribed into non-coding functional RNA molecules that are not translated into protein (e.g., transfer RNA (tRNA), ribosomal RNA (rRNA), microRNA, ribozymes).
“Level of expression,” “expression level” or “expression intensity” refers to the level (e.g., amount) of one or more products (e.g., mRNA, protein) encoded by a given gene in a sample or reference standard.
As used herein, “differentially expressed” or “differential expression” refers to any statistically significant difference (p<0.05) in the level of expression of a gene between two samples (e.g., two biological samples), or between a sample and a reference standard. Whether a difference in expression between two samples is statistically significant can be determined using an appropriate t-test (e.g., one-sample t-test, two-sample t-test, Welch's t-test) or other statistical test known to those of skill in the art.
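By way of illustration only, the following minimal sketch computes the statistic and degrees of freedom for Welch's t-test, one of the tests named above, using only the Python standard library. The function name `welch_t` and the expression values are hypothetical and are not taken from the data described herein; the p-value itself would still need to be obtained from a t-distribution table or a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; does not assume equal variances.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical expression levels of one gene (arbitrary units).
tumor = [4.0, 5.0, 6.0]
normal = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(tumor, normal)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

In this hypothetical example the statistic is approximately 3.67 with 4 degrees of freedom, which exceeds the two-sided critical value of about 2.776 for p<0.05 at 4 degrees of freedom. Where the SciPy package is available, `scipy.stats.ttest_ind(tumor, normal, equal_var=False)` performs the same test and returns the p-value directly.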
As used herein, the phrase “subset of genes overexpressed in cancer” refers to a combination of two or more genes, each of which displays an elevated or increased level of expression in a cancer sample relative to a suitable control (e.g., a non-cancerous tissue or cell sample, a reference standard), wherein the elevation or increase in the level of gene expression is statistically significant (p<0.05). Whether an increase in the expression of a gene in a cancer sample relative to a control is statistically significant can be determined using an appropriate t-test (e.g., one-sample t-test, two-sample t-test, Welch's t-test) or other statistical test known to those of skill in the art. Genes that are overexpressed in a cancer can be, for example, genes that are known, or have been previously determined, to be overexpressed in a cancer.
As used herein, the phrase “subset of genes underexpressed in cancer” refers to a combination of two or more genes, each of which displays a reduced or decreased level of expression in a cancer sample relative to a suitable control (e.g., a non-cancerous tissue or cell sample, a reference standard), wherein the reduction or decrease in the level of gene expression is statistically significant (p<0.05). In some embodiments, the reduced or decreased level of gene expression can be a complete absence of gene expression, or an expression level of zero. Whether a decrease in the expression of a gene in a cancer sample relative to a control is statistically significant can be determined using an appropriate t-test (e.g., one-sample t-test, two-sample t-test, Welch's t-test) or other statistical test known to those of skill in the art. Genes that are underexpressed in a cancer can be, for example, genes that are known, or have been previously determined, to be underexpressed in a cancer.
A “gene expression profile” or “expression profile” refers to a set of genes which have expression levels that are associated with a particular biological activity (e.g., cell proliferation, cell cycle regulation, metastasis), cell type, disease state (e.g., cancer), state of cell differentiation or condition.
A “common neoplastic signature” or “CNS” refers to a gene expression profile that is associated with (e.g., is diagnostic of) many different common cancers.
“Tumor-specific genes” as used herein are genes which have expression levels that are characterized as “present” in a cancer (e.g., a hepatocellular carcinoma) tissue sample, and “absent” or “marginal” in an adjacent non-tumor tissue (e.g., normal liver tissue) sample, by both Affymetrix Microarray Analysis Suite (MAS) 5.0 and DNA Chip Analyzer (dChip) software applications.
“Non-tumor tissue-specific genes” as used herein are genes which have expression levels that are characterized as “absent” or “marginal” in a cancer (e.g., a hepatocellular carcinoma) tissue sample, and “present” in an adjacent non-tumor tissue (e.g., normal liver tissue) sample, by both MAS 5.0 and dChip software applications.
The term “stringency,” “stringency filter,” or “stringency level” as used herein refers to a number that directly corresponds to the number, out of a total of 18, of paired HCC and adjacent non-tumorous liver tissue samples that display significant differential expression of a particular gene or group of genes by microarray expression profiling analysis, as determined by both Affymetrix Microarray Analysis Suite (MAS) 5.0 and DNA Chip Analyzer (dChip) software applications using “present” vs “absent” or “marginal” status. Thus, the values for a “stringency,” “stringency filter,” or “stringency level” used herein range from a high stringency of eighteen to a low stringency of one.
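The stringency level defined above can be sketched, purely for illustration, as a count over paired samples. The data layout (a dictionary per tissue pair holding a tumor call and a non-tumor call for each software application), the one-letter call codes, and the function names are assumptions made for this sketch and are not output formats of MAS 5.0 or dChip. Three pairs are shown for brevity, rather than the 18 pairs described herein.

```python
# Assumed one-letter codes for detection calls: present, absent, marginal.
PRESENT, ABSENT, MARGINAL = "P", "A", "M"

def pair_is_tumor_specific(tumor_call, nontumor_call):
    """True if the gene is 'present' in tumor and 'absent' or 'marginal'
    in the adjacent non-tumor tissue."""
    return tumor_call == PRESENT and nontumor_call in (ABSENT, MARGINAL)

def stringency(pairs):
    """Count tissue pairs in which BOTH applications agree that the gene
    is tumor-specific. Each pair maps an application name to a
    (tumor_call, nontumor_call) tuple."""
    count = 0
    for pair in pairs:
        if all(pair_is_tumor_specific(*pair[app]) for app in ("mas5", "dchip")):
            count += 1
    return count

# Hypothetical calls for one gene across three tissue pairs.
example_pairs = [
    {"mas5": ("P", "A"), "dchip": ("P", "M")},  # both agree: counted
    {"mas5": ("P", "A"), "dchip": ("P", "P")},  # dChip disagrees: not counted
    {"mas5": ("P", "M"), "dchip": ("P", "A")},  # both agree: counted
]
print(stringency(example_pairs))
```

With all 18 pairs supplied, the returned count would fall in the range of one to eighteen described above for any gene that passes in at least one pair.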
The term “probe set” refers to probes on an array (e.g., a microarray) that are complementary to the same target gene or gene product. A probe set may consist of one or more probes.
As used herein, the term “sample” refers to a biological sample (e.g., a tissue sample, a cell sample, a fluid sample) that expresses genes that display differential levels of expression when cancer cells are present in the sample versus when cancer cells are absent from the sample, for a given type of cancer.
As used herein, “adjacent samples,” “adjacent tissue samples,” “paired samples” or “paired tissue samples” refer to two or more biological samples that are present in, or isolated from, the same tissue or organ of a subject.
The term “oligonucleotide” as used herein refers to a nucleic acid molecule (e.g., RNA, DNA) that is about 5 to about 150 nucleotides in length. The oligonucleotide may be a naturally occurring oligonucleotide or a synthetic oligonucleotide. Oligonucleotides may be prepared by the phosphoramidite method (Beaucage and Caruthers, Tetrahedron Lett. 22:1859-62, 1981), or by the triester method (Matteucci, et al., J. Am. Chem. Soc. 103:3185, 1981), or by other chemical methods known in the art.
As used herein, “probe oligonucleotide” or “probe oligodeoxynucleotide” refers to an oligonucleotide that is capable of hybridizing to a target oligonucleotide.
“Target oligonucleotide” or “target oligodeoxynucleotide” refers to a molecule to be detected (e.g., via hybridization).
“Distant metastasis” refers to cancer cells that have spread from the original (i.e., primary) tumor to distant organs or distant lymph nodes.
“Detectable label” as used herein refers to any moiety that is capable of being specifically detected, either directly or indirectly, and therefore, can be used to distinguish a molecule that comprises the detectable label from a molecule that does not comprise the detectable label.
The phrase “specifically hybridizes” refers to the specific association of two complementary nucleotide sequences (e.g., DNA, RNA or a combination thereof) in a duplex under stringent conditions. The association of two nucleic acid molecules in a duplex occurs as a result of hydrogen bonding between complementary base pairs.
“Stringent conditions” or “stringency conditions” refer to a set of conditions under which two complementary nucleic acid molecules can hybridize. However, stringent conditions do not permit hybridization of two nucleic acid molecules that are not complementary (two nucleic acid molecules that have less than 70% sequence complementarity).
As used herein, “low stringency conditions” include, for example, hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by two washes in 0.2×SSC, 0.1% SDS at least at 50° C. (the temperature of the washes can be increased to 55° C. for low stringency conditions).
“Medium stringency conditions” include, for example, hybridization in 6×SSC at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 60° C.
As used herein, “high stringency conditions” include, for example, hybridization in 6×SSC at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 65° C.
“Very high stringency conditions” include, but are not limited to, hybridization in 0.5M sodium phosphate, 7% SDS at 65° C., followed by one or more washes at 0.2×SSC, 1% SDS at 65° C.
As used herein, the term “polypeptide” refers to a polymer of amino acids of any length and encompasses proteins, peptides, and oligopeptides.
As used herein, the term “antibody” refers to a polypeptide having affinity for a target, antigen, or epitope, and includes both naturally-occurring and engineered antibodies. The term “antibody” encompasses polyclonal, monoclonal, human, chimeric, humanized, primatized, veneered, and single chain antibodies, as well as fragments of antibodies (e.g., Fv, Fc, Fd, Fab, Fab′, F(ab′), scFv, scFab, dAb). (See e.g., Harlow et al., Antibodies A Laboratory Manual, Cold Spring Harbor Laboratory, 1988).
As defined herein, the term “antigen binding fragment” refers to a portion of an antibody that contains one or more CDRs and has affinity for an antigenic determinant by itself. Non-limiting examples include Fab fragments, F(ab)′2 fragments, heavy-light chain dimers, and single chain structures, such as a complete light chain or a complete heavy chain.
As used herein, “specifically binds” refers to a probe (e.g., an antibody, an aptamer) that binds to a target protein (e.g., the protein product of a CNS gene) with an affinity (e.g., a binding affinity) that is at least about 5 fold, preferably at least about 10 fold, greater than the affinity with which the probe binds a non-target protein.
“Target protein” refers to a protein to be detected (e.g., using a probe comprising a detectable label).
As used herein, a “subject” refers to a mammal. The term “subject,” therefore, includes, for example, primates (e.g., humans), cows, sheep, goats, horses, dogs, cats, rabbits, guinea pigs, rats, mice or other bovine, ovine, equine, canine, feline, rodent or murine species. In a preferred embodiment, the subject is a human. Examples of suitable subjects include, but are not limited to, human patients that have, or are at risk for developing, a cancer (e.g., HCC).
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization techniques and biochemistry). Standard techniques are used for molecular, genetic and biochemical methods (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed. (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular Biology (1999) 4th Ed, John Wiley & Sons, Inc. which are incorporated herein by reference) and chemical methods.
As described herein, a gene expression profile that includes genes that are differentially expressed between paired hepatocellular carcinoma (HCC) and normal liver tissues can serve as a common neoplastic signature (“CNS”) that is capable of differentiating several different types of cancers from corresponding normal tissues. As described herein, a common neoplastic signature of 55 genes was able to distinguish tissue samples representing six major types of cancers, and 19 out of 20 subtypes of cancers, from corresponding normal tissue samples. In addition, a subset of the genes in the CNS was associated with poor prognoses, including shorter survival or increased risk of distant metastasis, for three different types of cancer (HCC, nasopharyngeal cancer and breast cancer).
The present invention encompasses, in one embodiment, a method of diagnosing whether a subject has a cancer. The method comprises detecting in a sample from the subject the level of expression of a subset of genes that are overexpressed in the cancer (e.g., tumor). Increased levels of expression of the genes of the subset in the sample from the subject, relative to a control, indicate that the subject has cancer.
The subset of genes that are overexpressed in the cancer can include any combination of two or more genes from a common neoplastic signature that includes the following 55 genes: MELK, PLVAP, TOP2A, NEK2, CDKN3, PRC1, ESM1, PTTG1, TTK, CENPF, RDBP, CCHCR1, DEPDC1, TP5313, CCNB2, CAD, CDC2, HMMR, STMN1, HCAP-G, MDK, RAD54B, ASPM, HMGA1, SNRPC, IGF2BP3, SERPINH1, COL4A1, LARP1, LRRC1, FOXM1, CDC20, UBE2M, DNAJC6, FEN1, ASNS, CHEK1, KIF2C, AURKB, NPEPPS, KIF4A, E2F8, EZH2, ZNF193, ILF3, EHMT2, SF3A2, NPAS2, PSME3, INPPL1, BIRC5, SULT1C1, NSUN5B, HN1 and NUSAP1. The gene known in the art as HCAP-G is also known in the art as NCAPG, and these two gene designations are used interchangeably herein.
Different subsets of genes from the CNS are likely to be overexpressed in different cancers (e.g., hepatocellular carcinoma, nasopharyngeal cancer, breast cancer, lung cancer, renal cell carcinoma, colon cancer). Therefore, the particular genes and/or number of genes in the CNS that are overexpressed in a given type or subtype of cancer may differ from the genes and/or number of genes from the CNS that are overexpressed in another type or subtype of cancer. The subset of genes that are overexpressed in a cancer can include 2 or more genes of the CNS, up to, and including all 55 genes of the CNS described herein. In one embodiment, the subset of genes that are overexpressed in a cancer includes all 55 genes of the common neoplastic signature. In another embodiment, the subset of genes that are overexpressed in a cancer includes about 20 genes of the CNS. The nucleotide sequences of the genes of the common neoplastic signature and the nucleotide and amino acid sequences of their RNA and protein products, respectively, have been reported (see Table 1) and can be readily ascertained by those of skill in the art.
The methods described herein can be used to diagnose many different types of cancers. In a particular embodiment, the methods of the invention can be used to diagnose a cancer selected from the group consisting of breast cancer, colon cancer, endometrial cancer, renal cell carcinoma, liver cancer, lung cancer, ovarian cancer, pancreatic cancer, prostate cancer, rectal cancer, skin cancer, stomach cancer, and thyroid cancer. Various cancer subtypes can also be diagnosed using the methods of the inventions. Such cancer subtypes include, but are not limited to the cancer subtypes listed in
In another embodiment, the invention relates to a method of providing a prognosis for a subject that has a cancer, comprising detecting the level of expression of one or more genes of the CNS. According to the invention, expression (e.g., overexpression) of certain genes in the CNS is indicative of a poor prognosis. The prognosis can be, but is not limited to, a prognosis for patient survival, risk of metastases, or risk of relapse after treatment. In a particular embodiment, the prognosis is for a patient that has hepatocellular carcinoma, nasopharyngeal cancer or breast cancer.
As described herein, a strong association exists between expression (e.g., overexpression) of certain genes in the CNS in cancer samples and a poor patient prognosis (e.g., shorter survival, increased risk of metastases (see, e.g., Examples 4-7)). Specifically, expression (e.g., elevated expression) of PRC1, CENPF, RDBP, CCNB2 and/or RAD54B in samples from subjects that have hepatocellular carcinoma, nasopharyngeal cancer or breast cancer, is associated with an increased risk of distant metastasis. In addition, expression (e.g., elevated expression) of CDC2, CCHCR1, and/or HMGA1 in samples from subjects that have hepatocellular carcinoma, nasopharyngeal cancer or breast cancer, is associated with a shorter survival.
For the diagnostic and prognostic methods of the invention, gene expression can be assessed in a suitable sample from a subject. A suitable sample can be a tissue sample, a biological fluid sample, a cell (e.g., a tumor cell) sample, and the like. Any means of sampling from a subject, for example, by blood draw, spinal tap, tissue smear or scrape, or tissue biopsy can be used to obtain a sample. Thus, the sample can be a biopsy specimen (e.g., tumor, polyp, mass (solid, cell)), aspirate, smear or blood sample. In a preferred embodiment, the sample is a blood sample (e.g., a blood serum sample). The sample can be a tissue from an organ that has a tumor (e.g., cancerous growth) and/or tumor cells, or is suspected of having a tumor and/or tumor cells. For example, a tumor biopsy can be obtained in an open biopsy, a procedure in which an entire (excisional biopsy) or partial (incisional biopsy) mass is removed from a target area. Alternatively, a tumor sample can be obtained through a percutaneous biopsy, a procedure performed with a needle-like instrument through a small incision or puncture (with or without the aid of an imaging device) to obtain individual cells or clusters of cells (e.g., a fine needle aspiration (FNA)) or a core or fragment of tissues (core biopsy). The biopsy samples can be examined cytologically (e.g., smear), histologically (e.g., frozen or paraffin section) or using any other suitable method (e.g., molecular diagnostic methods). A tumor sample can also be obtained by in vitro harvest of cultured human cells derived from an individual's tissue. Tumor samples can, if desired, be stored before analysis by suitable storage means that preserve a sample's protein and/or nucleic acid in an analyzable condition, such as quick freezing, or a controlled freezing regime. If desired, freezing can be performed in the presence of a cryoprotectant, for example, dimethyl sulfoxide (DMSO), glycerol, or propanediol-sucrose. 
Tumor samples can be pooled, as appropriate, before or after storage for purposes of analysis.
In one embodiment, a cancer can be diagnosed, or a prognosis for a subject can be provided, by detecting expression of a subset of genes from the CNS, or their gene products (e.g., mRNA, protein), in a sample from a patient. Thus, the method does not require that expression in the sample from the patient be compared to a control. The presence or absence of gene expression can be ascertained by the methods described herein or other suitable assays known to those of skill in the art.
A difference (e.g., an increase, a decrease) in gene expression can be determined by comparison of the level of expression of the gene in a sample from a subject to that of a suitable control. Suitable controls include, for instance, a non-neoplastic tissue sample (e.g., a non-neoplastic tissue sample from the same subject from which the cancer sample has been obtained), a sample of non-cancerous cells, non-metastatic cancer cells, non-malignant (benign) cells or the like, or a suitable known or determined reference standard. The reference standard can be a typical, normal or normalized range of levels, or a particular level, of expression of a protein or RNA (e.g., an expression standard). The standards can comprise, for example, a zero gene expression level, the gene expression level in a standard cell line, or the average level of gene expression previously obtained for a population of normal human controls. Thus, the method does not require that expression of the gene/gene product be assessed in, or compared to, a control sample.
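As a non-limiting sketch of the comparison to a reference standard described above, the following example derives a reference level for each gene as the average level in a population of normal controls and reports the genes whose level in a subject's sample is increased relative to that reference. The fold-change cutoff `min_fold`, the function name, and all expression values are hypothetical assumptions for illustration; the cutoff stands in for, and does not replace, the test of statistical significance defined herein.

```python
from statistics import mean

def elevated_genes(sample_levels, control_levels, min_fold=2.0):
    """Return, sorted by name, the genes whose level in the sample is at
    least min_fold times the average level in the normal controls.

    sample_levels:  {gene: level in the subject's sample}
    control_levels: {gene: [levels in individual normal controls]}
    min_fold:       assumed illustrative cutoff, not from the source
    """
    hits = []
    for gene, level in sample_levels.items():
        reference = mean(control_levels[gene])
        if reference == 0:
            # Zero-expression reference standard: any expression is an increase.
            increased = level > 0
        else:
            increased = level / reference >= min_fold
        if increased:
            hits.append(gene)
    return sorted(hits)

# Hypothetical expression levels (arbitrary units).
sample = {"TOP2A": 9.0, "MELK": 3.0, "CDC20": 2.0}
controls = {"TOP2A": [2.0, 4.0], "MELK": [1.0, 1.0], "CDC20": [2.0, 2.0]}
print(elevated_genes(sample, controls))
```

In this hypothetical example, TOP2A and MELK are each three-fold above their reference levels and are reported, whereas CDC20 is unchanged and is not.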
Suitable assays that can be used to assess the level of expression of a gene, or the level (e.g., amount) of a gene product (e.g., mRNA, protein), in a sample (e.g., biological sample) from a subject are known to those of skill in the art. For example, the level of an RNA (e.g., mRNA) gene product in a sample can be measured using any technique that is suitable for detecting RNA expression levels in a biological sample. Several suitable techniques for determining RNA expression levels in cells from a biological sample (e.g., Northern blot analysis, RT-PCR, in situ hybridization) are well known to those of skill in the art. In a particular embodiment, the level of at least one gene product is detected using Northern blot analysis. For example, total cellular RNA can be purified from cells by homogenization in the presence of nucleic acid extraction buffer, followed by centrifugation. Nucleic acids are precipitated, and DNA is removed by treatment with DNase and precipitation. The RNA molecules are then separated by gel electrophoresis on agarose gels according to standard techniques, and transferred to nitrocellulose filters. The RNA is then immobilized on the filters by heating. Detection and quantification of specific RNA is accomplished using appropriately labeled DNA or RNA probes complementary to the RNA in question. See, for example, Molecular Cloning: A Laboratory Manual, J. Sambrook et al., eds., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, Chapter 7, the entire disclosure of which is incorporated by reference.
Suitable probes for Northern blot hybridization include nucleic acid probes that are complementary to the nucleotide sequences of the RNA (e.g., mRNA) and/or cDNA sequences of the genes of the CNS. Methods for preparation of labeled DNA and RNA probes, and the conditions for hybridization thereof to target nucleotide sequences, are described in Molecular Cloning: A Laboratory Manual, J. Sambrook et al., eds., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, Chapters 10 and 11, the disclosures of which are herein incorporated by reference.
For example, the nucleic acid probe can be labeled with, e.g., a radionuclide such as 3H, 32P, 33P, 14C, or 35S; a heavy metal; or a ligand capable of functioning as a specific binding pair member for a labeled ligand (e.g., biotin, avidin or an antibody), a fluorescent molecule, a chemiluminescent molecule, an enzyme or the like.
Probes can be labeled to high specific activity by either the nick translation method of Rigby et al. (1977), J. Mol. Biol. 113:237-251 or by the random priming method of Feinberg et al. (1983), Anal. Biochem. 132:6-13, the entire disclosures of which are herein incorporated by reference. The latter is the method of choice for synthesizing 32P-labeled probes of high specific activity from single-stranded DNA or from RNA templates. For example, by replacing preexisting nucleotides with highly radioactive nucleotides according to the nick translation method, it is possible to prepare 32P-labeled nucleic acid probes with a specific activity well in excess of 10^8 cpm/microgram. Autoradiographic detection of hybridization can then be performed by exposing hybridized filters to photographic film. Densitometric scanning of the photographic films exposed by the hybridized filters provides an accurate measurement of gene transcript levels. Using another approach, gene transcript levels can be quantified by computerized imaging systems, such as the Molecular Dynamics 400-B 2D Phosphorimager available from Amersham Biosciences, Piscataway, N.J.
Where radionuclide labeling of DNA or RNA probes is not practical, the random-primer method can be used to incorporate an analogue, for example, the dTTP analogue 5-(N—(N-biotinyl-epsilon-aminocaproyl)-3-aminoallyl)deoxyuridine triphosphate, into the probe molecule. The biotinylated probe oligonucleotide can be detected by reaction with biotin-binding proteins, such as avidin, streptavidin, and antibodies (e.g., anti-biotin antibodies) coupled to fluorescent dyes or enzymes that produce color reactions.
In addition to Northern and other RNA hybridization techniques, determining the levels of RNA transcripts can be accomplished using the technique of in situ hybridization. This technique requires fewer cells than the Northern blotting technique, and involves depositing whole cells onto a microscope cover slip and probing the nucleic acid content of the cell with a solution containing radioactive or otherwise labeled nucleic acid (e.g., cDNA or RNA) probes. This technique is particularly well-suited for analyzing tissue biopsy samples from subjects. The practice of the in situ hybridization technique is described in more detail in U.S. Pat. No. 5,427,916, the entire disclosure of which is incorporated herein by reference. Suitable probes for in situ hybridization of a given gene product can be produced, for example, from the nucleic acid sequences of the RNA products of the CNS genes described herein.
Levels of a nucleic acid (e.g., mRNA transcript) in a sample from a subject can also be assessed using any standard nucleic acid amplification technique, such as, for example, polymerase chain reaction (PCR) (e.g., direct PCR, quantitative real time PCR (qRT-PCR), reverse transcriptase PCR (RT-PCR)), ligase chain reaction, self sustained sequence replication, transcriptional amplification system, Q-Beta Replicase, or the like, and visualized, for example, by labeling of the nucleic acid during amplification, exposure to intercalating compounds/dyes, probes, etc. In a particular embodiment, the relative number of gene transcripts in a sample is determined by reverse transcription of gene transcripts (e.g., mRNA), followed by amplification of the reverse-transcribed products by polymerase chain reaction (e.g., RT-PCR). The levels of gene transcripts can be quantified in comparison with an internal standard, for example, the level of mRNA from a “housekeeping” gene present in the same sample. A suitable “housekeeping” gene for use as an internal standard includes, e.g., myosin or glyceraldehyde-3-phosphate dehydrogenase (G3PDH). The methods for quantitative RT-PCR and variations thereof are within the skill in the art.
In some instances, it may be desirable to simultaneously determine the expression level of several different gene products in a sample. For example, it may be desirable to determine the expression level of the transcripts of all genes in the CNS described herein in a sample from a subject. Assessing cancer-specific expression levels for many genes individually is time consuming and requires a large amount of total RNA (at least about 20 μg for each Northern blot) and autoradiographic techniques that require radioactive isotopes. To overcome these limitations, an oligolibrary, in microchip format (e.g., a gene chip, a microarray), may be constructed containing a set of probe oligodeoxynucleotides that are specific for a set of genes. Using such a microarray, the expression level of multiple RNA transcripts in a biological sample can be determined by reverse transcribing the RNAs to generate a set of target oligodeoxynucleotides, and hybridizing them to probe oligodeoxynucleotides on the microarray to generate a hybridization, or expression, profile. The hybridization profile of the test sample can then be compared to that of a control sample to determine which RNAs have an altered expression level in a cancer sample.
The microarray may be fabricated using techniques known in the art. For example, probe oligonucleotides of an appropriate length can be 5′-amine modified at position C6 and printed using commercially available microarray systems, e.g., the GeneMachine OmniGrid™ 100 Microarrayer and Amersham CodeLink™ activated slides. Labeled cDNA oligomers corresponding to the target RNAs are prepared by reverse transcribing the target RNA with labeled primer. Following first strand synthesis, the RNA/DNA hybrids are denatured to degrade the RNA templates. The labeled target cDNAs thus prepared are then hybridized to the microarray chip under hybridizing conditions, e.g., 6×SSPE/30% formamide at 25° C. for 18 hours, followed by washing in 0.75× TNT at 37° C. for 40 minutes. At positions on the array where the immobilized probe DNA recognizes a complementary target cDNA in the sample, hybridization occurs. The labeled target cDNA marks the exact position on the array where binding occurs, allowing automatic detection and quantification. The output consists of a list of hybridization events, indicating the relative abundance of specific cDNA sequences, and therefore the relative abundance of the corresponding gene products, in the patient sample. According to one embodiment, the labeled cDNA oligomer is a biotin-labeled cDNA, prepared from a biotin-labeled primer. The microarray is then processed by direct detection of the biotin-containing transcripts using, e.g., Streptavidin-Alexa647 conjugate, and scanned utilizing conventional scanning methods. Image intensities of each spot on the array are proportional to the abundance of the corresponding gene product in the patient sample.
An “expression profile” or “hybridization profile” of a particular sample is essentially a fingerprint of the state of the sample; while two states may have any particular genes similarly expressed, the evaluation of a number of genes simultaneously allows the generation of a gene expression profile that is unique to the state of the cell. That is, normal tissue may be distinguished from cancer tissue, and within cancer tissue, different prognosis states (good or poor long term survival prospects, for example) may be determined. By comparing expression profiles of cancer tissue in different states, information regarding which genes are important (including both up- and down-regulation of genes) in each of these states is obtained. The identification of sequences that are differentially expressed in cancer tissue versus normal tissue, as well as differential expression resulting in different prognostic outcomes, allows the use of this information in a number of ways. For example, a particular treatment regime may be evaluated (e.g., to determine whether a chemotherapeutic drug acts to improve the long-term prognosis in a particular patient). Similarly, diagnosis may be done or confirmed by comparing patient samples with the known expression profiles. Furthermore, these gene expression profiles (or individual genes) allow screening of drug candidates that suppress the breast cancer expression profile or convert a poor prognosis profile to a better prognosis profile.
In a particular embodiment, total RNA from a sample from a subject that has, or is suspected of having or being at risk for developing, a cancer is quantitatively reverse transcribed to provide a set of labeled target oligodeoxynucleotides complementary to the RNA in the sample. The target oligodeoxynucleotides are then hybridized to a microarray comprising gene-specific probe oligonucleotides to provide a hybridization profile for the sample. The result is a hybridization profile for the sample representing the expression pattern of genes in the sample. The hybridization profile comprises the signal from the binding of the target oligodeoxynucleotides from the sample to the gene-specific probe oligonucleotides in the microarray. The profile may be recorded as the presence or absence of binding (signal vs. zero signal). More preferably, the profile recorded includes the intensity of the signal from each hybridization. The profile is compared to the hybridization profile generated from a normal, i.e., noncancerous, control sample. An alteration (e.g., increase) in the signal is indicative of the presence of the cancer in the subject.
Gene expression on an array or gene chip can be assessed using an appropriate algorithm (e.g., statistical algorithm). Suitable software applications for assessing gene expression levels using a microarray or gene chip are known in the art. In a particular embodiment, gene expression on a microarray is assessed using Affymetrix Microarray Analysis Suite (MAS) 5.0 software and/or DNA Chip Analyzer (dChip) software, for example, as described herein in Example 1.
In a particular embodiment, fragments of RNA transcripts for any of the 55 tumor-specific genes described herein (see
Other techniques for measuring gene expression in a sample are also within the skill in the art, and include various techniques for measuring rates of RNA transcription and degradation.
The level of expression of a gene of the CNS can also be determined by assessing the level of a protein(s) encoded by the gene in a sample from a subject. Methods for detecting a protein product of a CNS gene include, for example, immunological and immunochemical methods, such as flow cytometry (e.g., FACS analysis), enzyme-linked immunosorbent assays (ELISA), chemiluminescence assays, radioimmunoassay, immunoblot (e.g., Western blot), immunohistochemistry (IHC), and mass spectrometry. For instance, antibodies to a protein product of a CNS gene can be used to determine the presence and/or expression level of the protein in a sample either directly or indirectly, e.g., using immunohistochemistry (IHC). For example, paraffin sections can be taken from a biopsy, fixed to a slide and combined with one or more antibodies by suitable methods.
A difference (e.g., an increase, a decrease) in the level of expression of a gene between two samples, or between a sample and a reference standard, can be determined using an appropriate algorithm, several of which are known to those of skill in the art. For example, the identification of genes displaying differential expression (e.g., significant differential expression) between cancer (e.g., HCC) and adjacent non-tumor tissues, can be determined using the algorithm described herein in Example 1 and
A statistically significant difference (e.g., an increase, a decrease) in the level of expression of a gene between two samples, or between a sample and a reference standard, can be determined using an appropriate statistical test(s), several of which are known to those of skill in the art. In a particular embodiment, a t-test (e.g., a one-sample t-test, a two-sample t-test) is employed to determine whether a difference in gene expression is statistically significant. For example, a statistically significant difference in the level of expression of a gene between two samples can be determined using a two-sample t-test (e.g., a two-sample Welch's t-test). A statistically significant difference in the level of expression of a gene between a sample and a reference standard can be determined using a one-sample t-test. Other useful statistical analyses for assessing differences in gene expression include a Chi-square test, Fisher's exact test, and log-rank and Wilcoxon tests (see Examples 1-7).
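By way of illustration only, the two-sample Welch's t-test mentioned above can be sketched in a few lines of Python. The function name and the expression values are hypothetical and are not taken from the Examples; the sketch computes only the t statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value to a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb                          # squared standard error of the difference
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical log2 expression values for one gene (illustrative only).
tumor = [8.1, 7.9, 8.4, 8.6, 8.0]
normal = [6.2, 6.5, 6.1, 6.4, 6.3]
t_stat, dof = welch_t(tumor, normal)
```

In practice, a library routine such as SciPy's `scipy.stats.ttest_ind` called with `equal_var=False` performs the same test and also returns the p-value.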
The present invention also encompasses kits for diagnosing whether a subject has a cancer. Diagnostic kits of the invention include a collection of probes capable of detecting the level of expression of multiple genes of the CNS described herein (i.e., MELK, PLVAP, TOP2A, NEK2, CDKN3, PRC1, ESM1, PTTG1, TTK, CENPF, RDBP, CCHCR1, DEPDC1, TP5313, CCNB2, CAD, CDC2, HMMR, STMN1, HCAP-G, MDK, RAD54B, ASPM, HMGA1, SNRPC, IGF2BP3, SERPINH1, COL4A1, LARP1, LRRC1, FOXM1, CDC20, UBE2M, DNAJC6, FEN1, ASNS, CHEK1, KIF2C, AURKB, NPEPPS, KIF4A, E2F8, EZH2, ZNF193, ILF3, EHMT2, SF3A2, NPAS2, PSME3, INPPL1, BIRC5, SULT1C1, NSUN5B, HN1, NUSAP1). For example, the kits can include a collection of probes capable of detecting the level of expression of at least about two genes of the CNS, for example about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 genes of the common neoplastic signature. In one embodiment, the kit encompasses a collection of probes capable of detecting the level of expression of all 55 genes in the common neoplastic signature. In a particular embodiment, the kits encompass a collection of probes capable of detecting the level of expression of at least about ten (10) genes, preferably about fifteen (15) genes, and more preferably, about twenty (20) genes of the CNS described herein.
The invention also provides kits for determining the prognosis (e.g., risk of metastasis, survival) of a subject that has a cancer. In one embodiment, the kits comprise a probe that is capable of detecting the level of expression of at least one gene selected from the group consisting of PRC1, CENPF, RDBP, CCNB2 and RAD54B, or any combination thereof. In another embodiment, the invention relates to kits for determining the prognosis of a subject that has a cancer, comprising a probe that is capable of detecting the level of expression of at least one gene selected from the group consisting of PRC1, CDC2, CCHCR1 and HMGA1, or any combination thereof.
The diagnostic and prognostic kits of the invention include probes (e.g., nucleic acid probes, antibodies) for detecting the expression of CNS genes in a sample (e.g., a biological sample from a mammalian subject).
Accordingly, in one embodiment, the kit comprises nucleic acid probes (e.g., oligonucleotide probes, polynucleotide probes) that specifically hybridize to an RNA transcript (e.g., mRNA, hnRNA) of a CNS gene. Such probes are capable of binding (i.e., hybridizing) to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing via hydrogen bond formation. As used herein, a nucleic acid probe may include natural (i.e., A, G, U, C or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in the nucleic acid probes may be joined by a linkage other than a phosphodiester bond, so long as the linkage does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.
Guidance for performing hybridization reactions can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6, the relevant teachings of which are incorporated herein by reference in their entirety. Suitable hybridization conditions resulting in specific hybridization vary depending on the length of the region of homology, the GC content of the region, and the melting temperature (“Tm”) of the hybrid. Thus, hybridization conditions may vary in salt content, acidity, and temperature of the hybridization solution and the washes. Complementary hybridization between a probe nucleic acid and a target nucleic acid involving minor mismatches can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid. In a particular embodiment, the nucleic acid probes in the kits of the invention are capable of hybridizing to RNA (e.g., mRNA) transcripts of CNS genes under conditions of high stringency.
In another embodiment, the kits include pairs of oligonucleotide primers that are capable of specifically hybridizing to an RNA transcript of a CNS gene, or a corresponding cDNA. Such primers can be used in any standard nucleic acid amplification procedure (e.g., polymerase chain reaction (PCR), for example, RT-PCR, quantitative real time PCR) to determine the level of the RNA transcript in the sample. As used herein, the term “primer” refers to an oligonucleotide, which is complementary to the template polynucleotide sequence and is capable of acting as a point for the initiation of synthesis of a primer extension product. In one embodiment, the primer is complementary to the sense strand of a polynucleotide sequence and acts as a point of initiation for synthesis of a forward extension product. In another embodiment, the primer is complementary to the antisense strand of a polynucleotide sequence and acts as a point of initiation for synthesis of a reverse extension product. The primer may occur naturally, as in a purified restriction digest, or be produced synthetically. The appropriate length of a primer depends on the intended use of the primer, but typically ranges from about 5 to about 200; from about 5 to about 100; from about 5 to about 75; from about 5 to about 50; from about 10 to about 35; from about 18 to about 22 nucleotides. A primer need not reflect the exact sequence of the template but must be sufficiently complementary to hybridize with a template for primer elongation to occur, i.e., the primer is sufficiently complementary to the template polynucleotide sequence such that the primer will anneal to the template under conditions that permit primer extension.
In another embodiment, the kits of the invention include antibodies that specifically bind a protein encoded by a gene of the CNS described herein. Such antibody probes can be polyclonal, monoclonal, human, chimeric, humanized, primatized, veneered, or single chain antibodies, as well as fragments of antibodies (e.g., Fv, Fc, Fd, Fab, Fab′, F(ab′), scFv, scFab, dAb), among others. (See e.g., Harlow et al., Antibodies A Laboratory Manual, Cold Spring Harbor Laboratory, 1988). Antibodies that specifically bind to protein encoded by a gene of the CNS described herein can be produced, constructed, engineered and/or isolated by conventional methods or other suitable techniques (see e.g., Kohler et al., Nature, 256: 495-497 (1975) and Eur. J. Immunol. 6: 511-519 (1976); Milstein et al., Nature 266: 550-552 (1977); Koprowski et al., U.S. Pat. No. 4,172,124; Harlow, E. and D. Lane, 1988, Antibodies: A Laboratory Manual, (Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y.); Current Protocols In Molecular Biology, Vol. 2 (Supplement 27, Summer '94), Ausubel, F.M. et al., Eds., (John Wiley & Sons: New York, N.Y.), Chapter 11, (1991); Chuntharapai et al., J. Immunol., 152:1783-1789 (1994); Chuntharapai et al. U.S. Pat. No. 5,440,021). Other suitable methods of producing or isolating antibodies of the requisite specificity can be used, including, for example, methods which select a recombinant antibody or antibody-binding fragment (e.g., dAbs) from a library (e.g., a phage display library), or which rely upon immunization of transgenic animals (e.g., mice). Transgenic animals capable of producing a repertoire of human antibodies are well-known in the art (e.g., Xenomouse® (Abgenix, Fremont, Calif.)) and can be produced using suitable methods (see e.g., Jakobovits et al., Proc. Natl. Acad. Sci. USA, 90: 2551-2555 (1993); Jakobovits et al., Nature, 362: 255-258 (1993); Lonberg et al., U.S. Pat. No. 5,545,806; Surani et al., U.S. Pat. No. 5,545,807; Lonberg et al., WO 97/13852).
Once produced, an antibody specific for a protein encoded by a CNS gene described herein can be readily identified using methods for screening and isolating specific antibodies that are well known in the art. See, for example, Paul (ed.), Fundamental Immunology, Raven Press, 1993; Getzoff et al., Adv. in Immunol. 43:1-98, 1988; Goding (ed.), Monoclonal Antibodies: Principles and Practice, Academic Press Ltd., 1996; Benjamin et al., Ann. Rev. Immunol. 2:67-101, 1984. A variety of assays can be utilized to detect antibodies that specifically bind to proteins encoded by the CNS genes described herein. Exemplary assays are described in detail in Antibodies: A Laboratory Manual, Harlow and Lane (Eds.), Cold Spring Harbor Laboratory Press, 1988. Representative examples of such assays include: concurrent immunoelectrophoresis, radioimmunoassay, radioimmuno-precipitation, enzyme-linked immunosorbent assay (ELISA), dot blot or Western blot assays, inhibition or competition assays, and sandwich assays.
The probes in the diagnostic and prognostic kits of the invention can be conjugated to one or more labels (e.g., detectable labels). Numerous suitable labels for diagnostic probes are known in the art and include any of the labels described herein. Suitable detectable labels for use in the methods of the present invention include, but are not limited to, chromophores, fluorophores, haptens, radionuclides (e.g., 3H, 125I, 131I, 32P, 33P, 35S, 14C, 51Cr, 36Cl, 57Co, 58Co, 59Fe and 75Se), fluorescence quenchers, enzymes, enzyme substrates, affinity tags (e.g., biotin, avidin, streptavidin, etc.), mass tags, electrophoretic tags and epitope tags that are recognized by an antibody (e.g., digoxigenin (DIG), hemagglutinin (HA), myc, FLAG). In certain embodiments, the label is present on the 5 carbon position of a pyrimidine base or on the 3 carbon deaza position of a purine base of a nucleic acid probe.
In a particular embodiment, the label that is conjugated to the probes is a fluorophore. Suitable fluorophores can be provided as fluorescent dyes, including, but not limited to Alexa Fluor dyes (Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 633, Alexa Fluor 660 and Alexa Fluor 680), AMCA, AMCA-S, BODIPY dyes (BODIPY FL, BODIPY R6G, BODIPY TMR, BODIPY TR, BODIPY 530/550, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665), CAL dyes, Carboxyrhodamine 6G, carboxy-X-rhodamine (ROX), Cascade Blue, Cascade Yellow, Cyanine dyes (Cy3, Cy5, Cy3.5, Cy5.5), Dansyl, Dapoxyl, Dialkylaminocoumarin, 4′,5′-Dichloro-2′,7′-dimethoxy-fluorescein, DM-NERF, Eosin, Erythrosin, Fluorescein, Carboxy-fluorescein (FAM), Hydroxycoumarin, IRDyes (IRD40, IRD 700, IRD 800), JOE, Lissamine rhodamine B, Marina Blue, Methoxycoumarin, Naphthofluorescein, Oregon Green 488, Oregon Green 500, Oregon Green 514, Oyster dyes, Pacific Blue, PyMPO, Pyrene, Rhodamine 6G, Rhodamine Green, Rhodamine Red, Rhodol Green, 2′,4′,5′,7′-Tetra-bromosulfone-fluorescein, Tetramethyl-rhodamine (TMR), Carboxytetramethylrhodamine (TAMRA), Texas Red, and Texas Red-X.
Probes can also be labeled using fluorescence emitting metals such as 152Eu, or others of the lanthanide series. These metals can be attached to the antibody molecule using such metal chelating groups as diethylenetriaminepentaacetic acid (DTPA), tetraaza-cyclododecane-tetraacetic acid (DOTA) or ethylenediaminetetraacetic acid (EDTA).
In addition to the various detectable moieties mentioned above, the probes in the kits of the invention may also be conjugated to other types of labels, such as spectrally resolvable quantum dots, metal nanoparticles or nanoclusters, etc., which may be directly attached to a nucleic acid probe. As mentioned above, detectable moieties need not themselves be directly detectable. For example, they may act on a substrate which is detected, or they may require modification to become detectable.
For in vivo detection, probes may be conjugated to radionuclides either directly or by using an intermediary functional group. An intermediary group which is often used to bind radioisotopes, which exist as metallic cations, to antibodies is diethylenetriaminepentaacetic acid (DTPA) or tetraaza-cyclododecane-tetraacetic acid (DOTA). Typical examples of metallic cations which are bound in this manner are 99Tc, 123I, 111In, 131I, 97Ru, 67Cu, 67Ga, and 68Ga.
Moreover, probes may be tagged with an NMR imaging agent that includes paramagnetic atoms. The use of an NMR imaging agent allows the in vivo diagnosis of the presence of and the extent of the cancer in a patient using NMR techniques. Elements which are particularly useful in this manner are 157Gd, 55Mn, 162Dy, 52Cr, and 56Fe.
Detection of the labeled probes can be accomplished by a scintillation counter, for example, if the detectable label is a radioactive gamma emitter, or by a fluorometer, for example, if the label is a fluorescent material. In the case of an enzyme label, the detection can be accomplished by colorimetric methods which employ a substrate for the enzyme. Detection may also be accomplished by visual comparison of the extent of the enzymatic reaction of a substrate to similarly prepared standards.
In another embodiment, the invention relates to a method of determining a gene expression profile for a cancer. The method comprises detecting the expression of genes in both cancerous and non-cancerous samples (e.g., tissue samples) from the same individual (see Example 1 below). In a particular embodiment, the cancerous and non-cancerous samples from the same individual are adjacent or paired samples (e.g., adjacent or paired hepatocellular carcinoma and normal liver tissue samples). The expression of genes in a sample can be detected using any suitable gene expression detection method described herein. Moreover, suitable methods for determining differences in gene expression levels between two samples (e.g., adjacent or paired cancer and normal tissue samples) are known to those of skill in the art and include, for example, those described herein. According to the invention, genes that are identified as being differentially expressed between the cancerous and non-cancerous samples are included in the gene expression profile for the cancer.
A description of example embodiments of the invention follows.
Tissues of HCC and adjacent non-tumorous liver were collected from fresh specimens surgically removed from human patients for therapeutic purpose. These specimens were collected under direct supervision of attending pathologists. The collected tissues were immediately stored in liquid nitrogen at the Tumor Bank of the Koo Foundation Sun Yat-Sen Cancer Center (KF-SYSCC). Paired tissue samples from eighteen HCC patients were available for the study. The study was approved by the Institutional Review Board and written informed consent was obtained from all patients. The clinical characteristics of the eighteen HCC patients from this study are summarized in Table 2.
mRNA Transcript Profiling
Total RNA was isolated from tissues frozen in liquid nitrogen using Trizol reagents (Invitrogen, Carlsbad, Calif.). The isolated RNA was further purified using the RNeasy Mini kit (Qiagen, Valencia, Calif.), and its quality was assessed using the RNA 6000 Nano assay in an Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany). All RNA samples used for the study had an RNA Integrity Number (RIN) greater than 5.7 (8.2±1.0, mean±SD). Hybridization targets were prepared from 8 μg total RNA according to Affymetrix protocols and hybridized to an Affymetrix U133A GeneChip, which contains 22,283 probe-sets for approximately 13,000 human genes. Immediately following hybridization, the hybridized array underwent automated washing and staining using an Affymetrix GeneChip fluidics station 400 and the EukGE WS2v4 protocol. Thereafter, U133A GeneChips were scanned in an Affymetrix GeneArray scanner 2500.
Affymetrix Microarray Analysis Suite (MAS) 5.0 software was used to generate present calls for the microarray data for all 18 pairs of HCC and adjacent non-tumor liver tissues. All parameters for present call determination were default values. Each probe-set was determined as “present”, “absent” or “marginal” by MAS 5.0. Similarly, the same microarray data were processed using dChip version-2004 software to determine “present”, “absent” or “marginal” status for each probe-set on the microarrays.
Identification of Probe-Sets with Significant Differential Expression
For identification of genes with significant differential expression (i.e., gene expression that is robust in one sample (e.g., an HCC sample), but absent or marginal in an adjacent sample (e.g., a normal liver sample)) between HCC and adjacent non-tumor liver tissues, software written using Practical Extraction and Report Language (PERL) was used according to the following rules: “Tumor-specific genes” were defined as probe-sets that were called “present” in HCC and “absent” or “marginal” in the adjacent non-tumor liver tissue by both MAS 5.0 and dChip. “Non-tumor liver tissue-specific genes” were defined as probe-sets called “absent” or “marginal” in HCC and “present” in the paired adjacent non-tumor liver tissue by both MAS 5.0 and dChip. A flowchart diagram depicting the identification algorithm is shown in
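The selection rules above can be sketched as follows. The original software was written in PERL; this Python version is only an illustrative restatement of the stated rules for a single probe-set in a single tissue pair, and the function and argument names are hypothetical.

```python
NOT_PRESENT = {"absent", "marginal"}

def classify_probe_set(mas5_tumor, mas5_normal, dchip_tumor, dchip_normal):
    """Classify one probe-set in one HCC / non-tumor liver pair from its detection calls.

    Each argument is the call ("present", "absent" or "marginal") made by
    MAS 5.0 or dChip for the tumor or the adjacent non-tumor tissue.
    """
    tumor_on = mas5_tumor == "present" and dchip_tumor == "present"
    tumor_off = mas5_tumor in NOT_PRESENT and dchip_tumor in NOT_PRESENT
    normal_on = mas5_normal == "present" and dchip_normal == "present"
    normal_off = mas5_normal in NOT_PRESENT and dchip_normal in NOT_PRESENT

    if tumor_on and normal_off:
        return "tumor-specific"
    if tumor_off and normal_on:
        return "non-tumor liver tissue-specific"
    return "neither"

# Both programs call the probe-set present in HCC and not present in normal liver.
example_call = classify_probe_set("present", "absent", "present", "marginal")
```

Requiring agreement between the two programs means that a probe-set flagged by only one of them is classified as "neither".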
In addition to the microarray data collected from the 18 pairs of HCC and adjacent non-tumorous liver tissues, further microarray data were obtained from 82 HCC tissue samples and 168 nasopharyngeal carcinoma (NPC) tissue samples that were collected in a similar manner. The SCIANTIS™ System Pro commercial microarray database (Gene Logic Inc., Gaithersburg, Md.) for various normal and tumor tissues was used for validation purposes. The commercial SCIANTIS™ gene expression datasets are based on Affymetrix HG-U133 A Genechip technology. For a given type of cancer or normal tissue, expression intensity of each probe-set was supplied as mean signal intensity plus standard deviation of a cohort after normalization of gene expression data of each microarray to a global trimmed mean of 100 by MAS 5.0. In addition, microarray datasets from public sources were also used in these studies (Table 3).
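As a rough illustration of scaling each microarray to a global trimmed mean of 100, the following Python sketch rescales a list of probe-set signals so that their trimmed mean equals the target value. The function name is hypothetical, and the default trim fraction of 2% at each end is an assumption made for the sketch rather than a parameter stated in this description.

```python
def scale_to_trimmed_mean(signals, target=100.0, trim_fraction=0.02):
    """Rescale signals so that their trimmed mean equals `target`.

    trim_fraction is the share of values dropped at EACH end before
    averaging (assumed value; not specified in the text).
    """
    ordered = sorted(signals)
    cut = int(len(ordered) * trim_fraction)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    trimmed_mean = sum(kept) / len(kept)
    factor = target / trimmed_mean
    return [s * factor for s in signals]

# Hypothetical raw signals for one array; 25% trimming is used only to keep the toy example small.
raw = [10, 40, 80, 100, 120, 160, 200, 5000]
scaled = scale_to_trimmed_mean(raw, target=100.0, trim_fraction=0.25)
```

Trimming keeps a few very bright or very dim probe-sets from dominating the scaling factor, so arrays remain comparable after normalization.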
One-way or two-way hierarchical clustering analyses were conducted using Cluster (Version 2.11) software, and results were visualized in TreeView (Version 1.60) software, both of which are provided for public use by the laboratory of Michael B. Eisen, Ph.D. of Lawrence Berkeley National Lab and the Department of Molecular and Cellular Biology, University of California at Berkeley.
Selection of Probe-Sets/Genes to Differentiate Cancers from Normal Tissues
To determine the optimal stringency for selecting probe-sets that can differentiate cancerous from non-cancerous tissues, probe-sets of extreme differential expression between paired HCC and adjacent non-tumorous liver tissue were identified at different selection stringencies ranging from 1 to 16. A stringency of 17 or 18 was not considered because there was only 1 probe set for a stringency of 17 and 0 probe sets for a stringency of 18. These probe-sets were applied to gene expression data for various normal and tumor tissues available in the SCIANTIS™ System Pro microarray database. Data sets for different subtypes of human primary cancers and their corresponding normal tissues were selected for further statistical comparison only if the sets included a minimum of eight samples for both normal and affected cohorts. Data sets for a total of 20 different subtypes of cancers and corresponding normal tissues meeting these criteria were identified. The fraction (q) of total probe-sets (n=22,283) that exhibited a statistically significant difference in expression (p<0.05 by Welch's t-test) between a type of cancer and a normal counterpart according to the data provided in the SCIANTIS™ System Pro database, and the number of highly differentially expressed probe-sets (k), were determined for different selection stringencies. The density distribution [binomial (k,q)] of randomly selected probe-sets from the SCIANTIS™ System Pro database showing significant differences in expression between a specific type of cancer and a corresponding normal tissue was then determined. Using the resulting density distribution curve based on the randomly-selected probe-sets, the statistical significance of k probe-sets to differentiate a cancer from the corresponding normal tissue was determined.
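The binomial comparison described above can be illustrated with a short Python sketch. If a fraction q of all probe-sets is significant for a given cancer, the number of significant probe-sets among k randomly selected ones follows a binomial (k, q) distribution, and the upper tail gives the chance of seeing at least x significant probe-sets at random. The symbol x, the function name and the numbers in the example are assumptions introduced here for illustration.

```python
from math import comb

def binomial_tail(k, q, x):
    """P(X >= x) for X ~ Binomial(k, q)."""
    return sum(comb(k, i) * q ** i * (1 - q) ** (k - i) for i in range(x, k + 1))

# Hypothetical numbers: 75 selected probe-sets, 30% of all probe-sets
# significant for this cancer, 60 of the 75 selected ones significant.
p_random = binomial_tail(k=75, q=0.30, x=60)
```

A very small tail probability indicates that the selected probe-sets distinguish the cancer from its normal counterpart far more often than randomly chosen probe-sets would.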
Two-sample Welch t-tests assuming unequal variance between normal and malignant groups were conducted for all 22,283 human probe-sets available on the U133A gene chips for each of 20 subtypes of cancer selected from the SCIANTIS™ System Pro commercial microarray database for this study. The associated t-statistics and p-values were calculated and used to build a distribution curve to assess the likelihood that any 75 randomly selected probe-sets would give smaller p-values than the 75 universal signature probe-sets that were identified in this study. To this end, 10,000 lists of 75 randomly selected probe-sets were generated and each list was applied to each of the 20 different subtypes of cancers. The 1,500 p-values associated with each random list for the 20 subtypes of cancers were sorted and plotted against their ranks. Hierarchical clustering analysis of t-values generated from t-statistics was also employed for validation purposes. Two analyses using 75 probe-sets and 20 different subtypes of cancer and their normal tissues were performed. The seventy-five probe-sets identified as the universal neoplastic signature in this study were evaluated for the 20 subtypes of cancers and normal tissues. Fifteen hundred t-values were obtained. The 1,500 t-values were further analyzed by hierarchical clustering analysis (
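The random-list comparison described above can be sketched in Python as follows. The data here are small randomly generated stand-ins (200 hypothetical probe-sets and 4 hypothetical cancer subtypes rather than 22,283 and 20), and summarizing each curve by its middle-ranked p-value is an assumption made only to give the sketch a single number to compare; the study itself plotted the sorted p-values against their ranks.

```python
import random

def sorted_pvalue_curve(pvalues, probe_ids):
    """Pool the p-values of the given probe-sets across all subtypes and sort them."""
    return sorted(p for pid in probe_ids for p in pvalues[pid])

def random_list_curves(pvalues, list_size, n_lists, seed=0):
    """Draw n_lists random probe-set lists; return one sorted p-value curve per list."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    all_ids = sorted(pvalues)
    return [
        sorted_pvalue_curve(pvalues, rng.sample(all_ids, list_size))
        for _ in range(n_lists)
    ]

# Toy stand-in data: probe-set id -> one p-value per subtype.
toy_rng = random.Random(1)
toy_pvalues = {f"ps{i}": [toy_rng.random() for _ in range(4)] for i in range(200)}
signature = ["ps0", "ps1", "ps2", "ps3", "ps4"]
for pid in signature:
    toy_pvalues[pid] = [0.001, 0.002, 0.0005, 0.003]  # pretend these are strongly significant

signature_curve = sorted_pvalue_curve(toy_pvalues, signature)
curves = random_list_curves(toy_pvalues, list_size=5, n_lists=1000)

# Fraction of random lists whose middle-ranked p-value is at or below the signature's.
mid = len(signature_curve) // 2
fraction_as_good = sum(c[mid] <= signature_curve[mid] for c in curves) / len(curves)
```

With the real data, each list would contain 75 probe-sets and each curve 1,500 p-values; the closer the fraction is to zero, the less likely it is that a random list performs as well as the signature.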
Statistical analyses, including Chi-square test, Fisher's exact test, t-test, and survival analyses (log-rank and Wilcoxon tests), were conducted using SAS software (Version 9.1.3).
TaqMan™ real-time quantitative reverse transcriptase-PCR (qRT-PCR) was used to quantify mRNA. cDNA was synthesized from 8 μg of total RNA for each sample using 1500 ng oligo(dT) primer and 600 units SuperScript™ II Reverse Transcriptase from Invitrogen (Carlsbad, Calif.) in a final volume of 60 μl according to the manufacturer's instructions. For each RT-PCR reaction, 0.5 μl cDNA was used as template in a final volume of 25 μl following the manufacturers' instructions (ABI and Roche). The PCR reactions were carried out using an Applied Biosystems 7900HT Real-Time PCR system. Probes and reagents required for the experiments were obtained from Applied Biosystems (ABI) (Foster City, Calif.). The sequences of primers and the probes used for real-time quantitative RT-PCR are listed in Table 4. The hypoxanthine-guanine phosphoribosyltransferase (HPRT) housekeeping gene was used as an endogenous reference for normalization. All samples were run in duplicate on the same PCR plate for the same target mRNA and the endogenous reference HPRT mRNA. The relative quantities of target mRNAs were calculated by the comparative Ct method according to the manufacturer's instructions (User Bulletin #2, ABI Prism 7700 Sequence Detection System). A non-tumorous liver sample was chosen as the relative calibrator for calculation.
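For illustration, the comparative Ct calculation can be written as a short Python function. The sketch assumes the usual form of the method, in which the amount of target doubles with each cycle (approximately 100% amplification efficiency); the function name and the Ct values are hypothetical.

```python
def relative_quantity(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative target mRNA level by the comparative Ct (2^-ddCt) method.

    ct_target / ct_reference: mean Ct of the target and of the endogenous
    reference (here HPRT) in the test sample.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
    sample (here a non-tumorous liver sample).
    Assumes the product doubles every cycle.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    return 2 ** -(delta_ct_sample - delta_ct_calibrator)

# Hypothetical mean Ct values from duplicate wells.
fold_change = relative_quantity(
    ct_target=22.0, ct_reference=20.0, ct_target_cal=25.0, ct_reference_cal=20.0
)
```

In this made-up example the target crosses the threshold three cycles earlier in the test sample than in the calibrator after normalization to HPRT, which corresponds to an eight-fold higher relative quantity.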
In order to identify tumor-specific genes that are specifically expressed in hepatocellular carcinoma tissues, gene expression profiles were generated for 18 pairs of HCC and adjacent non-tumorous liver tissue samples as described above. To ensure that the profiles included genes with robust expression, only those genes showing significant differential expression by both MAS 5.0 and dChip software were selected. The number of probe-sets corresponding to genes showing significant differential expression between hepatocellular carcinoma and adjacent non-tumorous liver tissues in the 18 paired samples at different selection stringencies is shown in Table 5. The number of probe-sets showing significant differential expression increased as the stringency was relaxed, i.e., from genes differentially expressed between HCC and normal tissues in all 18 sample pairs (high selection stringency of 18) to genes differentially expressed between HCC and normal tissues in 1 out of 18 sample pairs (low selection stringency of 1).
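The stringency-based selection described above can be sketched in Python. The sketch assumes that a stringency of k means "called differentially expressed in at least k of the sample pairs", which is consistent with the counts rising as the stringency is relaxed but is not stated explicitly in the text. The input layout, function names and probe-set IDs are hypothetical, and the direction of change (up or down in tumor) is ignored for brevity.

```python
def count_by_stringency(calls):
    """Count probe-sets passing each selection stringency.

    calls: dict mapping probe-set ID -> list of booleans, one per sample pair,
           True if the probe-set was called differentially expressed in that
           pair (e.g. by both analysis packages).
    Returns {k: number of probe-sets called in at least k pairs}.
    """
    n_pairs = len(next(iter(calls.values())))
    hits = {probe: sum(flags) for probe, flags in calls.items()}
    return {k: sum(1 for h in hits.values() if h >= k)
            for k in range(1, n_pairs + 1)}


def select_probe_sets(calls, k):
    """Return the sorted probe-set IDs called in at least k sample pairs."""
    return sorted(p for p, flags in calls.items() if sum(flags) >= k)


# Toy example with 3 sample pairs instead of 18
calls = {
    "probe_a": [True, True, True],
    "probe_b": [True, False, False],
    "probe_c": [False, False, False],
}
print(count_by_stringency(calls))   # {1: 2, 2: 1, 3: 1}
print(select_probe_sets(calls, 2))  # ['probe_a']
```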
To determine the optimal stringency for selecting probe-sets that can differentiate cancerous from non-cancerous tissues, different selection stringencies were applied to gene expression data sets for various normal and tumor tissues available in the SCIANTIS™ System Pro microarray database. Data sets for different subtypes of human primary cancers and their corresponding normal tissues were selected if the sets included a minimum of eight samples for both normal and affected cohorts. Data sets for a total of 20 different subtypes of cancers and corresponding normal tissues meeting these criteria were identified (Table 6).
The fraction (q) of total probe-sets (n=22,283) that exhibited a statistically significant difference in expression (p<0.05 by Welch's t-test) between a type of cancer and a normal counterpart according to the data provided in the SCIANTIS™ System Pro database, and the number of highly differentially expressed probe-sets (k), were determined at the 18 different selection stringencies shown in Table 5. This systematic statistical analysis revealed that a stringency of 12 out of 18 pairs selected for 75 probe-sets that could differentiate cancer tissues from their respective normal tissues with p-values <0.005 for 19 out of 20 different cancer subtypes (
The expression intensities of the genes represented by the 75 probe-sets were compared in the microarray data obtained from HCC and adjacent non-tumorous liver tissues. There was little overlap in expression intensities of these genes between the paired HCC and adjacent non-tumorous liver tissue samples (
To confirm that the 18 paired HCC samples used in this study were sufficiently representative of this type of cancer, gene expression intensities of the 75 probe-sets were assessed in 82 additional HCC samples, in the absence of paired adjacent non-tumorous liver tissues. As shown in
To validate the finding that these 75 probe-sets represented genes displaying significant differential expression between HCC and non-tumorous liver tissues, a series of real-time quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) experiments was conducted on RNA samples from the 18 paired HCC and non-tumorous liver tissues used in the study. The available RNA samples were sufficient to study 39 of the genes represented in the CNS. All 39 genes had an appropriate 3′-end DNA sequence spanning an intron for reliable qRT-PCR study. The results
Functional annotation of the significantly differentially expressed genes represented by the 75 probe-sets described in Example 1 was obtained using the Bioinformatic Harvester database of the Karlsruhe Institute of Technology and the Ingenuity Pathway Analysis database (Ingenuity® Systems).
In the Bioinformatic Harvester database, the 55 genes represented by the 59 tumor-specific probe-sets were designated as having the following biological functions: cell cycle/proliferation (27 genes), regulation of gene transcription/expression (9 genes), cell differentiation (2 genes), angiogenesis (3 genes), signal transduction (2 genes), apoptosis (2 genes), other (5 genes) or unknown function (5 genes) (
Of these 55 genes, 47 were found to be present in the Ingenuity Pathway Analysis database, wherein 32 were designated as being involved in the cell cycle, 14 in regulation of gene expression and 1 in lipid metabolism (
The 16 probe-sets that showed specific expression in non-tumorous, normal liver tissue were determined to include genes having a variety of functions, including functions related to immune responses (3 genes), sugar binding (2 genes), drug metabolism (2 genes), binding of corticotropin releasing hormone (1 gene), muscle contraction/digestion (1 gene), carbohydrate metabolism (1 gene), lipid/cholesterol metabolism (1 gene), potassium ion transport (1 gene), scavenger receptor activity (1 gene), cell motility (1 gene), cell cycle (1 gene), and cell adhesion (1 gene) (
Hierarchical clustering analyses were performed as described in Example 1.
The majority of genes (55) represented by the 75 probe-sets identified in Example 1 were tumor-specific and were identified as being involved in the cell cycle and/or cell proliferation (
To confirm this finding, statistical comparisons of gene expression in cancer and normal tissues were conducted for each of the 75 probe-sets using the datasets in the SCIANTIS™ System Pro database for the twenty different subtypes of cancer chosen for this study. Specifically, a two-sample Welch's t-test was performed for each gene for all 20 types of cancer. Hierarchical clustering analysis was then conducted using the t-values obtained from these comparisons (FIGS. 23A,B). High positive t-values were calculated for all tumor-specific probe-sets, while negative t-values were calculated for all normal tissue-specific probe-sets.
For any given cancer, a large number of genes showing significant differential expression between tumor and normal tissues is expected. Consistent with this expectation, 52% of probe-sets (n=22,283) in the dataset showed statistically significant (i.e., p-values <0.05) differences in gene expression between infiltrating ductal carcinomas and normal breast tissues. Thus, random selection of any group of genes is likely to include some genes that are differentially expressed between tumor and normal tissues. Therefore, it is critical to show that the 75 probe-sets identified from paired HCC and adjacent non-tumorous tissue samples include significantly more differentially expressed probe-sets than any 75 randomly selected probe-sets.
Accordingly, a control study was performed in which seventy-five (75) probe-sets were randomly selected 10,000 times. Gene expression intensities in cancer and normal tissues were compared for each gene represented in the randomly selected probe-sets using the SCIANTIS™ gene expression datasets for the 20 different subtypes of cancer and corresponding normal tissues selected for this study, as described in Example 1. The results demonstrated that the genes represented by the 75 probe-sets identified in this study as being differentially expressed between HCC and corresponding normal tissues significantly outnumber the differentially expressed genes found in any randomly selected set of 75 probe-sets (
These results support the conclusion that the genes represented by the 75 probe-sets identified in this study (see Example 1) constitute a common neoplastic signature (CNS), and that expression of these genes and their products (e.g., proteins, peptides, mRNA) can be used as universal markers for cancer.
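The random-selection control can be sketched in Python as follows. The study sorted and plotted the 1,500 p-values of each random list against their ranks. The single summary used here (the number of p-values below a threshold) is a simplifying assumption for illustration, as are the function names, the input layout and the toy data.

```python
import random


def sorted_pvalues(pvals, probes):
    """Pool the p-values of the given probe-sets across all cancer subtypes,
    sorted in ascending order (the curve plotted against rank)."""
    return sorted(p for probe in probes for p in pvals[probe])


def random_list_control(pvals, signature, n_lists=10000, threshold=0.05, seed=0):
    """Fraction of random probe-set lists that perform at least as well as
    the signature.

    pvals: dict mapping probe-set ID -> list of p-values, one per cancer subtype.
    A list "performs as well" if it has at least as many p-values below
    `threshold` as the signature (an assumed summary, not the study's plot).
    """
    rng = random.Random(seed)
    all_probes = sorted(pvals)

    def n_significant(probes):
        return sum(1 for p in sorted_pvalues(pvals, probes) if p < threshold)

    observed = n_significant(signature)
    as_good = 0
    for _ in range(n_lists):
        sample = rng.sample(all_probes, len(signature))  # without replacement
        if n_significant(sample) >= observed:
            as_good += 1
    return as_good / n_lists


# Toy data: 5 "signature" probe-sets with small p-values among 100 probe-sets
pvals = {f"s{i}": [0.001, 0.001] for i in range(5)}
pvals.update({f"r{i}": [0.5, 0.5] for i in range(95)})
signature = [f"s{i}" for i in range(5)]
print(random_list_control(pvals, signature, n_lists=200))
```

A small returned fraction indicates that random lists of the same size rarely match the signature, which is the pattern the control study reports.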
Hierarchical clustering analyses were performed as described in Example 1.
Statistical analyses, including Chi-square test, Fisher's exact test, t-test, and survival analyses (log-rank and Wilcoxon tests), were conducted using SAS software (Version 9.1.3). To assess how the expression of each tumor-specific gene in the common neoplastic signature was correlated with time-dependent overall or distant metastasis-free survival, Cox regression analysis based on proportional hazards model was performed using S-plus software (Version 6) for the datasets of HCC, NPC or breast cancer.
If expression of the genes in the common neoplastic signature is associated with cellular proliferation, hierarchical cluster analysis should reveal elevated expression of these genes in different types of normal tissues and organs that have high proliferation activities. The heat map of hierarchical clustering analysis revealed that genes represented by the 59 tumor-specific probe-sets had elevated expression in highly proliferative normal tissues and organs including bone marrow (hematopoietic organ), thymus, uterus and testis (
Based on these results, it was hypothesized that cancers with much higher expression of the genes represented by the 59 tumor-specific probe-sets would be more proliferative and would correlate with larger tumor size and/or a more advanced TNM stage. To test this hypothesis, hierarchical cluster analyses were conducted on breast cancer (n=295), HCC (n=100) and nasopharyngeal carcinomas (n=260), because data regarding tumor size and TNM stage were available for these types of cancer. Each type of cancer was classified into two groups according to gene expression of the 75 probe-sets (
Hierarchical clustering analyses were performed as described in Example 1.
Statistical analyses were performed as described in Example 4.
To determine whether tumors displaying increased expression of the 55 genes represented by the 59 tumor-specific probe-sets, and reduced expression of the 16 genes represented by the 16 normal tissue-specific probe-sets, are associated with a poor survival outcome relative to other tumors, the same HCC, breast cancer and nasopharyngeal carcinoma samples described in Example 4 were classified by hierarchical clustering analysis (
Notably, expression of the genes represented by these 75 probe-sets, which were identified by gene expression differences between hepatocellular carcinoma and non-tumorous liver tissues, could be used successfully to classify breast cancers according to survival and risk for distant metastasis (
Hierarchical clustering analyses were performed as described in Example 1.
Statistical analyses were performed as described in Example 4.
It is well known that tumors having poor clinical outcomes are frequently poorly differentiated. To determine whether increased expression of the 55 genes represented by the 59 tumor-specific probe-sets is associated with poor tumor differentiation, hierarchical clustering analysis was conducted on adult male germ cell tumors with different degrees of differentiation. The results showed that “teratomas”, which are known to contain highly differentiated mature tissues, clustered together, with reduced expression of the 59 tumor-specific probe-sets and increased expression of the 16 normal tissue-specific probe-sets (
To determine whether differentiation grades of HCC and breast cancer tumors clustered according to the gene expression intensities of the 75 probe-sets identified in Example 1, a statistical correlation study was conducted (
As discussed in Example 5, 55 different genes represented by 59 tumor-specific probe-sets were closely associated with survival and/or distant metastasis in three very different types of cancers (
The relevant teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/123,761, filed on Apr. 11, 2008. The entire teachings of the above application are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US09/02196 | 4/8/2009 | WO | 00 | 11/22/2010 |
Number | Date | Country |
---|---|---|
61123761 | Apr 2008 | US |