The present invention relates generally to methods and diagnostic kits for predicting or diagnosing colorectal cancer by measuring biomarkers.
Colorectal cancer (also known as colon cancer, rectal cancer or bowel cancer) is the development of cancer in the colon or rectum (parts of the large intestine). It is due to the abnormal growth of cells that have the ability to invade or spread to other parts of the body. Signs and symptoms may include blood in the stool, a change in bowel movements, weight loss, and feeling tired all the time.
Most colorectal cancers are due to lifestyle factors and increasing age, with only a small number of cases due to inherited genetic disorders. Risk factors include diet, obesity, smoking, and not enough physical activity. Dietary factors that increase the risk include red and processed meat, as well as alcohol. Another risk factor is inflammatory bowel disease, which includes Crohn's disease and ulcerative colitis. Some of the inherited conditions that can cause colorectal cancer include: familial adenomatous polyposis and hereditary non-polyposis colon cancer; however, these represent less than 5% of cases. It typically starts as a benign tumor which over time becomes cancerous.
Diagnosis of colorectal cancer is made by sampling areas of the colon suspicious for possible tumor development, typically during colonoscopy or sigmoidoscopy, depending on the location of the lesion. However, no biomarkers that provide information on the developmental stages of colorectal cancer and guide treatment have been identified. The present invention discloses at least 11 biomarkers for colorectal cancer, and provides a diagnostic process and kit for diagnosis with these biomarkers.
Extensive genomic characterizations of human cancers have provided the most compelling demonstrations of function-altering mutations and of ongoing genomic instability during tumor progression. However, it is not fully understood how dozens of mutated tumor suppressor genes and oncogenes drive cancers. As proteins link genotypes to phenotypes, alterations in the proteome of cancer cells are expected to play crucial roles during carcinogenesis. The present invention provides the first comprehensive map of the colorectal cancer proteome and its abnormal features, obtained by proteomic analysis of paired cancers and adjacent normal tissues. A novel strategy for pathway analysis enabled us to discover a number of abnormalities of the colorectal cancer proteome, which included an imbalance in protein abundance between the inhibitory and activating regulators in key signal pathways, a significant elevation of proteins responsible for chromatin modification, gene expression, and DNA replication and damage repair, and a decreased expression of proteins responsible for core extracellular matrix architectures. Our discovery provides indispensable information to complement available genomic data towards a better understanding of cancer biology.
An object of the invention is to provide means allowing an early detection of colon adenoma and/or colon carcinoma.
It is a further object to provide means of allowing a selective and specific detection of colon adenoma and/or colon carcinoma by a non-invasive method.
It is a further object to provide a biomarker which can be used in the detection of colorectal adenoma and/or carcinoma.
Another object of the present invention is to provide a test system for detecting colorectal adenoma or carcinoma which is cost effective and can be widely used.
Moreover, the test system should be easy to handle and more convenient for the individual to be examined for colorectal adenoma and/or carcinoma.
It is a further object of the present invention to provide a screening system for determining whether a compound is effective in the treatment of colorectal adenoma and/or carcinoma.
The objects underlying the present invention are solved by the use of CAM1, CPA3, OLM4, LAD1, DPEP1, OGFR, EPHB3, PKP3, CEAM6, SERPINB5 and MUC13 proteins, and/or derivatives thereof, as biomarkers for the detection of colorectal adenoma and/or colorectal carcinoma in an individual. The detection can be carried out in vivo and in vitro. Pursuant to a preferred embodiment, the detection is carried out in vitro. The following description of CEAM6 and its derivatives serves as an example to disclose the present invention, and applies equally to CAM1, CPA3, OLM4, LAD1, DPEP1, OGFR, EPHB3, PKP3, SERPINB5 and MUC13 proteins, and/or their derivatives.
The objects are further solved by a method for detecting colorectal adenoma and/or colorectal carcinoma comprising the steps: a) providing an isolated sample material which has been taken from an individual, b) determining the level of CEAM6 or a derivative thereof in said isolated sample material, c) comparing the determined level of CEAM6 or a derivative thereof with one or more reference values.
The objects are further solved by a method for discriminating between colorectal adenoma and colorectal carcinoma comprising the steps: a) providing an isolated sample material which has been taken from an individual, b) determining the level of CEAM6 or a derivative thereof in said isolated sample material, c) comparing the determined level of CEAM6 or a derivative thereof with one or more reference values.
The objects are also solved by a method for monitoring the development and/or the course and/or the treatment of colorectal adenoma and/or colorectal carcinoma comprising the steps: a) providing an isolated sample material which has been taken from an individual, b) determining the level of CEAM6 or a derivative thereof in said isolated sample material, c) comparing the determined level of CEAM6 or derivative thereof with one or more reference values.
In a preferred embodiment the effectiveness of a surgical or therapeutic procedure is monitored in order to decide whether the colorectal adenoma and/or colorectal carcinoma has been completely removed. In another embodiment the therapy of a colorectal adenoma and/or colorectal cancer patient with one or more chemical substances, antibodies, antisense-RNA, radiation, e.g. X-rays, or combinations thereof is monitored in order to assess the effectiveness of the treatment.
The objects are solved as well by providing a test system for detecting colorectal adenoma and/or colorectal cancer in a sample of an individual comprising: a) an antibody or a receptor which binds to an epitope of CEAM6 or a derivative thereof, b) a solid support which supports said antibody or receptor, c) a reagent for detecting the binding of said epitope of CEAM6 or a derivative thereof to said antibody or receptor.
The objects are furthermore solved by the provision of an array comprising detection molecules for detecting colorectal adenoma and/or colorectal carcinoma in an individual, comprising as detection molecule: a) a nucleic acid probe immobilized on a solid support for binding to and detecting mRNA encoding CEAM6 or a derivative thereof and/or for binding to and detecting CEAM6 proteins or derivatives thereof, or b) an antibody immobilized on a solid support for binding to and detecting an epitope of CEAM6 or a derivative thereof, or c) a receptor immobilized on a solid support for binding to and detecting an epitope of CEAM6 or a derivative thereof, wherein preferably different amounts of each detection molecule are immobilized on the solid support to increase the accuracy of the quantification.
The nucleic acid probe is for example selected from the group consisting of single-stranded or double-stranded DNA or RNA, aptamers and combinations thereof. Aptamers are single-stranded oligonucleotides that assume a specific, sequence-dependent shape and bind to protein targets with high specificity and affinity. Aptamers are identified using the SELEX process (Tuerk C. and Gold L. (1990) Science 249: 505-510; Ellington A D and Szostak J W. (1990) Nature 346: 818-822).
The objects are furthermore solved by a method for determining whether a compound is effective in the treatment of colorectal adenoma and/or colorectal carcinoma comprising the steps: a) treating of a colorectal adenoma or colorectal carcinoma patient with a compound, b) determining the level of CEAM6 or a derivative thereof in a sample material of said patient, and c) comparing the determined level of CEAM6 or a derivative thereof with one or more reference values.
Preferred embodiments are specified in dependent claims.
According to the present invention the term “sample material” is also designated as “sample”.
Pursuant to the present invention the term “biomarker” is meant to designate a protein or protein fragment or a nucleic acid which is indicative of the incidence of colorectal adenoma and/or colorectal carcinoma. That means the “biomarker” is used as a means for detecting colorectal adenoma and/or colorectal carcinoma.
The term “individual” or “individuals” is meant to designate a mammal. Preferably, the mammal is a human being such as a patient.
The term “healthy individual” or “healthy individuals” is meant to designate individual(s) not diseased of colorectal adenoma and/or colorectal carcinoma. That is to say, the term “healthy individual(s)” is used only in respect of the pathological condition of colorectal adenoma and/or colorectal carcinoma and does not exclude the individual to suffer from diseases other than colorectal adenoma and/or colorectal carcinoma.
The term “derivative thereof” is meant to describe any modification on DNA, mRNA or protein level comprising e.g. the truncated gene, fragments of said gene, a mutated gene, or modified gene. The term “gene” includes nucleic acid sequences, such as DNA, RNA, mRNA or protein sequences or oligopeptide sequences or peptide sequences. The derivative can be a modification which is a result of a deletion, substitution or insertion in the gene. The gene modification can be a result of the naturally occurring gene variability. The term “naturally occurring gene variability” means modifications which are not a result of genetic engineering. The gene modification can be a result of the processing of the gene or gene product within the body and/or a degradation product. The modification on protein level can be due to enzymatic or chemical modification within the body. For example, the modification can be a glycosylation or phosphorylation. Preferably, the derivative codes for or comprises at least 5 amino acids, more preferably 10 amino acids, most preferably 20 amino acids of the unmodified protein. In one embodiment the derivative codes for at least one epitope of the respective protein.
The term “epitope” is meant to designate any structural element of a protein or peptide or any proteinaceous structure allowing the specific binding of an antibody, an antibody fragment, a protein or peptide structure or a receptor.
The methods of the present invention are carried out with sample material such as a body fluid or tissue sample which already has been isolated from the human body. Subsequently the sample material can be fractionated and/or purified. It is for example possible, to store the sample material to be tested in a freezer and to carry out the methods of the present invention at an appropriate point in time after thawing the respective sample material.
It has been surprisingly discovered by the present inventors that the protein CEAM6 or a derivative thereof can be used as a biomarker for the detection of colorectal adenoma and/or carcinoma. The inventors have now surprisingly found that the level of the protein CEAM6 or a derivative thereof in a tissue sample and/or body fluid is elevated in individuals having colorectal adenoma and/or carcinoma. Furthermore, the level of the protein CEAM6 or a derivative thereof in a tissue sample and/or body fluid can be used to distinguish healthy people from people having colorectal adenoma and/or carcinoma, as well as people having colorectal adenoma from people having colorectal carcinoma.
Pursuant to the present invention, sample material can be tissue, cells or a body fluid. Preferably the sample material is a body fluid such as blood, blood plasma, blood serum, bone marrow, stool, synovial fluid, lymphatic fluid, cerebrospinal fluid, sputum, urine, mother milk, sperm, exudate and mixtures thereof. In a preferred embodiment the body fluids are fractionated with antibody affinity chromatography. The CEAM6 protein is for example eluted at pH 3.0.
Preferably, the body fluid has been isolated before carrying out the methods of the present invention. The methods of the invention are preferably carried out in vitro by a technician in a laboratory.
According to a preferred embodiment of the invention, CEAM6 is measured in blood plasma or blood serum. Blood serum can be easily obtained by taking blood from an individual to be medically examined and separating the supernatant from the clotted blood.
The level of CEAM6 or a derivative thereof in the body fluid, preferably blood serum, rises with progressive formation of colorectal adenoma. The colorectal adenoma is a benign neoplasm which may become malignant. When colorectal cancer develops from benign colorectal adenoma, the level of CEAM6 or a derivative thereof in body fluids, preferably blood serum, is further elevated.
After transformation of colorectal adenoma into colorectal cancer, the pathological condition of the afflicted individual can be further exacerbated by formation of metastasis.
The present invention provides an early stage biomarker which allows detection of the neoplastic disease at an early and still benign stage and/or at early tumor stages. The early detection enables the physician to remove the colorectal adenoma in time and to dramatically increase the individual's chance of survival.
Moreover, the present invention allows to monitor the level of CEAM6 or a derivative thereof in a body fluid such as blood serum over an extended period of time, such as years.
The long term monitoring makes it possible to differentiate between healthy individuals and individuals having colorectal adenoma and/or colorectal carcinoma. The level of CEAM6 or a derivative thereof can be routinely checked, for example, once or twice a year. If an increase of the level of CEAM6 or a derivative thereof is detected, this can be indicative of colorectal adenoma and/or early colorectal carcinoma. A further increase of the level of CEAM6 or a derivative thereof can then be indicative of the transformation into malignant colorectal carcinoma.
Moreover, the course of the disease and/or the treatment can be monitored. If the level of CEAM6 or a derivative thereof further increases, for example after removal of the colorectal adenoma, this can be indicative for exacerbation of the pathological condition.
That means the level of CEAM6 or a derivative thereof is a valuable clinical parameter for detecting and/or monitoring colorectal adenoma and/or colorectal carcinoma. The level of CEAM6 or a derivative thereof in body fluids is higher after incidence of colorectal adenoma. Therefore, the level of CEAM6 or a derivative thereof is an important clinical parameter to allow an early diagnosis and, consequently, an early treatment of the disease. In a preferred embodiment patients with elevated levels of CEAM6 or derivatives thereof are subsequently examined by colonoscopy.
The method of the invention for detection of colorectal adenoma and/or colorectal carcinoma comprises the step of providing an isolated sample material which has been taken from an individual, then determining the level of CEAM6 or a derivative thereof in the isolated sample material, and finally comparing the determined level of CEAM6 or a derivative thereof with one or more reference values. In one embodiment, one or more further biomarker(s) is/are additionally detected in an isolated sample material which has been taken from an individual, the level of the biomarker(s) is/are determined and compared with one or more respective reference values.
The reference value can be calculated as the average level of CEAM6 or a derivative thereof determined in a plurality of isolated samples of healthy individuals or individuals suffering from colorectal adenoma and/or colorectal carcinoma. This reference value can be established as a range to be considered as normal meaning that the person is healthy or suffers from colorectal adenoma and/or colorectal carcinoma. A specific value within a range can then be indicative for healthy condition or the pathological condition of colorectal adenoma and/or colorectal carcinoma. This range of reference value can be established by taking a statistically relevant number of body fluid samples, such as serum samples, of healthy individuals as it is done for any other medical parameter range such as, e.g., blood sugar. Preferably, two reference values are calculated which are designated as negative control and positive control 1. The reference value of the negative control is calculated from healthy individuals and the positive control is calculated from individuals suffering from colorectal adenoma or colorectal carcinoma. More preferably, three reference values are calculated which are designated as negative control and positive control 1 and positive control 2. Positive control 1 can be calculated from individuals suffering from colorectal carcinoma and positive control 2 can be calculated from individuals suffering from colorectal adenoma.
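As a rough illustration of the reference-value scheme described above, the following Python sketch computes a negative control, a positive control 1 (carcinoma) and a positive control 2 (adenoma) as group averages, and then assigns a measured level to the closest reference value. The function names, the nearest-reference decision rule, and the numeric levels are assumptions made for illustration only; the description does not prescribe a particular decision rule or unit.

```python
from statistics import mean, median

def reference_value(levels, method="mean"):
    """Average marker level for one reference group (mean or median)."""
    if not levels:
        raise ValueError("reference group must contain at least one sample")
    return mean(levels) if method == "mean" else median(levels)

def build_references(healthy, carcinoma, adenoma, method="mean"):
    """Return the three reference values named in the description."""
    return {
        "negative control": reference_value(healthy, method),
        "positive control 1": reference_value(carcinoma, method),
        "positive control 2": reference_value(adenoma, method),
    }

def closest_reference(level, references):
    """Assumed decision rule: pick the reference value nearest to the level."""
    return min(references, key=lambda name: abs(references[name] - level))

# Hypothetical CEAM6 levels in arbitrary units (illustrative, not measured data).
refs = build_references(
    healthy=[1.0, 1.2, 0.8],
    carcinoma=[5.0, 6.0, 7.0],
    adenoma=[2.5, 3.0, 3.5],
)
result = closest_reference(2.8, refs)
```

In practice a range around each reference value, established from a statistically relevant number of samples, would likely be more appropriate than a single nearest-value comparison.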
In another embodiment of the present invention, the reference values can be individual reference values calculated as the average level of CEAM6 or a derivative thereof determined in a plurality of isolated samples taken from the individual over a period of time.
When monitoring the level of CEAM6 or a derivative thereof over an extended period of time, such as months or years, it is possible to establish an individual average level. The level of CEAM6 or a derivative thereof can be measured, for example, in the same blood serum sample that is used for measuring blood sugar, and can be used to establish an individual calibration curve that allows any individual increase of the level of CEAM6 or a derivative thereof to be detected specifically.
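The individual average level can be sketched as follows. This is a minimal Python example in which the baseline is the mean of an individual's earlier measurements and a new measurement is flagged when it exceeds that baseline by a chosen ratio. The ratio of 1.5, the function names, and the sample values are hypothetical; the description does not specify a threshold.

```python
from statistics import mean

def individual_baseline(history):
    """Average of an individual's earlier measurements."""
    if not history:
        raise ValueError("at least one earlier measurement is required")
    return mean(history)

def is_increased(current, history, min_ratio=1.5):
    """Flag a level that exceeds the individual baseline by an assumed ratio."""
    return current >= min_ratio * individual_baseline(history)

# Hypothetical yearly serum levels for one individual (arbitrary units).
earlier = [1.0, 1.1, 0.9, 1.0]
flag_stable = is_increased(1.1, earlier)   # close to the baseline
flag_raised = is_increased(2.0, earlier)   # well above the baseline
```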
The reference value for further biomarkers can also be calculated in the same way as described for CEAM6. The average levels of CEAM6 or further biomarkers may be the mean or median level.
In another aspect the present invention further provides a test system for detecting colorectal adenoma and/or colorectal carcinoma in an isolated sample material of an individual. The test system is based on the ability of an antibody or a receptor to specifically bind to an epitope or a suitable structural element of CEAM6 or a derivative or fragment thereof. A receptor can be any structure able to bind specifically to CEAM6 or a derivative thereof. The receptor can be, for example, an antibody fragment such as an Fab or an F(ab′)2 fragment or any other protein or peptide structure able to specifically bind to CEAM6 or a derivative thereof.
The antibody, antibody fragment or receptor is bound to a solid support such as, e.g., a plastic surface or beads to allow binding and detection of CEAM6 or a derivative thereof. For example, a conventional microtiter plate can be used as a plastic surface. The detection of the binding of CEAM6 or a derivative thereof can be effected, for example, by using a secondary antibody labelled with a detectable group. The detectable group can be, for example, a radioactive isotope or an enzyme like horseradish peroxidase or alkaline phosphatase detectable by adding a suitable substrate to produce, for example, a color or a fluorescence signal.
The test system can be an immunoassay such as an enzyme-linked immunosorbent assay (ELISA), a radioimmunoassay (RIA) or a luminescence immunoassay (LIA). However, any other immunological test system using the specificity of antibodies or fragments of antibodies can be used, such as Western blotting or immunoprecipitation.
The present invention also provides an array comprising detection molecules for detecting colorectal adenoma and/or colorectal carcinoma in an individual, wherein the detection molecule can be a nucleic acid probe immobilized on a solid support for binding to and detecting of mRNA encoding CEAM6, fragments, mutations, variants or derivatives thereof, or an antibody immobilized on a solid support for binding to and detecting of an epitope of CEAM6 or a derivative thereof, or a receptor immobilized on a solid support for binding to and detecting of an epitope of CEAM6 or a derivative thereof. Preferably, the array comprises further detection molecules which are biomarkers for detecting colorectal adenoma and colorectal carcinoma.
The nucleic acid probe can be any naturally occurring or synthetic oligonucleotide or chemically modified oligonucleotide, as well as cDNA, mRNA, aptamer and the like.
Alternatively, the present invention also comprises an inverse array comprising patient samples immobilized on a solid support which can be detected by the above defined detection molecules.
Preferably the array comprises detection molecules which are immobilized to a solid surface at identifiable positions.
The term “array” as used in the present invention refers to a grouping or an arrangement, without being necessarily a regular arrangement. An array comprises preferably at least 2, more preferably 5 different sets of detection molecules or patient samples. Preferably, the array of the present invention comprises at least 50 sets of detection molecules or patient samples, further preferred at least 100 sets of detection molecules or patient samples. Pursuant to another embodiment of the invention the array of the present invention comprises at least 500 sets of detection molecules or patient samples. The detection molecule can be for example a nucleic acid probe or an antibody or a receptor.
The described array can be used in a test system according to the invention. The array can be either a micro array or a macro array.
The detection molecules are immobilized to a solid surface or support or solid support surface. This array or microarray is then screened by hybridizing nucleic acid probes prepared from patient samples or by contacting the array with proteinaceous probes prepared from patient samples.
The support can be a polymeric material such as nylon or plastic or an inorganic material such as silicon, for example a silicon wafer, or ceramic. Pursuant to a preferred embodiment, glass (SiO2) is used as solid support material. The glass can be a glass slide or glass chip. Pursuant to another embodiment of the invention the glass substrate has an atomically flat surface.
For example, the array can comprise immobilized nucleic acid probes able to specifically bind to mRNA of CEAM6 or a derivative thereof, or antibodies that specifically bind to CEAM6 protein or derivatives thereof present in a body fluid such as serum. Another preferred embodiment is to produce cDNA by reverse transcription of CEAM6 encoding mRNA or of mRNA encoding a derivative of CEAM6 and to specifically detect the amount of the respective cDNA with said array. The array technology is known to the skilled person. A quantification of the measured mRNA, cDNA or proteins, respectively, can be effected by comparison of the measured values with a standard or calibration curve prepared from known amounts of mRNA, cDNA or protein of CEAM6 or a derivative thereof.
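Quantification against a calibration curve can be sketched as below. The example assumes a linear relationship between the known amount of standard and the measured signal, fits it by ordinary least squares, and inverts the fitted line to estimate the amount in a sample. The linear model, the function names, and the standard values are assumptions for illustration; real assays may require a different curve shape.

```python
def fit_line(amounts, signals):
    """Ordinary least-squares fit: signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("calibration amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantify(signal, slope, intercept):
    """Invert the calibration line to estimate the amount in a sample."""
    if slope == 0:
        raise ValueError("a flat calibration line cannot be inverted")
    return (signal - intercept) / slope

# Hypothetical standards: known amounts and the signals they produced.
slope, intercept = fit_line([0.0, 1.0, 2.0, 4.0], [0.5, 2.5, 4.5, 8.5])
estimate = quantify(6.5, slope, intercept)
```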
Preferably, different amounts of detection molecules are immobilized each on the solid support to allow an accurate quantification of the level of CEAM6 or a derivative thereof.
Pursuant to another embodiment of the invention, the level of CEAM6 or a derivative thereof is determined by liquid chromatography tandem mass spectrometry (LC/MS/MS).
LC/MS/MS analysis allows to specifically detect CEAM6 or a derivative thereof via its sequence and to quantify the amount of CEAM6 or a derivative thereof very easily.
Preferably, the CEAM6 or a derivative thereof in the isolated sample is immobilized on a chip or solid support with an activated surface. The activated surface preferably comprises immobilized antibodies against CEAM6 or a derivative thereof, such as, for example, rabbit polyclonal antibodies. After binding of the CEAM6 or a derivative thereof to the antibodies, the bound CEAM6 is digested by trypsin or other proteinases, followed by LC/MS/MS analysis in a mass spectrometer, which delivers intensity signals for determining the level of CEAM6 or a derivative thereof.
Moreover, LC/MS/MS allows to simultaneously detect other proteins which can have a relevance with respect to the detection of colorectal adenoma and/or colorectal cancer.
In an embodiment of the present invention the sensitivity and/or specificity of the detection of colorectal adenoma and/or colorectal carcinoma is enhanced by additional detection of a further biomarker. In particular, in one embodiment the sensitivity and/or specificity of the detection of colorectal adenoma and/or colorectal carcinoma is enhanced by detection of another protein or nucleic acid in combination with CEAM6 or a derivative thereof.
Preferably, the sensitivity and specificity of the methods, arrays, test systems and uses according to the present invention are increased by the combination of detecting CEAM6 and derivatives thereof with SerpinB5 and derivatives thereof.
In a further embodiment of the present invention the sensitivity and/or specificity of the detection of colorectal adenoma and/or colorectal carcinoma is enhanced by additional detection of MUC13, OLM4, LAD1, DPEP1, OGFR, EPHB3, PKP3, CAM1, and CPA3, or derivatives thereof, in combination with CEAM6 or a derivative thereof.
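To make the notions of sensitivity and specificity concrete, the following Python sketch computes both for a single marker and for a two-marker combination. The data are invented for illustration, the cutoff of 2.0 is arbitrary, and the rule "positive if either marker exceeds its cutoff" is only one assumed way of combining markers; such a rule can raise sensitivity but may lower specificity on other data.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    return tp / (tp + fn), tn / (tn + fp)

def call_positive(levels, cutoff):
    """Mark each level above the cutoff as a positive test result."""
    return [level > cutoff for level in levels]

# Invented example: four diseased and four healthy individuals.
diseased = [True, True, True, True, False, False, False, False]
ceam6 = [5.0, 4.0, 1.0, 6.0, 1.0, 0.5, 1.2, 0.8]
serpinb5 = [3.0, 0.5, 4.0, 3.5, 0.4, 0.6, 0.3, 0.7]

single = call_positive(ceam6, cutoff=2.0)
second = call_positive(serpinb5, cutoff=2.0)
combined = [a or b for a, b in zip(single, second)]  # assumed OR rule

single_result = sensitivity_specificity(single, diseased)
combined_result = sensitivity_specificity(combined, diseased)
```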
The methods of the present invention can be carried out in combination with other diagnostic methods for detection of colorectal adenoma and/or colorectal carcinoma to increase the overall sensitivity and/or specificity. The detection of CEAM6 allows a very early detection of colorectal adenoma and can therefore be used as a very early marker.
Preferably, the methods of the present invention are carried out as an early detection and/or monitoring method. If the results of the methods of the present invention indicate the incidence of colorectal adenoma and/or colorectal carcinoma, further examinations such as colonoscopy should be carried out.
Generating High-Quality Proteomic Profiles
To characterize the human CRC proteome and quantify its changes, paired CRC and adjacent normal tissue (AT) samples from 22 cases of CRC patients (Table 1) were analyzed by a standardized mass spectrometry-based proteomics workflow. 44 proteomic profiles from 704 two hour LC-MS/MS runs (44 samples×16 runs) were generated. To ensure reproducibility and relative completeness, the 44 proteomic profiles were evaluated by ten groups of well-known “housekeeping” protein complexes and scored at an average of 92 out of 100. The relative abundance of identified proteins was determined based on normalized spectral abundance factors (NSAF). To quantitatively describe the relative abundance, part per million (ppm) was used as the abundance unit and a total value of 1,000,000 ppm was assigned to the proteome of each sample. Thus, the ppm value for each identified protein was calculated based on its NSAF, and the average abundance or ppm of each identified protein in CRC and AT was obtained based on 22 CRC samples and 22 AT samples, respectively.
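The ppm scaling can be sketched as follows. The example assumes the commonly used definition of NSAF, in which each protein's spectral count is divided by its length and the result is normalized by the sum over all proteins in the sample; multiplying by 1,000,000 then gives a ppm value so that each sample totals 1,000,000 ppm. The source does not spell out the formula, and the protein names, counts and lengths below are hypothetical.

```python
def nsaf_ppm(spectral_counts, lengths):
    """Map protein -> abundance in ppm from spectral counts and lengths."""
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # Normalize to the sample total and scale to parts per million.
    return {p: 1_000_000 * value / total for p, value in saf.items()}

# Hypothetical sample with three proteins.
counts = {"protein_a": 100, "protein_b": 50, "protein_c": 10}
lengths = {"protein_a": 500, "protein_b": 250, "protein_c": 1000}
ppm = nsaf_ppm(counts, lengths)
```

Averaging the per-sample ppm values of a protein over the 22 CRC samples, and separately over the 22 AT samples, would then give the mean abundances compared in the analysis.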
At the time of writing, UniProtKB/Swiss-Prot had manually reviewed protein evidence for 20193 human genes. We identified 12380 proteins across 44 samples, which accounted for approximately 60% of all the annotated proteins in the human genome. Among them, 8832 proteins were detected in both CRCs and ATs; 10030 proteins were detected in ATs, including 1197 (9.7%) proteins undetectable in CRCs; and 11183 proteins were detected in CRCs, including 2350 (19%) proteins undetectable in ATs.
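The percentages quoted above follow directly from the stated counts, as this short Python check shows. Only the counts given in the text are used; the variable and function names are illustrative.

```python
reviewed_genes = 20193   # human genes with reviewed protein evidence
identified = 12380       # proteins identified across the 44 samples
only_at = 1197           # detected in AT, undetectable in CRC
only_crc = 2350          # detected in CRC, undetectable in AT

def percent(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

coverage = percent(identified, reviewed_genes)   # 61.3, i.e. roughly 60%
share_only_at = percent(only_at, identified)     # 9.7
share_only_crc = percent(only_crc, identified)   # 19.0
```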
We next analyzed the distribution of the identified proteins using an Excel histogram function and revealed a normal distribution with a major peak and a minor peak. The major peak represented 62% and 60% of identified proteins with relative abundances greater than 1 ppm for CRC and AT respectively. Within this population 95% of proteins have ppm values in the range from 1 to 10000 ppm.
The minor peak represented 38% and 40% of identified proteins, with relative abundance less than 1 ppm, for CRC and AT respectively. The majority of proteins in the minor peak were identified by one or a few peptide spectrum matches across the 44 samples. Since the least abundant protein population also displayed a normal distribution, the relative abundance of such a protein could be used for a comparison between CRC and AT if its p-value was significant (e.g. p<0.01).
CRC Proteome Landscapes
Consistent with the 60% total coverage of the human proteome, the 12380 proteins identified in CRC and AT covered the chromosomes evenly at an average of 60%, with the notable exceptions of the Y chromosome (18.6%) and the mitochondrial chromosome (85.7%). The high identification rate for mitochondrial proteins was apparently correlated with their high abundances (>10 ppm), and the low rate for Y-chromosome proteins was likewise correlated with their relatively low abundances. Although there was no apparent difference between CRC and AT in chromosome coverage, the summed chromosome protein abundances varied. For chromosomes 13 and 20 the summed protein abundance in CRC was 27% higher than in AT, whereas the total protein abundance for chromosomes 4, 14, 16, and the mitochondrial chromosome was 10% lower in CRC than in AT.
We next assessed the coverage according to three protein classifications as described by UniProtKB: the molecular function classification with 14420 annotated proteins, the cellular component classification with 17465 annotated proteins, and the biological process classification with 16149 annotated proteins. The average coverage across all classes was 67%, but for individual classes the coverage varied from 40% to 90%. For known low-abundance protein classes the coverage was less than 45%, while for high-abundance protein classes it was more than 70%, and up to 90%. The coverage for signal transducers, receptors, nucleic acid binding transcription factors and chemoattractants was less than 40%, since these proteins were the least abundant. The coverage for CRC and AT showed no apparent difference. Interestingly, the summed protein abundances for protein binding transcription factors, nucleic acid binding transcription factors, and translation regulators were significantly increased in CRC, while those for collagen trimers, extracellular matrix parts and extracellular matrices were decreased. These changes may reflect the fact that the cancer cells were in a fast-growing state with a less stable structural architecture.
Proteomic Signature of CRC
Our quantitative proteomic analysis of 22 paired CRCs and ATs identified 740 significantly differentially expressed proteins (e.g. fold change >4, p<0.01) (Table 2). Among them, 613 proteins had increased expression in all 22 cases of CRC patients (p<0.01), while 127 proteins showed decreased expression (p<0.01). Interestingly, although these 740 proteins made up about 6% of the total proteins identified, their mass was only 1.6% and 2.5% of the total mass in the CRCs and ATs, respectively. Most of the 127 proteins decreased in CRC but enriched in AT were high-abundance proteins involved in cellular architecture, metabolism and colorectal functions. In contrast, most of the 613 proteins enriched in CRC were low-abundance proteins, mostly involved in the regulation of cellular processes. This explains why the total mass of the 740 proteins was 58% more in AT than in CRC.
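The selection criteria mentioned above (fold change >4, p<0.01) can be illustrated with the following Python sketch, which labels a protein as increased or decreased in CRC relative to AT. The p-values are taken as precomputed inputs because the statistical test is not specified here, and the protein names and abundances are hypothetical. Proteins undetectable in one tissue would need separate handling, which this sketch leaves out.

```python
def classify_protein(ppm_crc, ppm_at, p_value, min_fold=4.0, max_p=0.01):
    """Return 'increased', 'decreased' or None for one protein."""
    if ppm_crc <= 0 or ppm_at <= 0:
        raise ValueError("this sketch expects positive abundances in both tissues")
    if p_value >= max_p:
        return None  # change is not statistically significant
    if ppm_crc / ppm_at > min_fold:
        return "increased"
    if ppm_at / ppm_crc > min_fold:
        return "decreased"
    return None  # significant, but below the fold-change cutoff

# Hypothetical rows: protein -> (mean ppm in CRC, mean ppm in AT, p-value).
table = {
    "protein_a": (50.0, 10.0, 0.001),
    "protein_b": (5.0, 40.0, 0.005),
    "protein_c": (30.0, 10.0, 0.001),
    "protein_d": (90.0, 10.0, 0.2),
}
calls = {name: classify_protein(*row) for name, row in table.items()}
```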
Considering the practical reality, a small panel of protein biomarkers would have more advantages. We identified a panel of 11 proteins based on the relative abundance (mean abundance in CRC >20 ppm) from the ranked 740 proteins to distinguish cancer tissues from normal colorectal tissues obviously (
Here we have shown that a comprehensive CRC proteome map can be characterized by analyses of paired tumor and adjacent normal tissue samples using a standardized proteomics workflow and a novel pathway analysis strategy. Our data demonstrated that the abundance alteration in a group of proteins (responsible for a specific cellular function or process), rather than in individual proteins alone, could be the major contributor to CRC, and provided evidence to interpret how a dozen or a few dozen mutated tumor driver genes facilitate uncontrolled cancer cell growth and invasion. In CRC, mutations in APC, p53, and K-Ras, or chromosomal instability and microsatellite instability events, may initiate changes in gene expression. These changes lead to significant elevations of the proteins required for chromatin modification, DNA replication and damage repair, and the transcription and translation machinery, which in turn fuel the proliferation of tumor cells.
Essentially, our findings suggest a proteomic “teeterboard” mechanism in which pathways are regulated by modulating the balance between inhibitory regulators and activating regulators. In cells, the activation of signaling pathways orchestrates the regulation of cell fate, cell survival, apoptosis, and cell proliferation; the regulation of signaling pathways and the transduction of signals are integrated in the pathway components, which can be functionally divided into inhibitory regulators and activating regulators. The balance between these two major components determines the pathway activation status. During tumorigenesis, the decreased expression of inhibitory regulators and increased expression of activating regulators disrupt the well-organized, programmed cellular regulation network. Therefore, we propose a tumorigenesis model: molecular malfunction events, including mutations in tumor driver genes, chromosomal instability and microsatellite instability, initiate changes in gene expression, which lead to decreased expression of inhibitory regulators but increased expression of activating regulators to reactivate silenced pathways, and to elevated expression of the machinery for chromatin modification, DNA replication and damage repair, transcription and translation, altogether affording a proliferative advantage. This process was reflected by the proteomic abnormalities observed in cancer tissues in this study.
A panel of 11 proteins, which includes Chymase (CMA1), Mast cell carboxypeptidase A (CPA3), Olfactomedin-4 (OLM4), Ladinin-1 (LAD1), Dipeptidase-1 (DPEP1), Opioid growth factor receptor (OGFR), Ephrin type-B receptor 3 (EPHB3), Plakophilin-3 (PKP3), Carcinoembryonic antigen-related cell adhesion molecule 6 (CEAM6), SerpinB5 (SERPINB5), and Mucin-13 (MUC13), is selected as CRC protein biomarkers to comprehensively distinguish tumor from normal colorectal tissue and determine the tumor lymphatic invasion status. Two enzymes secreted by mast cells, mast cell carboxypeptidase A and chymase, are significantly diminished in CRC and are used as two positive markers for normal colorectal tissue. The other nine proteins, CEAM6, SERPINB5, MUC13, OLM4, LAD1, DPEP1, OGFR, EPHB3 and PKP3, are significantly overexpressed in tumor. Based on their relative abundances in tumor cells, the nine-protein panel can be used to determine the lymphatic invasion status. The higher the CEAM6, SERPINB5, and MUC13 and the lower the LAD1 and DPEP1 in a tumor, the more likely it is to be at a node-positive disease stage (
The instant application discloses a method for determining if a subject has an increased risk of having a colorectal disease or disorder comprising:
a) isolating a biological sample containing a test specimen from a biopsy specimen from said subject,
b) isolating a biological sample containing normal colorectal cells or tissue from a biopsy specimen from said subject or a family member of said subject,
c) analyzing protein abundances of biomarkers for the samples from a) and b),
d) comparing the results from c) between the abnormal and normal colorectal cells or tissue.
The instant application discloses a set of reagents to measure the levels of biomarkers in a specimen, wherein the biomarkers are a panel of biomarkers and their measurable fragments: OLM4, LAD1, DPEP1, OGFR, EPHB3, PKP3, CEAM6, SERPINB5 and MUC13 proteins.
Paired CRC and AT specimens were processed for the extraction of total proteins. Equal amounts of protein samples were separated by SDS-PAGE, and each lane (one sample) was then fractionated into 16 gel slices. The 16 gel slices were further processed by in-gel trypsin digestion to obtain 16 peptide fractions, which were analyzed sequentially by LC-MS/MS on a Q-Exactive mass spectrometer equipped with a Dionex Ultimate 3000 RSLCnano system using HCD fragmentation. This resulted in 16 raw MS files from the gel lane of one specimen sample, which were grouped for a database search against the UniProtKB/Swiss-Prot human protein sequence database using the SEQUEST and Percolator algorithms in the Thermo Proteome Discoverer 1.4.1 platform to generate a proteome profile. 44 proteome profiles (22 CRC and 22 AT) were generated for 22 paired samples. The relative completeness of the 44 proteome profiles was evaluated using ten groups of well-known “housekeeping” protein complexes consisting of 406 proteins (353 unique proteins and 53 isoforms) as the parameters. A score (0 to 100) was assigned based on the percentage of the 406 “housekeeping” proteins identified. The relative protein abundance in each of the 44 proteome profiles was quantified by calculating the normalized spectral abundance factor (NSAF). In order to quantitatively describe the relative abundance, ppm (parts per million) was chosen as the unit, and a total of 1,000,000 ppm was assigned to each proteome profile. A ppm value in the range of 0 to 1,000,000 ppm was calculated for each identified protein in each proteome profile based on its NSAF. The average abundance of each identified protein for CRC and AT was calculated based on the 22 CRC proteome profiles and the 22 AT proteome profiles, respectively. The comparison between CRC and AT was performed either at a group level using average ppm values and summed values, or at an individual level using individual ppm values.
All specimens were collected from patients in the Affiliated Hospital of Nantong University (Nantong, China) in accordance with approved human subject guidelines authorized by the Medical Ethics and Human Clinical Trial Committee at the Hospital. Following surgery, the tumor and adjacent normal tissue (AT) specimens were collected in separate tubes, kept on dry ice during transportation, and stored at −80° C. before further processing. AT specimens were obtained from the distal edge of the resection at least 5 cm from the tumor. 22 pairs of cancerous and adjacent normal tissue specimens were collected from 22 individual patients (10 with lymph node metastasis and 12 without lymph node metastasis) (Table 1). All CRC patients had histologically verified adenocarcinoma of the colon or rectum that was confirmed by pathologists. Patient characteristics were obtained from pathology records. Subjects with a history of other malignant diseases or infectious disease, or who had undergone surgery within 6 months prior to the start of this research, were excluded from this retrospective study.
Total protein was extracted from fresh frozen tissue specimens by the following method. Frozen tissue samples (0.05-0.1 gram) were cut into small pieces (1 mm size) using a clean sharp blade, and transferred into 1.5 ml tubes. 0.4 ml of lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1 mM Na2EDTA, 1 mM EGTA, 1% Triton X-100, Protease inhibitor cocktail pill) was added into each sample tube. The tissues were homogenized using a Dounce homogenizer. After homogenization, 50 μl of 10% SDS and 50 μl of 1M DTT were added to the mixture, followed by incubation at 95° C. for 10 min. After incubation the extract was sonicated to further break down DNA. Sonicated mixtures were centrifuged at 15,000×g for 10 minutes. Supernatants were collected and stored at −80° C. for further analysis. The protein concentration of the supernatants was determined by a BCA™ Reducing Reagent compatible assay kit (Pierce/Thermo Scientific).
Equal amounts of protein (133 μg) from each sample were loaded onto a NuPAGE 4-12% Bis-Tris Gel (Life Technologies). After electrophoresis the gel was stained with SimplyBlue SafeStain (Life Technologies), and subsequently de-stained thoroughly. For preparing in-gel trypsin digested peptides, the de-stained gel was washed with ion-free water three times, and each lane representing one sample was sliced horizontally into 16 slices. Each slice was diced into tiny pieces (1-2 mm) and placed into 1.5 ml centrifuge tubes. Proteins in the gel were treated with DTT for reduction, then iodoacetamide for alkylation, and further digested by trypsin in 25 mM NH4HCO3 solution. The digested protein was extracted as described elsewhere. The extracted peptides were dried and reconstituted in 20 μl of 0.1% formic acid before nanospray LC/MS/MS analysis was performed.
16 tryptic peptide fractions from one specimen sample were analyzed sequentially using a Thermo Scientific Q-Exactive hybrid Quadrupole-Orbitrap Mass Spectrometer equipped with a Thermo Dionex UltiMate 3000 RSLCnano System. Tryptic peptide samples were loaded onto a peptide trap cartridge at a flow rate of 5 μL/min. The trapped peptides were eluted onto a reversed-phase 25 cm C18 PicoFrit column (New Objective, Woburn, Mass.) using a linear gradient of acetonitrile (3-36%) in 0.1% formic acid. The elution duration was 110 min at a flow rate of 0.3 μL/min. Eluted peptides from the PicoFrit column were ionized and sprayed into the mass spectrometer using a Nanospray Flex Ion Source ES071 (Thermo) under the following settings: spray voltage, 1.6 kV; capillary temperature, 250° C. The Q Exactive instrument was operated in the data-dependent mode to automatically switch between full scan MS and MS/MS acquisition. Survey full scan MS spectra (m/z 300-2000) were acquired in the Orbitrap with 70,000 resolution (m/z 200) after accumulation of ions to a 3×10^6 target value based on predictive AGC from the previous full scan. Dynamic exclusion was set to 20 s. The 12 most intense multiply-charged ions (z≥2) were sequentially isolated and fragmented in the Axial Higher energy Collision-induced Dissociation (HCD) cell using a normalized HCD collision energy of 25% with an AGC target of 1e5 and a maximum injection time of 100 ms at 17,500 resolution.
The raw MS files were analyzed using the Thermo Proteome Discoverer 1.4.1 platform (Thermo Scientific, Bremen, Germany) for peptide identification and protein assembly. For each specimen sample, 16 raw MS files obtained from 16 sequential LC-MS analyses were grouped for a single database search against the UniProtKB/Swiss-Prot human protein sequence database (20597 entries, Dec. 20, 2013) based on the SEQUEST and Percolator algorithms through the Proteome Discoverer 1.4.1 platform. Carbamidomethylation of cysteines was set as a fixed modification. The minimum peptide length was specified to be five amino acids. The precursor mass tolerance was set to 15 ppm, whereas the fragment mass tolerance was set to 0.05 Da. The maximum peptide false discovery rate was specified as 0.01. The resulting Proteome Discoverer Report contains all assembled proteins (a proteome profile) with peptide sequences and matched spectrum counts. 44 proteome profiles were generated for 22 paired specimen samples (22 CRCs and 22 ATs).
Protein quantification used the normalized spectral abundance factor (NSAF) method to calculate the relative abundance of each identified protein in each proteome profile. In order to quantitatively describe the relative abundance, ppm (parts per million) was chosen as the unit and a total of 1,000,000 ppm was assigned to each proteome profile. A ppm value in the range of 0 to 1,000,000 ppm was calculated for each identified protein in each proteome profile based on its NSAF.
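The ppm (parts per million) value was calculated as follows:
$$RC_N = 10^6 \times NSAF_N$$
$$NSAF_N = \frac{S_N / L_N}{\sum_{i=1}^{n} S_i / L_i}$$
where RC_N is the relative abundance of protein N in ppm, S_N is the number of spectra matched to protein N, L_N is the length of protein N, and n is the number of proteins identified in the proteome profile.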
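The NSAF-to-ppm conversion can be sketched in Python as follows. This is an illustration only, not the software used in this study; the function name `nsaf_ppm`, the input layout (protein identifier mapped to spectral count and length), and the example values are assumptions.

```python
from typing import Dict, Tuple


def nsaf_ppm(proteins: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return the relative abundance (ppm) of each protein in one profile.

    `proteins` maps a protein identifier to (spectral_count, length).
    """
    # Spectral abundance factor S/L for each protein.
    saf = {pid: count / length for pid, (count, length) in proteins.items()}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("profile contains no matched spectra")
    # Normalize so that the whole profile sums to 1,000,000 ppm.
    return {pid: 1_000_000 * value / total for pid, value in saf.items()}


# Hypothetical profile: identifier -> (spectral count, length in residues).
profile = {"P1": (120, 400), "P2": (30, 300), "P3": (6, 600)}
ppm = nsaf_ppm(profile)
```

Because every profile is scaled to the same total, ppm values from different profiles can be compared directly.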
For example, Histone H4 was 13041±4025 ppm in AT and 10903±3821 ppm in CRC, GAPDH was 5473±1623 ppm in AT and 5932±1480 ppm in CRC, and Caspase-8 was 6.1±9.6 ppm in AT and 14.6±10.8 ppm in CRC. The mean, standard deviation, and t-test p-values were calculated using Microsoft Excel. The ratio of CRC versus AT was defined as 1000 if the protein was not identified in AT, or as 0.001 if it was not identified in CRC.
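The ratio convention described above may be sketched in Python as follows. The study itself used Microsoft Excel; the function name `crc_at_ratio`, the use of 0.0 to represent "not identified", the error raised when a protein is absent from both groups (a case the text does not address), and the example values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def crc_at_ratio(crc_ppm, at_ppm):
    """Ratio of mean CRC to mean AT abundance, with the 1000 / 0.001 convention.

    `crc_ppm` and `at_ppm` are per-sample ppm values; 0.0 means "not identified".
    """
    mean_crc, mean_at = mean(crc_ppm), mean(at_ppm)
    if mean_crc == 0 and mean_at == 0:
        # Assumption: this case is not covered by the described convention.
        raise ValueError("protein not identified in either group")
    if mean_at == 0:
        return 1000.0   # not identified in AT
    if mean_crc == 0:
        return 0.001    # not identified in CRC
    return mean_crc / mean_at


# Hypothetical per-sample ppm values, not data from this study.
crc = [12.0, 18.0, 15.0]
at = [5.0, 7.0, 6.0]
ratio = crc_at_ratio(crc, at)
spread = stdev(crc)  # sample standard deviation, like Excel's STDEV
```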
To evaluate the ppm quantification method we compared the relative protein abundance calculated in this study based on NSAFs, using ppm as the unit, with the published relative abundance calculated as Beck's copy number [30]. All subunits from four housekeeping protein complexes, including the Arp2/3 complex (7 subunits plus one isoform), the COP9 complex (8 subunits plus one isoform), the Proteasome (17 subunits) and the 17 TCA enzymes, were used for comparison. As shown in Extended Data
Due to instrument limitations and the wide dynamic range of protein abundances, current LC/MS/MS settings are unable to recover the whole proteome, especially the lowest-abundance proteins, in one experiment. Although it was difficult to obtain a complete proteome from one experiment, it was necessary to find an effective approach to evaluate the quality and relative completeness of a set of proteome profiles generated over a period of time before these profiles could be analyzed together without bias. We used the “housekeeping” protein complexes and the distribution of the protein population to examine the quality of a proteome profile. It is well known that “housekeeping” proteins and their complexes are essential for maintaining the life of a cell, and exist in all tissue and cell types throughout life. Therefore, we hypothesized that if these complexes, including all subunits, could be quantitatively identified and showed no obvious changes between analyses, the proteomic profile datasets were relatively complete and comparable, and the analysis workflow was reliable. Ten groups of well-known “housekeeping” protein complexes consisting of 406 proteins, including 353 unique proteins and 53 isoforms or subtypes (Table 4), were selected as the parameters for the evaluation.
The ten groups of complexes were the Arp2/3 complex (8 subunits plus alpha and beta actins), 86 (79 and 7 isoforms) cellular (60S and 40S) ribosomal proteins, 77 mitochondrial (28S and 39S) ribosomal proteins, the nuclear pore complex (38 proteins: 34 subunits, including GTP-binding nuclear protein Ran, Ran GTPase-activating protein 1 (RAGP1), Ran-specific GTPase-activating protein (RANG), and Ran-binding protein 3 (RANB3), and 4 isoforms), 5 histones (H1 (5 subtypes), H2A (8 subtypes), H2B (3 subtypes), H3 (4 subtypes) and H4), the Proteasome complex (17 subunits), the COP9 signalosome complex (9 subunits), TCA enzymes (17 key enzymes), Mitochondrial respiratory chain complexes I-V (94 subunits), the V-type proton ATPase complex (14 subunits consisting of 24 isoforms), and Na+/K+-ATPase (sodium-potassium pump, 2 subunits, 7 isoforms). A score (0 to 100) was assigned based on the percentage of the 406 “housekeeping” proteins identified. The 44 proteome profiles from this study scored an average of 92, suggesting that these profiles were at the same level of completeness. Three unique proteins, 80S ribosomal protein L41, V-type proton ATPase 21 kDa proteolipid subunit, and V-type proton ATPase subunit e1 or e2, were not identified in this study. To demonstrate the feasibility of this evaluation method we assessed two sets of publicly available MS raw data files (http://proteomics.cancer.gov/). One set of 94 MS raw data files (94 CRC samples) from the TCGA-CRC cancer program scored an average of 80.3; another set of 12 MS raw data files from the TCGA-Breast cancer program scored an average of 98.5.
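The completeness score can be expressed as a short Python sketch. The function name `completeness_score` and the placeholder identifiers are hypothetical; in practice the reference set would hold the 406 “housekeeping” proteins of Table 4.

```python
def completeness_score(identified, reference):
    """Percentage (0 to 100) of reference proteins found in one proteome profile."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference list is empty")
    found = reference & set(identified)
    return 100.0 * len(found) / len(reference)


# Placeholder identifiers standing in for the housekeeping reference list.
reference = {"REF_1", "REF_2", "REF_3", "REF_4"}
identified = {"REF_1", "REF_2", "REF_4", "OTHER_1"}
score = completeness_score(identified, reference)
```

Proteins identified outside the reference list do not affect the score, so profiles of different depth remain comparable on the same scale.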
We next assessed the quality of a proteome profile based on the distribution of its protein population. The distribution of identified proteins per concentration range was analyzed using the Excel histogram function. The average abundance for each identified protein was calculated as described above. The distribution of all 12380 identified proteins displayed a normal distribution with a major peak and a minor peak representing two populations. The major peak represented 62% (CRC) and 60% (AT) of identified proteins, with a relative abundance of more than 1 ppm, and the minor peak represented about 38% (CRC) and 40% (AT) of identified proteins, with an abundance of less than 1 ppm. The majority of proteins in the minor peak were randomly identified with one or a few PSMs across the 22 CRC samples or 22 AT samples. To evaluate the method, 94 sets of MS raw data files for 94 CRC samples from the TCGA-CRC cancer program and 12 sets of MS raw data files from the TCGA-Breast cancer program were analyzed. The identified proteins in the 94 TCGA-CRC data files and the 12 TCGA-Breast data files were normally distributed and showed the same distribution patterns.
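A distribution analysis of this kind can be sketched in Python. The study used the Excel histogram function; the decade-wide (log10) bins, the function names, and the example values below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def abundance_histogram(ppm_values):
    """Count proteins per decade of abundance; key k covers [10**k, 10**(k+1)) ppm."""
    bins = Counter()
    for value in ppm_values:
        if value > 0:  # proteins with zero abundance were not identified
            bins[math.floor(math.log10(value))] += 1
    return dict(sorted(bins.items()))


def fraction_above(ppm_values, threshold=1.0):
    """Fraction of identified proteins with abundance above `threshold` ppm."""
    identified = [v for v in ppm_values if v > 0]
    return sum(v > threshold for v in identified) / len(identified)


# Hypothetical average abundances in ppm, not data from this study.
values = [0.2, 0.5, 3.0, 40.0, 55.0, 700.0]
histogram = abundance_histogram(values)
major_fraction = fraction_above(values)
```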
Cell functions are executed and regulated by the entire set of proteins (the proteome). The regulation of different cellular functions has been categorized into a number of pathways such as the Wnt signaling pathway and the TGF signaling pathway. In each pathway, the components are generally named according to their function as ligands, receptors, activating regulators, inhibitory regulators, and effectors. In order to measure the activation strength of a pathway, the protein molecules that belong to the ligands, receptors, activating regulators, or inhibitory regulators were grouped and their relative abundances (ppm) were summed. Based on the summed abundance of each group of components, the activation strength or activation status of a pathway could be compared between two proteome profiles. The protein lists for all analyzed pathways and processes were obtained from the KEGG pathway database, and their functional annotations were manually confirmed using the UniProtKB protein database, the NCBI protein database, or available publications.
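The grouping and summation described above can be sketched in Python. The function name `summed_abundance_by_role`, the role labels, and the placeholder protein identifiers and abundances are hypothetical; a real analysis would take the role assignments from the curated KEGG and UniProtKB annotations.

```python
from collections import defaultdict


def summed_abundance_by_role(profile_ppm, roles):
    """Sum ppm abundances of pathway components grouped by functional role.

    `profile_ppm` maps protein -> ppm in one proteome profile;
    `roles` maps protein -> role (e.g. "activating", "inhibitory").
    """
    totals = defaultdict(float)
    for protein, role in roles.items():
        # Proteins not identified in the profile contribute 0 ppm.
        totals[role] += profile_ppm.get(protein, 0.0)
    return dict(totals)


# Placeholder pathway annotation and abundances, not data from this study.
roles = {"act_1": "activating", "act_2": "activating", "inh_1": "inhibitory"}
crc_profile = {"act_1": 30.0, "act_2": 20.0, "inh_1": 5.0}
at_profile = {"act_1": 10.0, "inh_1": 25.0}

crc_totals = summed_abundance_by_role(crc_profile, roles)
at_totals = summed_abundance_by_role(at_profile, roles)
```

Comparing the activating and inhibitory totals between the two profiles gives a simple readout of the balance that the “teeterboard” model refers to.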
Specimens were embedded in paraffin and cut into 8 μm sections. The paraffin sections were treated with xylene and rehydrated. After antigen retrieval, endogenous peroxidase activity was quenched for 30 minutes with 3% H2O2 at room temperature. Nonspecific binding sites were blocked by incubation in normal goat serum for 30 minutes at room temperature. Sections were then incubated overnight at 4° C. with primary polyclonal antibodies (1:10 to 1:1000 dilution) including anti-OLFM4, anti-Plakophilin-3, anti-CEAM6, anti-MUC13, anti-CEA, anti-EPH receptor B3, anti-Chymase, anti-CPA3, anti-LAD1, anti-SerpinB5, anti-DPEP1, and anti-OGFR. After the sections were rinsed, they were incubated with a secondary antibody detection reagent (MaxVision™2 kit, Maixin Scientific, China) at room temperature for 30 minutes. The bound antibody complexes were stained for 5 to 20 minutes with diaminobenzidine (DAB) and then counterstained with hematoxylin. Slides were photographed with an Olympus photomicroscope. The results are shown in
For Western blot analysis, equal amounts of samples from paired CRC and AT were resolved on 4-12% LDS-NuPAGE gels, transferred to nitrocellulose membranes, and analyzed by western blot (WB) with an antibody (1:100 to 1:5000 dilution) selected from the group consisting of anti-OLFM4, anti-Plakophilin-3, anti-CEAM6, anti-MUC13, anti-CEA, anti-EPH receptor B3, anti-Chymase, anti-CPA3, anti-LAD1, anti-SerpinB5, anti-DPEP1, and anti-OGFR, using enhanced chemiluminescence (ECL; Amersham, Piscataway, N.J.). The results are shown in
Biomarker concentrations in plasma/serum specimens were determined using an enzyme-linked immunosorbent assay (ELISA). The samples were analyzed in triplicate and the mean concentrations were calculated. The samples were transferred to 96-well plates coated with primary antibodies (1:100 to 1:5000 dilution) consisting of anti-OLFM4, anti-Plakophilin-3, anti-CEAM6, anti-MUC13, anti-CEA, anti-EPH receptor B3, anti-Chymase, anti-CPA3, anti-LAD1, anti-SerpinB5, anti-DPEP1, and anti-OGFR. Plates were incubated in a cold room for 3 hr, after which they were washed with PBS buffer using an automated plate washer. Luminescence in each well was measured with an Envision plate reader using a Gaussia FLEX luciferase kit (New England Biolabs). After luminescence measurement, HRP-conjugated secondary antibody in ELISA buffer (1×PBS, 2% goat serum, 5% Tween 20) was added to the wells. Plates were washed in 1×PBS/0.05% Tween 20 with a plate washer and the ELISA signal was detected with 3,3′,5,5′-tetramethylbenzidine (TMB) substrate.
Biomarker concentrations in plasma/serum specimens were determined using an immunoprecipitation-assisted MS assay. The samples were analyzed in triplicate and the mean concentrations were calculated. The samples were transferred to 1.5 mL tubes with 11 primary antibodies (1:10 to 1:1000 dilution) consisting of anti-OLFM4, anti-Plakophilin-3, anti-CEAM6, anti-MUC13, anti-CEA, anti-EPH receptor B3, anti-Chymase, anti-CPA3, anti-LAD1, anti-SerpinB5, anti-DPEP1, and anti-OGFR, which were immobilized on protein G agarose beads or magnetic beads. The reaction mixtures were incubated in a cold room for 3 hours to overnight. After incubation, the protein G agarose beads conjugated with the 11 antibodies were collected by centrifugation and washed with 1×PBS/0.05% Tween 20. All proteins bound to the protein G agarose beads were quantified by mass spectrometry.
Total protein was extracted from fresh frozen tissue specimens by the following method. Frozen tissue samples (0.05-0.1 gram) were cut into small pieces (1 mm size) using a clean sharp blade, and transferred into 1.5 ml tubes. 0.4 ml of lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1 mM Na2EDTA, 1 mM EGTA, 1% Triton X-100, Protease inhibitor cocktail pill) was added into each sample tube. The tissues were homogenized using a Dounce homogenizer. After homogenization, 50 μl of 10% SDS and 50 μl of 1M DTT were added to the mixture, followed by incubation at 95° C. for 10 min. After incubation the extract was sonicated to further break down DNA. Sonicated mixtures were centrifuged at 15,000×g for 10 minutes. Supernatants were collected and stored at −80° C. for further analysis. The protein concentration of the supernatants was determined by a BCA™ Reducing Reagent compatible assay kit (Pierce/Thermo Scientific). The expression levels of CMA1, CPA3, OLM4, LAD1, DPEP1, OGFR, EPHB3, PKP3, SERPINB5 and MUC13 proteins were determined by mass spectrometry. Alternatively, the interaction between each of the CMA1, CPA3, OLM4, LAD1, DPEP1, OGFR, EPHB3, PKP3, SERPINB5 and MUC13 proteins and its antibody was used as the standard to determine the concentrations of the biomarkers in the lysates from the normal tissues or colorectal tumors.
All of the proteins in Table 2 can be used as biomarkers for colorectal cancer. Any of the 740 proteins can be developed into a method or diagnostic kit useful for determining if a subject has an increased risk of having a colorectal disease or disorder as disclosed in the present application. Applicant will claim the patent right for any patent resulting from the instant application, any continuations, divisions, re-issues, re-examinations and extensions thereof, and corresponding patents and patent applications in other countries. Furthermore, all of the biomarkers for colorectal cancer can be used for other tumors, such as bladder cancer, breast cancer, endometrial cancer, kidney cancer, colon cancer, leukemia, lung cancer, melanoma, non-Hodgkin lymphoma, pancreatic cancer, prostate cancer and thyroid cancer.
This application claims priority to U.S. patent application Ser. No. 62/105,642 filed 20 Jan. 2015.
Number | Date | Country
---|---|---
62105642 | Jan 2015 | US