The disclosure relates to cancer biomarkers and more particularly to tissue specific serum cancer biomarkers and methods and uses thereof.
Serological biomarkers represent a non-invasive and cost-effective means to aid in clinical management of cancer patients, particularly in areas of disease detection, prognosis, monitoring and therapeutic stratification. For a serological biomarker to be useful for early detection, its presence in serum must be relatively low in healthy individuals and those with benign disease. The marker must be produced by the tumor or its microenvironment and enter circulation, giving rise to increased serum levels. Mechanisms that facilitate entry to circulation include secretion or shedding, angiogenesis, invasion, and destruction of tissue architecture [1]. The biomarker should preferably be tissue specific, such that a change in serum level can be directly attributed to disease (e.g., cancer) of that tissue [2]. The currently most widely-used serological biomarkers include carcinoembryonic antigen (CEA) and carbohydrate antigen 19.9 (CA19.9) for gastrointestinal cancer [3-5], CEA, CYFRA 21-1 (cytokeratin 19 fragment), neuron-specific enolase (NSE), tissue polypeptide antigen (TPA), progastrin-releasing peptide (pro-GRP), and SCC antigen for lung cancer [6], CA 125 for ovarian cancer [2], and prostate-specific antigen (PSA, also known as KLK3) in prostate cancer [7]. These current serological biomarkers lack the appropriate sensitivity and specificity to be suitable for early cancer detection.
For example, serum PSA is commonly used for prostate cancer screening in men over 50, but its usage remains controversial because serum levels are elevated in benign disease as well as in prostate cancer [8]. Nevertheless, PSA represents one of the most useful serological markers currently available. In healthy men, PSA is strongly expressed only in prostate tissue, and low serum levels are established by normal diffusion through various anatomical barriers. These anatomical barriers are disrupted upon development of prostate cancer, allowing increased amounts of PSA to enter circulation [1].
In an aspect, the disclosure includes a method of evaluating a probability a subject has a cancer and/or diagnosing the subject with cancer, the method comprising:
In another aspect, the disclosure includes a method of monitoring cancer progression, the method comprising:
In an embodiment, the biomarkers comprise CUZD1 and/or LAMC2.
In yet another aspect, the disclosure includes a method of monitoring pancreatic cancer progression, the method comprising:
In a further aspect, the disclosure includes a method of validating a candidate biomarker as a cancer biomarker comprising:
In an embodiment the test sample is a biological fluid.
In another embodiment the biological fluid is blood or a fraction thereof selected from serum and plasma.
In an embodiment the biomarker is selected from CEACAM7, CLCA1, GPA33, LEFTY1 and/or ZG16.
In an embodiment the biomarker is selected from IRX5, LAMP3, MFAP4, SCGB1A1, SFTPC, and/or TMEM100.
In a further embodiment the biomarker is selected from AQP8, CELA2B, CELA3B, CTRB1, CTRB2, CUZD1, GCG, IAPP, INS, KLK1, PNLIPRP1, PNLIPRP2, PPY, PRSS3, REG3G, SLC30A8, DSP, LAMC2, GP73 and/or DSG2.
In yet another embodiment the biomarker is selected from NPY, PSCA, RLN1 and SLC45A3.
In an embodiment the control is a cut-off associated with a specificity and a sensitivity, and the specificity is selected to be at least 65%, at least 70%, at least 75%, at least 80%, at least 85% or at least 90%.
In an embodiment the sensitivity is selected to be at least 65%, at least 70%, at least 75%, at least 80%, at least 85% or at least 90%.
In another embodiment the amount of CUZD1 indicative that the subject has pancreatic cancer or an increased probability of pancreatic cancer is greater than about 2 ng/ml, 2.2 ng/ml, 2.4 ng/ml, 2.6 ng/ml, 2.8 ng/ml, 3 ng/ml, 3.1 ng/ml, 3.2 ng/ml, 3.4 ng/ml, 3.6 ng/ml, 3.8 ng/ml, 4 ng/ml, 4.2 ng/ml, 4.4 ng/ml, 4.6 ng/ml, 4.8 ng/ml or 5 ng/ml.
In an embodiment the amount of LAMC2 indicative that the subject has pancreatic cancer or an increased probability of pancreatic cancer is greater than about 100 ng/ml, 120 ng/ml, 140 ng/ml, 160 ng/ml, 170 ng/ml, 180 ng/ml, 200 ng/ml, 220 ng/ml, 240 ng/ml, 260 ng/ml, 280 ng/ml, 300 ng/ml, 320 ng/ml, 340 ng/ml, 360 ng/ml, 380 ng/ml or 400 ng/ml.
In an embodiment, the method further comprises measuring the amount of an additional biomarker in the sample.
In a further embodiment the additional biomarker is selected from CA19.9, CEA, CYFRA 21-1, NSE, TPA, proGRP, SCC, CA125 and PSA.
In an embodiment the additional biomarker is CA19.9.
In an embodiment the biomarker is CUZD1, LAMC2 and/or DSG2 and the additional biomarker is CA19.9.
In another embodiment the measuring comprises an antibody-based immunoassay.
In an embodiment the immunoassay is an ELISA.
In an aspect, this disclosure includes the use of a biomarker selected from the group consisting of CUZD1 and/or LAMC2 and/or the group consisting of CEACAM7, CLCA1, GPA33, LEFTY1, ZG16, IRX5, LAMP3, MFAP4, SCGB1A1, SFTPC, TMEM100, AQP8, CELA2B, CELA3B, CTRB1, CTRB2, CUZD1, GCG, IAPP, INS, KLK1, PNLIPRP1, PNLIPRP2, PPY, PRSS3, REG3G, SLC30A8, KLK3, NPY, PSCA, RLN1, SLC45A3, DSP, LAMC2, GP73 and/or DSG2 for evaluating if a subject has cancer according to the method described herein.
In another aspect, the disclosure includes a method of validating a candidate biomarker as a soluble tissue specific cancer biomarker comprising:
In an embodiment the biological fluid is selected from ascites, seminal plasma, peritoneal fluid, pancreatic juice and/or saliva.
In another embodiment 2, 3, 4, 5, 6, 7 or more biomarkers are measured.
In a further embodiment the biomarkers comprise CUZD1, LAMC2 and CA19.9.
In an aspect, the disclosure includes a kit comprising:
In an embodiment the kit comprises two or more antibodies, optionally coupled to a solid surface.
In another embodiment the two or more antibodies comprise an antibody specific for CUZD1 and an antibody specific for CA19.9.
In an embodiment the kit is for use in the method described herein.
In an embodiment, the biomarker is CUZD1.
In an embodiment, the biomarker is LAMC2.
In an embodiment, the biomarker is selected from DSP and GP73.
Other features and advantages of the present disclosure will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples while indicating preferred embodiments of the disclosure are given by way of illustration only, since various changes and modifications within the spirit and scope of the disclosure will become apparent to those skilled in the art from this detailed description.
An embodiment of the disclosure will now be described in relation to the drawings in which:
Abbreviations used include: CEA, carcinoembryonic antigen; CA19.9, carbohydrate antigen 19.9; CYFRA 21-1, cytokeratin 19 fragment; NSE, neuron-specific enolase; TPA, tissue polypeptide antigen; pro-GRP, progastrin-releasing peptide; PSA, prostate-specific antigen; TiGER, Tissue-specific Gene Expression and Regulation; ESTs, expressed sequence tags; HPA, Human Protein Atlas; IHC, immunohistochemistry; MeSH, Medical Subject Headings; CLCA4, chloride channel accessory 4; SFTPA2, surfactant protein A2; PNLIP, pancreatic lipase; KLK3, kallikrein-related peptidase 3. The full names of the biomarkers are found in the Tables, and the associated sequences are as indicated by the provided accession numbers, incorporated herein by reference.
The term “antibody” as used herein is intended to include monoclonal antibodies, polyclonal antibodies, and chimeric antibodies. The antibody may be from recombinant sources and/or produced in transgenic animals.
The term “antibody binding fragment” as used herein is intended to include Fab, Fab′, F(ab′)2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, and multimers thereof and bispecific antibody fragments. Antibodies can be fragmented using conventional techniques. For example, F(ab′)2 fragments can be generated by treating the antibody with pepsin. The resulting F(ab′)2 fragment can be treated to reduce disulfide bridges to produce Fab′ fragments. Papain digestion can lead to the formation of Fab fragments. Fab, Fab′ and F(ab′)2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, bispecific antibody fragments and other fragments can also be synthesized by recombinant techniques.
Antibodies may be monospecific, bispecific, trispecific or of greater multispecificity. Multispecific antibodies may immunospecifically bind to different epitopes of a biomarker polypeptide and/or a solid support material. Antibodies may be from any animal origin, including birds and mammals (e.g., human, murine, donkey, sheep, rabbit, goat, guinea pig, camel, horse, or chicken).
Antibodies may be prepared using methods known to those skilled in the art. Isolated native or recombinant polypeptides may be utilized to prepare antibodies. See, for example, Kohler et al. (1975) Nature 256:495-497; Kozbor et al. (1985) J Immunol Methods 81:31-42; Cote et al. (1983) Proc Natl Acad Sci 80:2026-2030; and Cole et al. (1984) Mol Cell Biochem 62:109-120 for the preparation of monoclonal antibodies; Huse et al. (1989) Science 246:1275-1281 for the preparation of monoclonal Fab fragments; and Pound (1998) Immunochemical Protocols, Humana Press, Totowa, N.J., for the preparation of phagemid or B-lymphocyte immunoglobulin libraries to identify antibodies.
In aspects, the antibody is a purified or isolated antibody. By “purified” or “isolated” is meant that a given antibody or fragment thereof, whether one that has been removed from nature (e.g., isolated from blood serum) or synthesized (e.g., produced by recombinant means), has been increased in purity, wherein “purity” is a relative term, not “absolute purity.” In particular aspects, a purified antibody is at least 60% free, preferably at least 75% free, and more preferably at least 90% free from other components with which it is naturally associated or associated following synthesis.
The term “biomarker” or “biomarker of the disclosure” as used herein means a biomarker listed in Table 4 and/or 11 and/or the subset listed in Tables 5, 6, 7, 8 and/or 11, and fragments and naturally occurring variants thereof. The biomarker can, for example, be used to aid in evaluating the presence of a cancer of a specific tissue type. For example, Table 5 lists proteins that are specific to colon tissue and may represent colon cancer specific biomarkers; Table 6 lists proteins that are specific to lung tissue and may represent lung cancer specific biomarkers; Tables 7 and 11 list proteins that are specific to pancreas tissue and may represent pancreas cancer specific biomarkers, for example as shown for CUZD1, LAMC2 and DSG2; and Table 8 lists proteins that are specific to prostate tissue and may represent prostate cancer specific biomarkers.
The term “CUZD1” as used herein refers to “CUB and zona pellucida-like domain-containing protein 1”, which is also referred to as UO-44. The gene is located on chromosome 10q26.13 and encodes a 607 amino acid transmembrane protein. CUZD1 includes, without limitation, all known CUZD1 molecules, including human, naturally occurring variants and those deposited in GenBank, for example with accession number Q86UP6 and/or NP_071317, and Swiss-Prot ID Q86UP6, each of which is herein incorporated by reference.
The term “LAMC2” as used herein refers to laminin, gamma 2 and includes, without limitation, all known LAMC2 molecules, including human, naturally occurring variants and those deposited in publicly available databases under different accession numbers, such as HGNC: 6493, Entrez Gene: 3918, Ensembl: ENSG00000058085, OMIM: 150292 and UniProtKB: Q13753, each of which is herein incorporated by reference.
The term “additional biomarker” as used herein means a biomarker not listed in Tables 5, 6, 7, 8 or 11 and includes biomarkers used in the clinic, for example CA19.9, CEA, CYFRA 21-1, NSE, TPA, proGRP, SCC, CA125 and PSA. Other additional biomarkers include, for example, biomarkers listed in Table 4 as previously studied, for example SFTPA2, SFTPB, SFTPD, CEL, CELA2A, CPA1, CPA2, CPB1, PNLIP, PRSS1, SYCN, ACPP, FOLH1, KLK2 and/or KLK3.
The phrase “biomarker polypeptide”, “polypeptide biomarker” or “polypeptide product of a biomarker” refers to a proteinaceous biomarker gene product for example of a biomarker listed in Table 4 and/or 11.
The phrase “biomarker nucleic acid” or “nucleic acid product of a biomarker” refers to a polynucleotide biomarker gene product, for example of a biomarker listed in Tables 4 and/or 11.
The term “biomarker specific reagent” as used herein refers to a highly sensitive and specific reagent, for example one exhibiting at least 2×, at least 3×, at least 4×, at least 5× or at least 10× greater specificity for its cognate antigen compared to another antigen, for quantifying levels of a biomarker expression product, for example a polypeptide biomarker level or a nucleic acid biomarker product. Such reagents can include antibodies, which can for example be used with immunohistochemistry (IHC), ELISA and protein microarrays, or polynucleotides such as primers and probes, which can for example be used with quantitative RT-PCR techniques, to detect the expression level of a biomarker associated with a cancer.
The term “control” as used herein refers to any sample or samples, from a subject without cancer or not having the cancer being tested, of a similar type to the test sample, which can be used for measuring control biomarker expression levels, and/or to a predetermined value or reference standard which corresponds to and/or is derived from biomarker levels, expressed for example as a numerical value (e.g. cut-off) corresponding to the biomarker levels in such a control sample or samples. For example, the control can be an average, median, normalized level or cut-off value (e.g. threshold) for a biomarker, above or below which a subject can be classified as likely having or not having a cancer.
The cut-off or threshold can, for example, be the median expression level or levels in a population of subjects, such that subjects with levels below it are likely not to have cancer and subjects with levels above it are likely to have cancer. For example, following a clinical study, which can be similar to the study described in Example 2 or Example 8, a cut-off or threshold can be determined to optimize the trade-off between false negative and false positive results, for example by optimizing the area under the ROC curve. The optimized threshold will, for example, vary with the number of biomarkers being assessed (e.g. CUZD1 vs CUZD1 and CA19.9). The threshold(s) may be set at a desired sensitivity or specificity and/or to correspond to a selected level based on the study sample, for example corresponding to the lowest 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20% or 10% of a population of subjects. The expression levels compared can be normalized levels, wherein the expression level, for example in the test sample, is compared to an internal standard and used to calculate a ratio. For example, an internal standard is a non-biomarker gene (transcript or protein) that is suitable for comparison (e.g. expected to be expressed at relatively the same level in different samples) and that is used to quantify the relative amount of biomarker transcript for comparison purposes. The ratio is then compared to a similar ratio in a control sample and/or a predetermined ratio corresponding to control samples.
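The median and percentile cut-offs and the internal-standard ratio described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the Examples: the function names are hypothetical, levels are assumed to be plain numbers in consistent units, and percentiles are computed with linear interpolation using the standard-library `statistics.quantiles` (one of several possible percentile conventions).

```python
import statistics


def percentile_cutoff(reference_levels, fraction):
    """Cut-off below which about `fraction` of a reference population falls.

    `fraction` is a whole percentage, e.g. 0.9 for the lowest 90%.
    Percentiles use linear interpolation (statistics.quantiles, "inclusive").
    """
    percent = round(fraction * 100)
    if not 1 <= percent <= 99:
        raise ValueError("fraction must lie between 0.01 and 0.99")
    # n=100 returns the 99 cut points P1..P99; index 0 is P1.
    cut_points = statistics.quantiles(reference_levels, n=100, method="inclusive")
    return cut_points[percent - 1]


def normalized_ratio(biomarker_level, internal_standard_level):
    """Biomarker signal divided by a non-biomarker internal standard signal."""
    if internal_standard_level <= 0:
        raise ValueError("internal standard level must be positive")
    return biomarker_level / internal_standard_level


def above_control_ratio(test_biomarker, test_standard, control_ratio):
    """True if the normalized test ratio exceeds a predetermined control ratio."""
    return normalized_ratio(test_biomarker, test_standard) > control_ratio


# Hypothetical reference population (illustrative values, not study data).
reference = list(range(1, 101))
print(statistics.median(reference))          # median cut-off: 50.5
print(percentile_cutoff(reference, 0.9))     # lowest 90% lies below 90.1
print(above_control_ratio(30.0, 10.0, 2.0))  # ratio 3.0 > 2.0 -> True
```

In practice the reference population would be the control subjects of a study, and the chosen percentile would follow from the desired sensitivity or specificity.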
As an example, an optimized cutoff for each marker can be obtained by minimizing the total prediction error, using for example the following formula:
$$\sqrt{(1-\text{sensitivity})^2 + (1-\text{specificity})^2}$$
Cutoffs can be chosen based on the shortest distance of the ROC curve to the top-left corner. Multi-parametric models for combinations of markers can be used to obtain estimated coefficients. The estimated coefficients of the model can be used to construct a combined score for each observation, which is then used for the evaluation of the multi-parametric model. Typically, both a training and a validation set of samples are used. Analysis of the results from the training dataset can identify the optimized cut-offs, which are subsequently verified in a validation set.
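The cut-off selection just described can be illustrated with a short Python sketch. It assumes that higher marker values indicate cancer, takes every observed value as a candidate cut-off, and keeps the one minimizing sqrt((1 − sensitivity)^2 + (1 − specificity)^2), i.e. the ROC point closest to the top-left corner. `combined_score` shows how estimated coefficients from a multi-parametric model (a fitted logistic model is assumed here for illustration) could be turned into a single score that is thresholded in the same way. All function names, coefficients and values are hypothetical.

```python
import math


def sens_spec(case_values, control_values, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    true_pos = sum(v >= cutoff for v in case_values)
    true_neg = sum(v < cutoff for v in control_values)
    return true_pos / len(case_values), true_neg / len(control_values)


def optimal_cutoff(case_values, control_values):
    """Observed value whose ROC point is closest to the top-left corner.

    Minimizes sqrt((1 - sensitivity)**2 + (1 - specificity)**2);
    ties keep the lowest candidate cut-off.
    """
    best = None
    for cutoff in sorted(set(case_values) | set(control_values)):
        sens, spec = sens_spec(case_values, control_values, cutoff)
        distance = math.hypot(1 - sens, 1 - spec)
        if best is None or distance < best[0]:
            best = (distance, cutoff, sens, spec)
    return best[1], best[2], best[3]


def combined_score(levels, coefficients, intercept=0.0):
    """Linear score from estimated model coefficients (e.g. a fitted logistic model)."""
    return intercept + sum(coefficients[name] * value for name, value in levels.items())


# Toy training values (hypothetical, not from the Examples).
cases = [3.5, 4.0, 5.2, 2.9, 6.1, 4.4]
controls = [1.0, 1.8, 2.2, 3.0, 1.5]
cutoff, sens, spec = optimal_cutoff(cases, controls)
print(cutoff, round(sens, 3), spec)  # 3.5 0.833 1.0
```

Consistent with the training/validation approach, the cut-off would be chosen on the training set only and then passed unchanged to `sens_spec` on the validation set to verify its performance.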
The term “measuring an expression level” as used in reference to a biomarker means the application of a biomarker specific reagent, such as a probe, primer or antibody, and/or a method to a sample, for example a sample of the subject and/or a control sample, for ascertaining or measuring quantitatively, semi-quantitatively or qualitatively the amount of a biomarker or biomarkers, for example the amount of biomarker polypeptide or mRNA. For example, a level of a biomarker can be determined by a number of methods, including immunoassays such as immunohistochemistry, ELISA, Western blot, immunoprecipitation and the like, where a biomarker detection agent such as an antibody, for example a labeled antibody, specifically binds the biomarker and permits relative or absolute ascertaining of the amount of polypeptide biomarker; and hybridization and PCR protocols, where a probe, primer or primer set is used to ascertain the amount of nucleic acid biomarker, including probe-based and amplification-based methods such as microarray analysis, RT-PCR (e.g. quantitative RT-PCR), serial analysis of gene expression (SAGE), Northern blot, digital molecular barcoding technology (for example Nanostring nCounter™ Analysis) and TaqMan quantitative PCR assays. Other methods of mRNA detection and quantification can be applied, such as mRNA in situ hybridization in formalin-fixed, paraffin-embedded (FFPE) tissue samples or cells. This technology is currently offered as QuantiGene® ViewRNA (Affymetrix), which uses probe sets for each mRNA that bind specifically to an amplification system to amplify the hybridization signals; the amplified signals can be visualized using a standard fluorescence microscope or imaging system. Such a system can, for example, detect and measure transcript levels in heterogeneous samples, e.g. a tissue section in which normal and tumor cells are both present.
As mentioned, TaqMan probe-based gene expression analysis (PCR-based) can also be used for measuring gene expression levels in tissue samples, for example for measuring mRNA levels in FFPE samples. In brief, TaqMan assays use a probe that hybridizes specifically to the target sequence (the cDNA reverse-transcribed from the mRNA target). The probe carries a reporter dye (fluorescent molecule) at one end and a quencher dye at the other; while the probe is intact, the quencher suppresses the reporter's fluorescence. During the amplification step, the 5′ to 3′ exonuclease activity of the polymerase cleaves the hybridized probe, separating the reporter from the quencher so that fluorescence can be emitted. This fluorescence emission is recorded and measured by a detection system, and the signal intensities are used to calculate the abundance of a given transcript (gene expression) in a sample.
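Quantitative PCR instruments commonly summarize the fluorescence signal as a threshold cycle (Ct), the cycle at which fluorescence crosses a detection threshold. One widely used way of turning Ct values into relative abundance, the comparative Ct (2^−ΔΔCt) method, is not described in the text and is shown here only as an assumed example. It normalizes the target to a reference gene (e.g. a housekeeping gene such as GAPDH) and to a control sample, and it assumes roughly 100% amplification efficiency. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Relative expression of a target transcript by the comparative Ct (2^-ddCt) method.

    Assumes ~100% amplification efficiency for target and reference genes,
    so each PCR cycle corresponds to a two-fold change in starting template.
    """
    # Normalize the target to a reference (housekeeping) gene within each sample.
    delta_ct_test = ct_target_test - ct_ref_test
    delta_ct_control = ct_target_control - ct_ref_control
    # A lower Ct means more starting template, hence the negative exponent.
    return 2 ** -(delta_ct_test - delta_ct_control)


# Hypothetical Ct values: the target reaches threshold 2 cycles earlier in the test sample.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

If amplification efficiencies differ markedly from 100%, efficiency-corrected models would be more appropriate than this simple form.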
The term “difference in the level” as used herein in comparison to a control refers to a measurable difference in the level or quantity of a biomarker or biomarkers in a test sample compared to the control, of sufficient magnitude to allow assessment of predicted outcome, for example a significant difference or a statistically significant difference. The magnitude of the difference is sufficient, for example, to determine that the subject falls within a class of subjects likely to have disease and/or not have disease. For example, a difference in biomarker level is detected if the ratio of the level in a test sample to that in a control is greater than 1.5, for example greater than 1.7, 2, 3, 5, 10, 12, 15, 20 or more.
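The ratio criterion above reduces to a single comparison. The sketch below assumes that the test and control levels are in the same units and that only an increase over the control is of interest; the function name and example values are hypothetical, while the default ratio of 1.5 is the example given in the text.

```python
def level_differs(test_level, control_level, min_ratio=1.5):
    """True when the test/control ratio exceeds `min_ratio` (1.5 in the text's example)."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return test_level / control_level > min_ratio


print(level_differs(3.2, 2.0))               # ratio 1.6 -> True
print(level_differs(2.8, 2.0))               # ratio 1.4 -> False
print(level_differs(5.0, 2.0, min_ratio=3))  # ratio 2.5 -> False
```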
The term “digital molecular barcoding technology” as used herein refers to a digital technology based on direct multiplexed measurement of gene expression that utilizes color-coded molecular barcodes, and can include for example Nanostring nCounter™. For example, in such a method each color-coded barcode is attached to a target-specific probe, for example about 50 bases to about 100 bases, or any number between 50 and 100, in length, that hybridizes to a gene of interest. Two probes are used to hybridize to mRNA transcripts of interest: a reporter probe that carries the color signal and a capture probe that allows the probe-target complex to be immobilized for data collection. Once the probes are hybridized, excess probes are removed and the probe-target complexes are detected. For example, probe-target complexes can be immobilized on a substrate for data collection, for example an nCounter™ Cartridge, and analysed, for example in a Digital Analyzer, such that the color codes are counted and tabulated for each target molecule.
The term “expression level” as used herein in reference to a biomarker refers to a quantity of biomarker that is detectable or measurable in a sample and/or control. The quantity is for example a quantity of polypeptide, or a quantity of nucleic acid e.g. biomarker transcript. Accordingly, a polypeptide expression level refers to a quantity of biomarker polypeptide that is detectable or measurable in a sample and a nucleic acid expression level refers to a quantity of biomarker nucleic acid that is detectable or measurable in a sample.
The term “hybridize” or “hybridizable” refers to the sequence specific non-covalent binding interaction with a complementary nucleic acid. In a preferred embodiment, the hybridization is under high stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, hybridization in 6.0× sodium chloride/sodium citrate (SSC) at about 45° C., followed by a wash of 2.0× SSC at 50° C., may be employed.
The term “kit standard” as used herein means a suitable assay standard useful when determining an expression level of a biomarker associated with a cancer disclosed herein. For example, for kits for determining polypeptide biomarker levels, the kit standard optionally comprises a biomarker polypeptide (or peptide fragment) that can for example be used to prepare a standard curve or act as a positive antibody control. Alternatively, the kit standard is an antibody to a non-biomarker polypeptide such as actin for determining relative biomarker levels. For kits for detecting RNA levels, for example by hybridization, the kit standard can comprise an oligonucleotide control, useful for example for internal normalization, such as GAPDH for standardizing the amount of RNA in the sample and determining relative biomarker transcript levels. The kit standard can also comprise one or more known oligonucleotides that can be used to detect transcript levels of normalization genes, for example one or more housekeeping genes, i.e. genes with approximately constant expression across samples.
The term “primer” as used herein refers to a polynucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors including temperature, the sequence of the primer and the methods used. A primer typically contains 15-25 or more nucleotides, although it can contain fewer. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
The terms “polynucleotide”, “nucleic acid” and/or “oligonucleotide” as used herein refer to a sequence of nucleotide or nucleoside monomers consisting of naturally occurring bases, sugars, and intersugar (backbone) linkages, and are intended to include DNA and RNA, which can be either double stranded or single stranded and can represent the sense or antisense strand.
The term “probe” as used herein refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence. In one example, the probe hybridizes to a biomarker RNA or a nucleic acid sequence complementary to the biomarker RNA. The length of probe depends for example, on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. The probe can be for example, at least 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length.
A person skilled in the art would recognize that “all or part of” a particular probe or primer can be used as long as the portion is sufficient, in the case of a probe, to specifically hybridize to the intended target and, in the case of a primer, to prime amplification of the intended template.
The term “sample” as used herein refers to any biological fluid, or tissue or fraction thereof (e.g. tissue extract, membrane extract, cytosolic extract, plasma or serum in the case of blood), from a subject that can be assessed for biomarker expression products, polypeptide expression products or nucleic acid expression products, including for example an isolated RNA fraction, optionally mRNA, for nucleic acid biomarker determinations and a protein fraction for polypeptide biomarker determinations, and includes for example fresh tissue, frozen cells/tissue and fixed cells/tissue, including formalin fixed, paraffin embedded (FFPE) samples. The sample can for example be a test sample, which is a patient sample to be tested, or a control sample, which is a sample (or plurality of samples) with known outcome used for comparison. The biological fluid can for example be blood or a blood fraction such as serum or plasma (e.g. in the case of pancreas, colon, lung and prostate cancer). Alternatively, the biological fluid can comprise ascites (e.g. in the case of pancreas, lung and colon cancer), seminal plasma (e.g. in the case of prostate cancer), peritoneal fluid (e.g. in the case of pancreas, lung and colon cancer), pancreatic juice (e.g. in the case of pancreas cancer), and saliva (in the case of lung cancer).
The term “sequence identity” as used herein refers to the percentage of sequence identity between two or more polypeptide sequences or two or more nucleic acid sequences, for example about 70% identity, 80% identity, 90% identity, 95% identity, 98% identity, 99% identity or higher identity over a specified region. To determine the percent identity of two or more amino acid sequences or of two or more nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = number of identical overlapping positions / total number of positions × 100%). In one embodiment, the two sequences are the same length. The determination of percent identity between two sequences can also be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. U.S.A. 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403.
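The percent identity formula above can be applied directly once two sequences have been aligned. The sketch below assumes the alignment has already been produced (for example by one of the BLAST programs) and is supplied as two equal-length strings with "-" marking gaps. Tools differ in how they count gap columns; counting every alignment column in the denominator, as here, is an assumption. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    A position counts as identical when both sequences carry the same residue
    (case-insensitive, gaps excluded); every alignment column counts toward the
    total, so gapped columns lower the score.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(
        a == b and a != gap
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * identical / len(aligned_a)


print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))  # 7 of 9 columns -> 77.78
```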
BLAST nucleotide searches can be performed with the NBLAST program, with parameters set, e.g., to score=100, wordlength=12, to obtain nucleotide sequences homologous to nucleic acid molecules of the present application. BLAST protein searches can be performed with the XBLAST program, with parameters set, e.g., to score=50, wordlength=3, to obtain amino acid sequences homologous to a protein molecule of the present invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402. Alternatively, PSI-BLAST can be used to perform an iterated search which detects distant relationships between molecules (Id.). When utilizing BLAST, Gapped BLAST and PSI-BLAST programs, the default parameters of the respective programs (e.g., of XBLAST and NBLAST) can be used (see, e.g., the NCBI website). The percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, typically only exact matches are counted.
The term “specifically binds” as used herein refers to a binding reaction that is determinative of the presence of the biomarker (e.g. polypeptide or nucleic acid), often in a heterogeneous population of macromolecules. For example, when the biomarker specific reagent is an antibody, “specifically binds” refers to the specified antibody binding with greater affinity to the cognate antigenic determinant than to another antigenic determinant, for example binding with at least 2, at least 3, at least 5, or at least 10 times greater specificity; and when the reagent is a probe, “specifically binds” refers to the specified probe binding, under hybridization conditions, to a particular gene sequence at least 1.5, at least 2, at least 3, or at least 5 times above background.
The term “soluble biomarker” as used herein refers to a polypeptide biomarker gene expression product or fragment thereof that is detectable in a biological fluid such as ascites or blood or a fraction thereof, such as serum or plasma. For example, a soluble biomarker includes a polypeptide that is secreted, released, or shed from a cell and detectable in for example serum.
The term “subject” as used herein refers to any member of the animal kingdom, preferably a human being.
The phrase “therapy” or “treatment” as used herein, refers to an approach aimed at obtaining beneficial or desired results, including clinical results and includes medical procedures and applications including for example chemotherapy, pharmaceutical interventions, surgery, radiotherapy and naturopathic interventions as well as test treatments for treating cancer. Beneficial or desired clinical results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions, diminishment of extent of disease, stabilized (i.e. not worsening) state of disease, preventing spread of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, and remission (whether partial or total), whether detectable or undetectable. “Treatment” can also mean prolonging survival as compared to expected survival if not receiving treatment.
The term “tissue specific” as used herein means that it is predominantly expressed in a single tissue or related tissue, for example expressed at a level of at least 2 fold, at least 4 fold, at least 6 fold or at least 10 fold greater compared to an unrelated tissue (e.g. from a different organ, of a different origin and/or comprising different cell types, e.g. epithelial, mesenchymal etc). As demonstrated in the Examples, proteins considered tissue specific were typically expressed in less than 20% of tissues examined. For each tissue, proteins with expression profiles showing similar values of expression in, or strong expression in, more than the selected tissue were eliminated (strong expression is defined as ≧10 times the median expression value in all tissues (e.g. more than 3, more than 4 or more than 5 tissues). Moreover, for each tissue, proteins with high/strong expression in the selected tissue and medium/moderate expression (e.g. less than a 2 fold increase) in more than two other tissues were also eliminated.
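The strong-expression criterion above can be sketched for a single protein as follows. Only the rule of ≥10 times the median across all tissues is implemented, together with elimination of proteins that are also strongly expressed in other tissues (the allowed number of other strong tissues is a parameter, defaulting to none); the additional medium/moderate-expression filter is not implemented. The function names, the expression values and the handling of a zero median are assumptions for illustration.

```python
import statistics


def strongly_expressing_tissues(expression, fold=10.0):
    """Tissues whose expression is >= `fold` times the median across all tissues.

    Zero values are never treated as strong, so a protein detected in only one
    tissue (median of 0) still yields a single strong tissue.
    """
    median = statistics.median(expression.values())
    return {t for t, v in expression.items() if v > 0 and v >= fold * median}


def is_tissue_specific(expression, tissue, fold=10.0, max_other_strong=0):
    """Retain a protein for `tissue` if it is strong there and in at most
    `max_other_strong` other tissues (0 = no other strongly expressing tissue)."""
    strong = strongly_expressing_tissues(expression, fold)
    return tissue in strong and len(strong - {tissue}) <= max_other_strong


# Hypothetical expression values for one protein across five tissues.
profile = {"pancreas": 500.0, "lung": 3.0, "colon": 2.0, "prostate": 4.0, "liver": 5.0}
print(is_tissue_specific(profile, "pancreas"))  # True
print(is_tissue_specific(profile, "lung"))      # False
```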
The term “resectable cancer” as used herein refers to a subset of cancers, typically early stage cancers, that can be surgically excised. Stage can be used as a proxy for resectability; for example, in pancreatic cancer, Stages IA, IB and IIA are typically resectable, and in the Examples these stages are used as a proxy for resectable pancreatic cancer samples. The term “maybe resectable” in relation to pancreatic cancer is understood to typically include, for example, Stage IIB pancreatic cancer. The term “non-resectable” is typically associated with Stage III and IV pancreatic cancer.
The term “early stage cancer” as used herein means cancer prior to metastasis and/or organ extravasation. For example, with respect to pancreatic cancer, early stage cancer comprises Stages IA, IB and IIA.
The term “CA19-9 negative patients” as used herein refers to subjects who have a CA19-9 level of less than 37 IU/mL and/or individuals who are Lewis antigen negative (Le(a−b−)), who make up about 5-10% of the Caucasian population. In this population, CA19-9 is not appreciably expressed even in those with advanced disease.
In understanding the scope of the present disclosure, the term “comprising” and its derivatives, as used herein, are intended to be open ended terms that specify the presence of the stated features, elements, components, groups, integers, and/or steps, but do not exclude the presence of other unstated features, elements, components, groups, integers and/or steps. The foregoing also applies to words having similar meanings such as the terms, “including”, “having” and their derivatives. Finally, terms of degree such as “substantially”, “about” and “approximately” as used herein mean a reasonable amount of deviation of the modified term such that the end result is not significantly changed. These terms of degree should be construed as including a deviation of at least ±5% of the modified term if this deviation would not negate the meaning of the word it modifies.
In understanding the scope of the present disclosure, the term “consisting” and its derivatives, as used herein, are intended to be closed-ended terms that specify the presence of stated features, elements, components, groups, integers, and/or steps, and also exclude the presence of other unstated features, elements, components, groups, integers and/or steps.
The recitation of numerical ranges by endpoints herein includes all numbers and fractions subsumed within that range (e.g. 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, and 5). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term “about.” Further, it is to be understood that “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “about” means plus or minus 0.1 to 50%, 5-50%, or 10-40%, preferably 10-20%, more preferably 10% or 15%, of the number to which reference is being made.
Further, the definitions and embodiments described are intended to be applicable to other embodiments herein described for which they are suitable as would be understood by a person skilled in the art. For example, in the above passages, different aspects of the invention are defined in more detail. Each aspect so defined can be combined with any other aspect or aspects unless clearly indicated to the contrary. In particular, any feature indicated as being preferred or advantageous can be combined with any other feature or features indicated as being preferred or advantageous.
Recent advances in high-throughput technologies (e.g. high-content microarray chips, serial analysis of gene expression, expressed sequence tags) have enabled the creation of publicly available gene and protein databases that describe the expression of thousands of genes and proteins in multiple tissues. Five gene databases and one protein database were utilized herein to identify tissue specific biomarkers. The C-It [9, 10], Tissue-specific and Gene Expression and Regulation (TiGER) [11, 12], and UniGene [13, 14] databases are based on expressed sequence tags (ESTs). The BioGPS [15-17] and VeryGene [18, 19] databases are based on microarray data. The Human Protein Atlas (HPA) [20, 21] is based on immunohistochemistry (IHC) data.
Diamandis et al. have previously characterized the proteomes of conditioned media (CM) from 44 cancer cell lines and three near-normal cell lines, as well as 11 relevant biological fluids (e.g., pancreatic juice and ascites), using multi-dimensional liquid chromatography tandem mass spectrometry, identifying between 1000 and 4000 proteins per cancer site [22-33, unpublished data].
Numerous candidate biomarkers have been identified from in silico mining of gene-expression profiling [34-36] and the HPA [37-48]. Described herein is a strategy to identify tissue-specific proteins using publicly available gene and protein databases. The strategy mines databases for proteins highly specific to or strongly expressed in one tissue, selects proteins that are secreted or shed, and integrates proteomic datasets enriched for the cancer secretome to prioritize candidates for further verification and validation studies. Integrating and comparing proteins identified from databases based on different data sources (ESTs, microarray, and IHC) with the proteomes of the conditioned media of cancer cell lines and relevant biological fluids is expected to minimize the shortcomings of any one source and thereby to identify more promising candidates.
Tissue-specific proteins were identified as candidate biomarkers for colon, lung, pancreatic, and prostate cancer. The strategy described can be applied to identify tissue-specific proteins for other cancer sites. Colon, lung, pancreatic, and prostate cancer rank among the leading causes of cancer-related death, cumulatively accounting for an estimated half of all cancer-related deaths [50]. Early diagnosis is essential for improving patient outcomes, as early-stage cancers are less likely to have metastasized and are more amenable to curative treatment. The five-year survival rate drops dramatically when treatment is administered at a metastatic stage rather than while the cancer is organ-confined: from 91% to 11% in colorectal cancer, 53% to 4% in lung cancer, 22% to 2% in pancreatic cancer, and 100% to 31% in prostate cancer [50].
Forty-eight tissue-specific proteins were identified as candidate biomarkers for the selected tissue types.
Accordingly, an aspect of the disclosure includes a method of identifying a candidate cancer biomarker comprising:
Using the strategy, a number of candidate biomarkers were identified. As described, 14 of the identified candidates, which were selected according to the described parameters, were known biomarkers. Further, CUZD1 was validated and shown to discriminate pancreatic cancer samples from benign control samples, as well as to differentiate stages of pancreatic cancer.
Also described is the identification of several candidate biomarkers through differential tissue proteomic analysis of pancreatic adenocarcinoma and adjacent normal tissues. DSP, LAMC2, GP73 and DSG2 were identified as candidates, and LAMC2 and DSG2 were validated as biomarkers capable of discriminating between healthy subjects and pancreatic cancer patients. LAMC2 appeared significantly elevated in the sera of pancreatic cancer patients.
As described in the Examples, colon, lung, pancreas and prostate tissue specific candidate biomarkers were identified.
Another aspect of the disclosure includes a method of validating a candidate biomarker as a cancer biomarker comprising:
In an embodiment, the sample is a cell or tissue sample comprising cancer cells. For example, the sample can be a fresh tissue, frozen cells/tissue and/or fixed cells/tissue including formalin fixed, paraffin embedded (FFPE) samples. The sample can be a biopsy. In an embodiment, the sample comprises a biological fluid, such as blood or a fraction thereof such as serum or plasma.
The strategy disclosed can comprise a step of selecting for soluble biomarkers. Accordingly a further aspect includes a method of validating a candidate biomarker as a soluble cancer biomarker comprising:
In an embodiment, the biological fluid is selected from blood or a fraction thereof. In an embodiment, the fraction thereof is serum or plasma.
In an embodiment, the biological fluid is blood or a blood fraction such as serum or plasma (e.g. in the case of pancreas, colon, lung and prostate). Alternatively, the biological fluid can comprise ascites (e.g. in the case of pancreas, lung and colon), seminal plasma (e.g. in the case of prostate cancer), peritoneal fluid (e.g. in the case of pancreas, lung and colon), pancreatic juice (e.g. in the case of pancreas), and saliva (e.g. in the case of lung cancer).
For example, when the sample is blood or a fraction thereof such as plasma, an ACD (acid citrate dextrose) anticoagulant vacutainer tube can be used to collect the plasma samples. In an embodiment, samples that are not frozen are processed within 24 hours of blood draw. Blood samples can be centrifuged at room temperature, for example for about 10 minutes at 1000×g, to pellet the cells. Immediately after centrifugation, the plasma can be aliquoted into cryotubes and stored at −80° C. until analysis.
In an embodiment, the biomarker is selected from CEACAM7, CLCA1, GPA33, LEFTY1, ZG16, IRX5, LAMP3, MFAP4, SCGB1A1, TMEM100, AQP8, CTRB1, CTRB2, CUZD1, KLK1, PNLIPRP1, PNLIPRP2, PRSS3, REG3G, SLC30A8, NPY, PSCA, RLN1 and/or SLC45A3.
In another embodiment, the biomarker is selected from CUZD1 and/or LAMC2.
In an embodiment, a combination of candidate biomarkers is validated, the combination comprising two or more selected biomarkers. For example, two or more biomarkers may be used in combination to provide increased specificity and/or sensitivity.
In an embodiment, the two or more biomarkers are selected from CEACAM7, CLCA1, GPA33, LEFTY1 and/or ZG16 and the cancer is colon cancer.
In an embodiment, the two or more biomarkers are selected from IRX5, LAMP3, MFAP4, SCGB1A1, SFTPC and/or TMEM100 and the cancer is lung cancer.
In an embodiment, the two or more biomarkers are selected from AQP8, CELA2B, CELA3B, CTRB1, CTRB2, CUZD1, GCG, IAPP, INS, KLK1, PNLIPRP1, PNLIPRP2, PPY, PRSS3, REG3G, SLC30A8, DSP, LAMC2, GP73 and/or DSG2 and the cancer is pancreas cancer.
In an embodiment, the two or more biomarkers are selected from NPY, PSCA, RLN1 and/or SLC45A3 and the cancer is prostate cancer.
As disclosed herein, CUZD1 was validated and shown to be useful for discriminating subjects with pancreatic cancer from subjects without. LAMC2 and DSG2 were also validated. In particular, CUZD1 and LAMC2 each demonstrated strong diagnostic ability individually and retained diagnostic accuracy in CA19.9-negative PDAC cases, and multi-parametric models demonstrated complementarity of CUZD1 and/or LAMC2 with CA19.9, including for example in the detection of early stage PDAC (Stages IA, IB, IIA and IIB) from benign conditions.
Accordingly, in an embodiment, the method further comprises using a validated cancer biomarker for evaluating the probability that a subject has cancer and/or as a diagnostic to diagnose cancer.
Accordingly, a further aspect provides a method of evaluating the probability that a subject has cancer and/or diagnosing the subject with cancer, the method comprising:
Also provided, in another aspect, is the use of a biomarker selected from the group consisting of CEACAM7, CLCA1, GPA33, LEFTY1, ZG16, IRX5, LAMP3, MFAP4, SCGB1A1, SFTPC, TMEM100, AQP8, CELA2B, CELA3B, CTRB1, CTRB2, CUZD1, GCG, IAPP, INS, KLK1, PNLIPRP1, PNLIPRP2, PPY, PRSS3, REG3G, SLC30A8, KLK3, NPY, PSCA, RLN1, SLC45A3, DSP, LAMC2, GP73 and/or DSG2 for evaluating if a subject has cancer and/or diagnosing cancer, wherein the cancer is colon cancer if CEACAM7, CLCA1, GPA33, LEFTY1 and/or ZG16 is selected; the cancer is lung cancer if IRX5, LAMP3, MFAP4, SCGB1A1, SFTPC and/or TMEM100 is selected; the cancer is pancreas cancer if AQP8, CELA2B, CELA3B, CTRB1, CTRB2, CUZD1, GCG, IAPP, INS, KLK1, PNLIPRP1, PNLIPRP2, PPY, PRSS3, REG3G, SLC30A8, DSP, LAMC2, GP73 and/or DSG2 is selected; or the cancer is prostate cancer if NPY, PSCA, RLN1 and/or SLC45A3 is selected.
In an embodiment, the biomarker is or has been validated for example according to a method described herein.
In an embodiment, the evaluation is for diagnosis, prognosis and/or disease monitoring.
Several colon specific biomarkers were identified. In an embodiment, the colon cancer specific biomarker is selected from CEACAM7, CLCA1, GPA33, LEFTY1 and/or ZG16.
Several lung specific biomarkers were identified. In an embodiment, the lung cancer specific biomarker is selected from IRX5, LAMP3, MFAP4, SCGB1A1, SFTPC and/or TMEM100.
Several pancreas specific biomarkers were identified. In an embodiment, the pancreatic cancer specific biomarker is selected from AQP8, CELA3B, CTRB1, CTRB2, CUZD1, GCG, IAPP, INS, KLK1, PNLIPRP1, PNLIPRP2, PPY, PRSS3, REG3G and/or SLC30A8. In another embodiment, the biomarker is selected from CUZD1 and/or LAMC2.
Several prostate specific biomarkers were identified. In an embodiment, the prostate cancer specific biomarker is selected from KLK3, NPY, PSCA, RLN1 and/or SLC45A3. In an embodiment, the biomarker is CUZD1.
In an embodiment, the biomarker is LAMC2.
In another embodiment, the biomarker is DSG2.
As mentioned, CUZD1 and LAMC2 each demonstrated strong diagnostic ability individually and retained diagnostic accuracy in CA19.9-negative PDAC cases.
In an embodiment, the subject being evaluated and/or diagnosed for pancreatic cancer is CA19.9 negative.
CUZD1 and LAMC2 were able to distinguish benign cases from pancreatic cancer cases. Accordingly, in an embodiment, the control comprises a sample or samples from, or a cut-off value derived from, subjects with benign illnesses other than pancreatic cancer, including for example chronic pancreatitis, pancreatic cyst, pancreatic duct (PD) dilation and/or other benign conditions.
In addition, CUZD1 and LAMC2 were able to distinguish early from late stage pancreatic cancer.
Accordingly, in an embodiment, the method comprising measuring CUZD1 and/or LAMC2 is for detecting early stage pancreatic cancer. In an embodiment, the method or use is for determining pancreatic cancer stage (e.g. early stage IA, IB or IIA; late stage can be Stage III or IV) or pancreatic cancer resectability, and detecting a level of CUZD1 and/or LAMC2 below a control (e.g. a control derived from distinguishing early and late stage pancreatic cancer) is indicative of early stage and/or resectable cancer, while a level above the control is indicative of late stage or unresectable cancer. The control can also be derived, for example, from comparing benign conditions and early stage cancers. In such cases, a level above the cut-off distinguishing benign controls from early stage cancer, but below a second cut-off based on late stage pancreatic cancer, would identify early stage pancreatic cancer.
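The two-cut-off logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `classify_by_two_cutoffs` is hypothetical, a level exactly at the second cut-off is assumed to count as late stage, and the example cut-offs (2.0 and 5.0) are placeholders rather than values from the Examples. A real assay would derive both cut-offs from its own benign, early stage and late stage reference samples.

```python
def classify_by_two_cutoffs(level: float, benign_vs_early: float, early_vs_late: float) -> str:
    """Hypothetical helper: place one marker level (e.g. CUZD1) into a coarse category.

    benign_vs_early: cut-off separating benign controls from early stage cancer.
    early_vs_late:   second cut-off separating early stage from late stage cancer.
    """
    if benign_vs_early >= early_vs_late:
        raise ValueError("benign_vs_early must be lower than early_vs_late")
    if level <= benign_vs_early:
        # At or below the first cut-off: not called as cancer by this marker alone.
        return "below cancer cut-off"
    if level < early_vs_late:
        # Above the first cut-off but below the second: early stage / resectable.
        return "early stage"
    # At or above the second cut-off: late stage / unresectable.
    return "late stage"


# Placeholder cut-offs for illustration only.
print(classify_by_two_cutoffs(1.5, 2.0, 5.0))
print(classify_by_two_cutoffs(3.0, 2.0, 5.0))
print(classify_by_two_cutoffs(7.5, 2.0, 5.0))
```

In practice the same function would be applied per marker, and a combined call for CUZD1 and LAMC2 would need its own decision rule.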
Multi-parametric models demonstrated complementarity of CUZD1 and/or LAMC2 with CA19.9, including for example in the detection of early stage PDAC (stages IA, IB, IIA and IIB) from benign conditions.
In an embodiment, the cancer is early stage cancer. In an embodiment, the pancreatic cancer is early stage pancreatic cancer.
Two or more biomarkers of the disclosure can be assessed together. In an embodiment, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more biomarkers are assessed.
Multi-parametric models for combinations of markers can be used. Estimated coefficients of the model can be used to construct a combined score for each observation which is then used for the evaluation of the multi-parametric model. For example as described in Example 8, the 3 linear models evaluated for diagnostic performance in that Example are: (1) CA19.9+11.84·CUZD1, (2) CA19.9+0.202·LAMC2, (3) CA19.9+12.41·CUZD1+0.14·LAMC2.
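The combined scores of the three linear models listed above can be computed as a simple weighted sum. The sketch below copies the coefficients from the text; it assumes that marker levels are entered in the units used for the cut-offs elsewhere in this document (CA19.9 in U/mL, CUZD1 and LAMC2 in ng/mL), which the text does not state for the models themselves. The sample values and the helper name `combined_score` are hypothetical.

```python
# Coefficients of the three linear models listed above (Example 8).
# Assumption: CA19.9 in U/mL, CUZD1 and LAMC2 in ng/mL.
MODELS = {
    "CA19.9 + CUZD1": {"CA19.9": 1.0, "CUZD1": 11.84},
    "CA19.9 + LAMC2": {"CA19.9": 1.0, "LAMC2": 0.202},
    "CA19.9 + CUZD1 + LAMC2": {"CA19.9": 1.0, "CUZD1": 12.41, "LAMC2": 0.14},
}


def combined_score(levels, coefficients):
    """Weighted sum of marker levels; raises KeyError if a required marker is missing."""
    return sum(weight * levels[marker] for marker, weight in coefficients.items())


# Invented marker levels for a single hypothetical sample.
sample = {"CA19.9": 20.0, "CUZD1": 2.0, "LAMC2": 100.0}
scores = {name: combined_score(sample, coefs) for name, coefs in MODELS.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each score would then be compared with a cut-off chosen for that model, for example by ROC analysis, rather than with the single-marker cut-offs.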
In addition, a biomarker used in the clinic and/or known in the art can be combined with a biomarker of the disclosure to improve diagnostic efficacy. For example, it is demonstrated that specificity improved, up to 100%, when CUZD1 and the known biomarker CA19.9 were assessed together (see for example Examples 3 and 8). Accordingly, in an embodiment, the method further comprises measuring the amount of an additional biomarker in the sample (e.g. in addition to a biomarker of the disclosure, for example as listed in Tables 5-8 and/or 11).
In an embodiment, the additional biomarker is selected from CA19.9, CEA, CYFRA 21-1, NSE, TPA, proGRP, SCC, CA125 and PSA. In an embodiment, the additional biomarker is CA19.9.
In an embodiment, the additional biomarker is selected from SFTPA2, SFTPB, SFTPD, CEL, CELA2A, CPA1, CPA2, CPB1, PNLIP, PRSS1, SYCN, ACPP, FOLH1, KLK2 and/or KLK3.
In an embodiment, the biomarker is CUZD1 and the additional biomarker is CA19.9.
In an embodiment, the biomarker is LAMC2 and the additional biomarker is CA19.9.
In an embodiment, the biomarker is DSG2 and the additional biomarker is CA19.9.
In an embodiment, the method comprises measuring the level of CUZD1, LAMC2 and CA19.9.
As CUZD1 and LAMC2 are able to distinguish benign conditions from early stage pancreatic cancer, and early stage from late stage pancreatic cancer, the markers can be useful for monitoring cancer.
Accordingly another aspect includes a method of monitoring pancreatic cancer progression, the method comprising:
The method can be employed to monitor treatment efficacy and/or recurrence. The baseline sample can be any suitable comparator taken before the test sample, for example before surgery, before treatment, or during treatment before the subsequent sample. The baseline sample can be compared to a sample obtained during remission or stable disease to assess recurrence or disease worsening.
As further explained, for example, in Examples 2 and 8, a cut-off level can be determined and chosen to provide a desired specificity and/or sensitivity. In an embodiment, the specificity is selected to be at least 80%, at least 85% or at least 90%. In another embodiment, the sensitivity is selected to be at least 80%, at least 85% or at least 90%. In an embodiment, the specificity and/or sensitivity is between 70% and 99%, or any 0.1% increment between and including 70% and 99%.
As an example, an optimized cutoff for each marker can be obtained by minimizing the total prediction error, using, for example, the following formula: $\sqrt{(1-\mathrm{sensitivity})^2 + (1-\mathrm{specificity})^2}$.
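A minimal sketch of this distance criterion follows. It scans each observed value as a candidate cut-off and, as an assumption not stated in the text, calls a sample positive when its level is at or above the cut-off and keeps the lowest cut-off on ties. The function names and the case/control values are hypothetical.

```python
import math


def distance_to_top_left(sensitivity: float, specificity: float) -> float:
    """Distance from an ROC point to the ideal corner (sensitivity = specificity = 1)."""
    return math.sqrt((1 - sensitivity) ** 2 + (1 - specificity) ** 2)


def best_cutoff(case_levels, control_levels):
    """Return (cutoff, sensitivity, specificity, distance) with the smallest distance.

    Candidate cut-offs are the observed values; a sample is called positive when
    its level is >= cutoff. Ties keep the lowest cut-off encountered.
    """
    best = None
    for cutoff in sorted(set(case_levels) | set(control_levels)):
        sensitivity = sum(x >= cutoff for x in case_levels) / len(case_levels)
        specificity = sum(x < cutoff for x in control_levels) / len(control_levels)
        d = distance_to_top_left(sensitivity, specificity)
        if best is None or d < best[3]:
            best = (cutoff, sensitivity, specificity, d)
    return best


cases = [3.0, 4.0, 5.0, 6.0]   # invented marker levels in cancer cases
controls = [1.0, 2.0, 3.5]     # invented marker levels in benign controls
print(best_cutoff(cases, controls))
```

With these invented values the scan selects a cut-off of 4.0 (sensitivity 75%, specificity 100%), which is the ROC point closest to the top-left corner.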
Cutoffs can be chosen based on the shortest distance of the ROC curve to the top-left corner. For example, as described in Example 8, the ROC curve showed that the optimum diagnostic cutoff for CA19.9 was 20.3 U/mL (area under the curve (AUC) 0.85, 95% CI 0.80-0.91, sensitivity 77.5%, specificity 83.1%).
For example, a cut-off level of 3.1 ng/ml was selected for CUZD1 in Example 2. Other cut-off levels examined include, for example, 1.8 ng/mL (Example 8) and 2.2 ng/mL.
In an embodiment, the amount of CUZD1 indicative of cancer is a mean concentration greater than 3.1 ng/ml (in the absence of a highly optimized immunoassay, the cutoff value can range from 1.5 ng/ml up to approximately 10 ng/ml). In an embodiment, the cutoff value for CUZD1 in the diagnosis of pancreatic cancer is about 2 to about 5 ng/ml.
In an embodiment, the amount of CUZD1 indicating that the subject has pancreatic cancer or an increased probability of pancreatic cancer is greater than about 2 ng/ml, 2.2 ng/ml, 2.4 ng/ml, 2.6 ng/ml, 2.8 ng/ml, 3 ng/ml, 3.1 ng/ml, 3.2 ng/ml, 3.4 ng/ml, 3.6 ng/ml, 3.8 ng/ml, 4 ng/ml, 4.2 ng/ml, 4.4 ng/ml, 4.6 ng/ml, 4.8 ng/ml or 5 ng/ml.
A person skilled in the art would recognize that the control and/or cut-off level selected can vary, for example according to the purpose of the method (e.g. to evaluate a probability, diagnose, or monitor disease or treatment efficacy) as well as the number of biomarkers being assessed.
Cut-off levels were also determined for LAMC2. For example, a cut-off level of 150 ng/ml is used in Example 8.
In an embodiment, the amount of LAMC2 indicating that the subject has pancreatic cancer or an increased probability of pancreatic cancer is greater than about 100 ng/ml, 120 ng/ml, 140 ng/ml, 160 ng/ml, 170 ng/ml, 180 ng/ml, 200 ng/ml, 220 ng/ml, 240 ng/ml, 260 ng/ml, 280 ng/ml, 300 ng/ml, 320 ng/ml, 340 ng/ml, 360 ng/ml, 380 ng/ml or 400 ng/ml.
The cut-off can also be based on fold increase. In an embodiment, the level of biomarker in the sample is at least 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold, 6 fold, 7 fold, 8 fold, 9 fold, 10 fold, 11 fold, 12 fold or at least 15 fold increased compared to the control.
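A fold-increase cut-off reduces to a ratio test against the control level; the same comparison could be applied against a baseline sample when monitoring. The sketch below is illustrative only: the helper names are hypothetical, "at least N fold" is read as a ratio greater than or equal to N, and the example levels are invented.

```python
def fold_change(sample_level: float, control_level: float) -> float:
    """Ratio of the sample level to the control (or baseline) level."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return sample_level / control_level


def meets_fold_threshold(sample_level: float, control_level: float, threshold: float = 2.0) -> bool:
    """True when the sample level is at least `threshold`-fold the control level."""
    return fold_change(sample_level, control_level) >= threshold


# Invented levels: 9.0 versus a control of 3.0 is a 3-fold increase.
print(fold_change(9.0, 3.0))
print(meets_fold_threshold(9.0, 3.0, threshold=3.0))
print(meets_fold_threshold(4.0, 3.0, threshold=1.5))
```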
The methods can be combined with conventional methods. For example, the methods can be combined with, and/or confirmed by, conventional cancer imaging methods. Conventional imaging tools that can be used, for example, to diagnose pancreatic cancer include computerized tomography (CT) scanning, magnetic resonance imaging (MRI), endoscopic ultrasonography (EUS), and endoscopic retrograde cholangiopancreatography (ERCP). These methods can be costly and/or invasive but are powerful in tumour staging and in confirming a suspected pancreatic mass. In an embodiment, CUZD1 and/or LAMC2, optionally in combination with CA19.9, are measured and, when an increased amount compared to a control is detected, the method further comprises follow-up testing with a conventional imaging tool or other diagnostic method.
A person skilled in the art would recognize that a number of methods can be used to measure the level of a polypeptide biomarker. In an embodiment, the measuring comprises an immunoassay, for example immunohistochemistry, ELISA, Western blot, immunoprecipitation and the like, where a biomarker detection agent, such as an antibody, for example a labeled antibody, is contacted with the sample, specifically binds the biomarker and permits relative or absolute determination of the amount of the polypeptide biomarker.
In an embodiment, the method comprises incubating the sample with a first antibody specific for the biomarker which is directly or indirectly labeled with a detectable substance and a second antibody specific for the biomarker which is immobilized; separating and removing unbound first antibody from the second antibody; and determining the amount of biomarker by measuring the detectable substance.
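Determining the amount of biomarker from the measured detectable substance is usually done against a standard curve built from calibrators of known concentration, such as a kit standard as described herein. The sketch below uses simple linear interpolation between neighbouring calibrators; this is a simplification chosen for brevity (immunoassays are often fitted with a four-parameter logistic curve instead), and the function name and calibrator values are hypothetical.

```python
from bisect import bisect_left


def concentration_from_signal(signal, standards):
    """Interpolate a concentration from a standard curve.

    standards: (concentration, signal) pairs measured on calibrators; both values
    must increase strictly together. Linear interpolation between neighbouring
    calibrators is used; signals outside the calibrated range are rejected.
    """
    points = sorted(standards, key=lambda p: p[1])
    concs = [c for c, _ in points]
    signals = [s for _, s in points]
    if any(b <= a for a, b in zip(signals, signals[1:])) or any(
        b <= a for a, b in zip(concs, concs[1:])
    ):
        raise ValueError("standard curve must be strictly increasing")
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the calibrated range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return concs[i]  # signal equals a calibrator exactly
    c0, s0, c1, s1 = concs[i - 1], signals[i - 1], concs[i], signals[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Invented calibrators: (concentration in ng/ml, measured signal).
STANDARDS = [(0.0, 0.05), (1.0, 0.25), (2.0, 0.45), (5.0, 1.05), (10.0, 2.05)]
print(concentration_from_signal(0.35, STANDARDS))
```

Samples whose signal falls above the top calibrator would normally be diluted and re-measured rather than extrapolated, which is why out-of-range signals raise an error here.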
Each biomarker is detected by an antibody that binds specifically to the biomarker. In an embodiment, each antibody is independently selected from the group consisting of a monoclonal antibody, a polyclonal antibody, an immunologically active antibody fragment, a humanized antibody, an antibody heavy chain, an antibody light chain, a genetically engineered single chain Fv molecule, and a chimeric antibody.
For nucleic acid biomarker embodiments, hybridization and PCR protocols where a probe or primer or primer set are used to ascertain the amount of nucleic acid biomarker can be used, including for example probe based and amplification based methods including for example microarray analysis, RT-PCR such as quantitative RT-PCR, serial analysis of gene expression (SAGE), Northern Blot, digital molecular barcoding technology, for example Nanostring nCounter™ Analysis, and TaqMan quantitative PCR assays. Other methods of mRNA detection and quantification can be applied, such as mRNA in situ hybridization in formalin-fixed, paraffin-embedded (FFPE) tissue samples or cells.
In an embodiment, the method is for early detection of cancer.
Another aspect includes an array that comprises probes for detecting one or more biomarkers of the disclosure and optionally additional biomarkers. In an embodiment, the array comprises probes for detecting one or more or all of the biomarkers listed in Table 5, 6, 7, 8 and/or 11.
Also provided in another aspect is a kit which can be for use in a method or use described herein. In an embodiment, the kit comprises one or more of: a biomarker specific reagent for a biomarker of the disclosure and optionally an additional biomarker; a kit standard; instructions for use and a vial housing the biomarker specific reagent and/or kit standard.
In an embodiment, the kit comprises two or more antibodies. In an embodiment, the two or more antibodies comprise an antibody specific for CUZD1 and an antibody specific for CA19.9.
In another embodiment still, the kit further comprises reagents for qRT-PCR, including buffers, reverse transcription and amplification primers for the target genes and endogenous control genes, and control RNA from normal oral tissue.
In another embodiment, the kit further comprises reagents for digital molecular barcoding technology, including for example buffers, hybridization solution, and/or one or more labeled probes.
The kit can optionally comprise sample collection tubes and/or assay plates for conducting one or more assays.
In an embodiment, the kit comprises a kit standard, and at least one biomarker specific agent that can measure or be used in an assay to measure an expression level of a biomarker selected from biomarkers listed in Table 4 and/or 11, or optionally a biomarker listed in Tables 5, 6, 7, 8 and/or 11.
In an embodiment, the kit standard is a quantity of a biomarker for use as a standard.
In another embodiment, the kit standard is an RNA control such as reference RNA.
In an embodiment, the kit comprises an array comprising a plurality of biomarker detection agents for detecting one or more biomarkers listed in Table 4, or optionally Tables 5, 6, 7, 8 and/or 11.
In an embodiment, the kit comprises a sample collection vessel, for example a vacutainer tube or other sterile tube for biological fluid (e.g. blood) collection. The sample collection vessel can be uniquely numbered or comprise another identifier. The kit can include instructions, for example stipulating how to use the kit with a method disclosed herein and/or instructions for obtaining and sending the sample for assessment, as well as how to retrieve the result of the test and/or prognosis from an electronic database.
In an embodiment, the kit is a diagnostic kit.
The above disclosure generally describes the present application. A more complete understanding can be obtained by reference to the following specific examples. These examples are described solely for the purpose of illustration and are not intended to limit the scope of the application. Changes in form and substitution of equivalents are contemplated as circumstances might suggest or render expedient. Although specific terms have been employed herein, such terms are intended in a descriptive sense and not for purposes of limitation.
The following non-limiting examples are illustrative of the present application:
There is an important need for the identification of novel serological biomarkers for the early detection of cancer. Current biomarkers suffer from a lack of tissue-specificity, rendering them vulnerable to non-disease-specific increases. The present study details a strategy to rapidly identify tissue-specific proteins using bioinformatics.
Previous studies have focused on either gene or protein expression databases for the identification of candidates. A strategy was developed that mines six publicly available gene and protein databases for tissue-specific proteins, selects proteins likely to enter the circulation, and integrates proteomic datasets enriched for the cancer secretome, to prioritize candidates for further verification and validation studies.
Using colon, lung, pancreas, and prostate cancer as case examples, 48 candidate tissue-specific biomarkers were identified.
A novel strategy using bioinformatics to identify tissue-specific proteins that are potential cancer serum biomarkers is described further in Examples below.
Six gene and protein databases were mined to identify proteins highly specific to or strongly expressed in one tissue. Colon, lung, pancreatic, and prostate tissues were examined.
Each tissue was searched in the C-It database [10] for proteins enriched in the selected tissue (human data only). Since the C-It database did not have colon data available, only lung, pancreas, and prostate tissue were searched. Literature information search parameters of fewer than five publications in PubMed and fewer than three publications with the Medical Subject Headings (MeSH) term of the searched tissue were used. The option of adding z-scores of the corresponding SymAtlas microarray probe sets to the protein list was included [16]. Only proteins with a corresponding SymAtlas z-score of |z| ≧ 1.96, corresponding to a 95% confidence level of enrichment, were included in the lists. Proteins without a SymAtlas z-score were ignored. The TiGER database [12] was searched for proteins preferentially expressed in each tissue based on ESTs by searching each tissue using ‘Tissue View’. The UniGene database [14] was searched for tissue-restricted genes using the following search criteria: [tissue][restricted]+“Homo sapiens”, for the lung, pancreas, and prostate tissues. Since the UniGene database did not have data for the colon tissue, a search of: [colorectal tumor][restricted]+“Homo sapiens” was used.
The BioGPS database (v. 2.0.4.9037) [17] plugin ‘Gene expression/activity chart’ using the default human data set ‘GeneAtlas U133A, gcrma’ [16] was searched with a protein whose gene expression profile using the BioGPS plugin showed it to be specific to, and strongly expressed in, one tissue of interest. Chloride channel accessory 4 (CLCA4), surfactant protein A2 (SFTPA2), pancreatic lipase (PNLIP), and kallikrein-related peptidase 3 (KLK3) were selected for colon, lung, pancreas, and prostate tissues, respectively. For each protein searched, a correlation cutoff of 0.9 was used to generate a list of proteins with a similar expression pattern to the initial protein searched. Each tissue was searched in the VeryGene database [19] using ‘Tissue View’ for tissue-selective proteins.
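The correlation step can be illustrated with a short sketch: given the expression profile of the seed protein across tissues, keep every gene whose profile correlates with it at or above 0.9. This assumes a Pearson correlation, which is an assumption about how the cutoff is applied rather than a description of BioGPS internals; the gene names and expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def coexpressed_genes(seed_profile, profiles, cutoff=0.9):
    """Genes whose profile across the same tissues correlates with the seed at >= cutoff."""
    return sorted(gene for gene, p in profiles.items() if pearson(seed_profile, p) >= cutoff)


# Invented expression values across four tissues (same tissue order in every list).
seed = [1.0, 2.0, 10.0, 1.0]
profiles = {
    "GENE_A": [2.0, 4.0, 20.0, 2.0],  # proportional to the seed profile
    "GENE_B": [10.0, 1.0, 1.0, 2.0],  # a different tissue pattern
}
print(coexpressed_genes(seed, profiles))
```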
The HPA [21] was searched for proteins strongly expressed in each normal tissue with annotated expression. Annotated protein expression is a manually curated score based on IHC staining patterns in normal tissues from two or more paired antibodies binding to different epitopes of the same protein, which describes the distribution and strength of expression of each protein in cells [51].
An in-house developed Microsoft Excel macro was utilized to evaluate the number of times a protein was identified in each tissue and which databases had identified it. Proteins identified in only one database were eliminated. Proteins identified in multiple databases may represent more promising candidates at this stage, since databases based on different sources of data independently identified the protein as being highly specific to, or strongly expressed in, one tissue.
For each tissue type, the list of proteins identified in ≧2 databases was exported into a comma-delimited Microsoft Excel file. An in-house secretome algorithm [Karagiannis G S et al., unpublished] was applied to identify proteins that are either secreted or shed. The secretome algorithm designates a protein as secreted or shed if it is either predicted to be secreted based on the presence of a signal peptide, or through non-classical secretion pathways, or predicted to be a membranous protein based on amino-acid sequences corresponding to transmembrane helices. Proteins that were not designated as secreted or shed were eliminated.
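The two filters just described (identification in at least two databases, then designation as secreted or shed) can be expressed in a few lines. The in-house macro and secretome algorithm are not published, so this is only an illustrative Python sketch of the stated rules: the three boolean annotations are hypothetical stand-ins for the outputs of signal-peptide, non-classical-secretion and transmembrane-helix predictions, which would have to be supplied separately, and all gene and database names are invented.

```python
from collections import Counter


def multi_database_hits(hits_by_database, min_databases=2):
    """Count, for each protein, how many databases flagged it; keep those >= min_databases."""
    counts = Counter(p for proteins in hits_by_database.values() for p in set(proteins))
    return {p: n for p, n in counts.items() if n >= min_databases}


def is_secreted_or_shed(signal_peptide, nonclassical_secretion, transmembrane_helix):
    """Rule as described: any one of the three predictions is sufficient."""
    return signal_peptide or nonclassical_secretion or transmembrane_helix


def candidate_proteins(hits_by_database, annotations, min_databases=2):
    """Proteins seen in >= min_databases databases AND designated secreted or shed.

    Proteins lacking an annotation entry are treated as not secreted or shed.
    """
    kept = multi_database_hits(hits_by_database, min_databases)
    return sorted(p for p in kept if p in annotations and is_secreted_or_shed(**annotations[p]))


hits = {
    "db_1": {"GENE_A", "GENE_B", "GENE_C"},
    "db_2": {"GENE_A", "GENE_B"},
    "db_3": {"GENE_B", "GENE_D"},
}
annotations = {
    "GENE_A": {"signal_peptide": False, "nonclassical_secretion": False, "transmembrane_helix": False},
    "GENE_B": {"signal_peptide": True, "nonclassical_secretion": False, "transmembrane_helix": False},
}
print(sorted(multi_database_hits(hits).items()))
print(candidate_proteins(hits, annotations))
```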
The BioGPS and HPA databases were used to manually verify the expression profiles of the proteins identified as being secreted or shed, for strength and specificity of expression. The BioGPS database was chosen above the other gene databases as it offers a gene expression chart and the ability to batch search for a list of proteins, which allowed efficient searching and verification of protein lists. If expression profiles were not available in the BioGPS database, the protein was eliminated.
The BioGPS database plugin ‘Gene expression/activity chart’ using the default human data set ‘GeneAtlas U133A, gcrma’ was searched for each protein. For each tissue, proteins with gene expression profiles showing similar values of expression in, or strong expression in, more than the selected tissue were eliminated (strong expression is defined as ≧10 times the median expression value in all tissues). In BioGPS, the color of the bars in the ‘Gene expression/activity chart’ reflects a grouping of similar samples, based on global hierarchical clustering. If strong expression was seen in more than the selected tissue, but only in tissues with the same bar color, the protein was not eliminated.
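The strong-expression rule above can be sketched as a single check per protein. In this sketch, which is one possible reading with hypothetical names and invented values, strong expression is at least 10 times the median across all tissues, and a protein fails only if some other tissue is strongly expressing it and lies in a different bar-colour group from the selected tissue. The separate "similar values of expression" criterion is a visual judgment and is not coded here.

```python
from statistics import median


def passes_strong_expression_filter(expression, selected, cluster_of, fold=10.0):
    """Apply the 'strong expression' elimination rule to one protein.

    expression: tissue -> expression value for the protein.
    cluster_of: tissue -> bar-colour group from the hierarchical clustering.
    Returns False (eliminate) if any other tissue has expression >= fold x the
    median across all tissues and belongs to a different group from `selected`.
    """
    threshold = fold * median(expression.values())
    strong_others = [t for t, v in expression.items() if t != selected and v >= threshold]
    return all(cluster_of[t] == cluster_of[selected] for t in strong_others)


# Invented profile: median is 12, so the strong-expression threshold is 120.
expression = {"target": 500.0, "T1": 10.0, "T2": 12.0, "T3": 8.0, "T4": 300.0}
same_group = {"target": "g1", "T1": "g2", "T2": "g3", "T3": "g4", "T4": "g1"}
other_group = dict(same_group, T4="g5")
print(passes_strong_expression_filter(expression, "target", same_group))
print(passes_strong_expression_filter(expression, "target", other_group))
```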
The HPA was searched for each protein, and the ‘Normal Tissue’ expression page was evaluated, with tissues presented in order by organ. The protein's expression in normal tissue was evaluated preferentially from the level of annotated protein expression; if annotated expression was not available, the evaluation was based on the level of antibody staining. The levels of annotated protein expression are none, low, medium, and high, and the levels of antibody staining are negative, weak, moderate, and strong. For each tissue, proteins with high/strong expression in the selected tissue and medium/moderate expression in more than two other tissues were eliminated. Proteins with high/strong or medium/moderate expression in any tissue other than the selected tissue were eliminated. Proteins with low/weak or none/negative expression in the selected tissue were eliminated. However, if high/strong and/or medium/moderate expression outside the selected tissue was seen only in tissues of the same organ, with low/weak and/or none/negative expression in all other tissues, the protein was included.
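The HPA criteria can be read as a short decision procedure. In this hypothetical sketch, levels are mapped to an ordinal scale and the same-organ exception is assumed to override the elimination rules. Because the rule eliminating medium/moderate or high/strong expression outside the selected tissue already covers the "more than two other tissues" case, only the broader rule is encoded. Tissue and organ labels are illustrative.

```python
LEVEL = {"none": 0, "negative": 0, "low": 1, "weak": 1,
         "medium": 2, "moderate": 2, "high": 3, "strong": 3}


def passes_hpa_filter(levels, organ_of, selected):
    """levels: tissue -> HPA level word (annotated expression if available,
    otherwise antibody staining); organ_of: tissue -> organ."""
    score = {tissue: LEVEL[word] for tissue, word in levels.items()}
    if score[selected] < LEVEL["medium"]:
        return False  # low/weak or none/negative in the selected tissue
    elsewhere = [t for t, s in score.items()
                 if t != selected and s >= LEVEL["medium"]]
    # Medium/high expression outside the selected tissue is allowed only
    # when those tissues belong to the same organ as the selected tissue.
    return all(organ_of[t] == organ_of[selected] for t in elsewhere)


levels = {"colon": "high", "rectum": "medium", "liver": "low", "kidney": "negative"}
organ_of = {"colon": "GI tract", "rectum": "GI tract", "liver": "liver", "kidney": "kidney"}
print(passes_hpa_filter(levels, organ_of, "colon"))                            # True
print(passes_hpa_filter(levels | {"kidney": "moderate"}, organ_of, "colon"))  # False
```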
Proteins with pending HPA data were evaluated based on their gene expression profiles. Proteins whose HPA protein expression profiles fit the criteria for elimination, but whose gene expression profiles did not, were eliminated.
The PubMed database was manually searched for each of the proteins whose expression profile was verified in silico. For each tissue, proteins that had been previously studied as candidate cancer or benign disease serum biomarkers in the selected tissue were identified. Proteins with high abundance in serum (>5 μg/mL) or known physiology and expression were eliminated.
An in-house developed Microsoft Excel macro was utilized to compare the remaining protein lists against previously characterized in-house proteomes of the culture medium (CM) from 44 cancer cell lines and three near-normal cell lines, and 11 relevant biological fluids [22-33, unpublished data]. Proteomes were characterized using multi-dimensional liquid chromatography tandem mass spectrometry on a linear ion trap (LTQ) Orbitrap mass spectrometer. For details, see previous publications [22-33]. The cancer cell lines were from six cancer types (breast, colon, lung, ovarian, pancreatic, and prostate cancer). The relevant biological fluids included amniotic fluid (normal, with Down Syndrome), nipple aspirate fluid, non-malignant peritoneal fluid, ovarian ascites, pancreatic ascites, pancreatic juice, pancreas tissue (normal and malignant), and seminal plasma. A complete list of cell lines and relevant biological fluids is provided in Table 1. If a protein was identified in amniotic fluid and in the proteome of a tissue, this was noted but not considered as expression in a non-tissue proteome.
Moreover, proteome data from the CM of 23 cancer cell lines (from 11 cancer types), as recently published by Wu et al. [52], were integrated. Proteomes were characterized using one-dimensional SDS-PAGE and nano-liquid chromatography tandem mass spectrometry on an LTQ-Orbitrap mass spectrometer. The 11 cancer types include breast, bladder, cervical, colorectal, epidermoid, liver, lung, nasopharyngeal, oral, and pancreatic cancer, and T cell lymphoma [52]. If a protein was identified in a proteomic dataset, the proteome in which it was identified was noted.
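Noting which proteomes contain each protein, and whether a protein appears only in datasets from its own tissue, can be sketched as follows. The dataset names and contents are hypothetical, and excluding amniotic fluid from the exclusivity check is one reading of the rule stated for it above.

```python
def datasets_containing(protein, datasets):
    """datasets: name -> (tissue label, set of identified proteins)."""
    return [name for name, (_, proteins) in datasets.items() if protein in proteins]


def exclusive_to_tissue(protein, tissue, datasets, ignore=("amniotic fluid",)):
    """True if every non-ignored dataset containing the protein is from `tissue`."""
    hits = [name for name in datasets_containing(protein, datasets)
            if datasets[name][0] not in ignore]
    return bool(hits) and all(datasets[name][0] == tissue for name in hits)


# Hypothetical proteomes (placeholder names and contents).
datasets = {
    "colon_CM_1": ("colon", {"PROT_A"}),
    "colon_CM_2": ("colon", {"PROT_A", "PROT_B"}),
    "breast_CM_1": ("breast", {"PROT_B"}),
    "amniotic_fluid": ("amniotic fluid", {"PROT_A"}),
}
print(datasets_containing("PROT_A", datasets))
print(exclusive_to_tissue("PROT_A", "colon", datasets))  # True
print(exclusive_to_tissue("PROT_B", "colon", datasets))  # False
```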
Searching the databases yielded a total of 3615 identifications of proteins highly specific to or strongly expressed in the colon, lung, pancreas, or prostate (summed over the six databases, so a protein identified by more than one database is counted more than once). These corresponded to 976, 679, 1059, and 623 unique proteins that were highly specific to or strongly expressed in the colon, lung, pancreas, and prostate, respectively (Table 2). For the four tissue types, the C-It database identified 254 tissue-enriched proteins, the TiGER database identified 636 proteins preferentially expressed in tissue, and the UniGene database identified 84 tissue-restricted proteins. The BioGPS database identified 127 proteins with expression profiles similar to that of a protein with known tissue specificity, and the VeryGene database identified 365 tissue-selective proteins. The HPA identified 2149 proteins showing strong tissue staining with annotated expression. A complete list of proteins identified in each tissue, by each database, is summarized in Table 3.
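The per-database and per-tissue counts can be reconciled with simple arithmetic. This sketch assumes, as the text implies, that each per-database figure is summed over the four tissues. Under that reading, the 3615 total counts a protein once for every database that identified it, whereas the per-tissue figures count each protein once per tissue, so the 278-count gap reflects repeat identifications of the same protein by more than one database.

```python
per_database = {"C-It": 254, "TiGER": 636, "UniGene": 84,
                "BioGPS": 127, "VeryGene": 365, "HPA": 2149}
unique_per_tissue = {"colon": 976, "lung": 679, "pancreas": 1059, "prostate": 623}

total_hits = sum(per_database.values())         # identifications, with repeats
total_unique = sum(unique_per_tissue.values())  # unique proteins per tissue, summed
print(total_hits, total_unique, total_hits - total_unique)  # 3615 3337 278
```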
A total of 32 proteins in the colon, 36 in the lung, 81 in the pancreas, and 48 in the prostate were identified in ≧2 databases. Selecting for proteins identified in ≧2 databases eliminated 92%-97% of the proteins in each tissue type. The majority of the remaining proteins were identified in only two of the databases, and no protein was identified in all six databases. These data are summarized in Table 2.
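The 92%-97% range follows directly from the reported counts. This short check computes, for each tissue, the share of the initial list removed by the ≧2-database filter.

```python
initial = {"colon": 976, "lung": 679, "pancreas": 1059, "prostate": 623}
in_two_or_more = {"colon": 32, "lung": 36, "pancreas": 81, "prostate": 48}

# Percentage of each tissue's initial list removed by the >=2-database filter.
eliminated_pct = {t: round(100 * (1 - in_two_or_more[t] / initial[t]), 1)
                  for t in initial}
print(eliminated_pct)
# {'colon': 96.7, 'lung': 94.7, 'pancreas': 92.4, 'prostate': 92.3}
print(sum(in_two_or_more.values()))  # 197
```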
The majority of the proteins identified in ≧2 databases were identified as being secreted or shed. In total, 143 of the 197 proteins from all tissues were designated as being secreted or shed (Table 2). Specifically, 26 proteins in the colon, 25 proteins in the lung, 58 proteins in the pancreas, and 34 proteins in the prostate were designated as being secreted or shed.
Manual verification of the expression profiles of the secreted or shed proteins identified in ≧2 databases (as exemplified in the Experimental Procedures) eliminated the majority of the proteins: 21 proteins in the colon, 16 in the lung, 32 in the pancreas, and 26 in the prostate. Only five (0.5%) of the 976 proteins initially identified as highly specific to or strongly expressed in the colon met the filtering criteria, as did nine (1.3%) of 679 proteins in the lung, 26 (2.5%) of 1059 proteins in the pancreas, and eight (1.3%) of 623 proteins in the prostate. These remaining 48 proteins are tissue-specific and secreted or shed, and therefore represent candidate biomarkers (Table 4).
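The per-tissue funnel can be checked with simple arithmetic: subtracting the proteins eliminated at verification from the secreted or shed proteins gives the remaining candidates, and dividing by the initial counts gives the pass rates (26/1059 is about 2.46%, which rounds to 2.5%).

```python
initial = {"colon": 976, "lung": 679, "pancreas": 1059, "prostate": 623}
secreted = {"colon": 26, "lung": 25, "pancreas": 58, "prostate": 34}
eliminated = {"colon": 21, "lung": 16, "pancreas": 32, "prostate": 26}

remaining = {t: secreted[t] - eliminated[t] for t in secreted}
pass_rate = {t: round(100 * remaining[t] / initial[t], 1) for t in initial}
print(remaining)   # {'colon': 5, 'lung': 9, 'pancreas': 26, 'prostate': 8}
print(pass_rate)   # {'colon': 0.5, 'lung': 1.3, 'pancreas': 2.5, 'prostate': 1.3}
print(sum(remaining.values()))  # 48
```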
The performance of the databases was evaluated by determining how many of the 48 proteins that passed the filtering criteria had been initially identified by each database. The TiGER database initially identified the greatest number of these proteins (40 of 48), and the BioGPS and VeryGene databases each identified 33 of 48; each of these three databases therefore identified >68% of the 48 proteins. The UniGene database identified 35% (17 of 48) of the proteins, and the C-It database and the HPA each identified 19% (nine of 48) (Table 4).
The accuracy of each database's initial protein identifications was evaluated as the proportion of the proteins it initially identified that passed the filtering criteria. The BioGPS database showed the highest accuracy: 26% (33 of 127) of the proteins it initially identified met all the filtering criteria. The UniGene database showed 20% accuracy (17 of 84), VeryGene 9% (33 of 365), TiGER 6% (40 of 636), C-It 4% (9 of 254), and the HPA 0.4% (9 of 2149).
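The two measures used above can be computed side by side from the reported counts. The "accuracy" defined here is what is often called precision (positive predictive value): the share of a database's hits that passed all filters. "Coverage" is used here only as a convenient label for the share of the 48 final candidates that a database had found.

```python
identified = {"BioGPS": 127, "UniGene": 84, "VeryGene": 365,
              "TiGER": 636, "C-It": 254, "HPA": 2149}
passed = {"BioGPS": 33, "UniGene": 17, "VeryGene": 33,
          "TiGER": 40, "C-It": 9, "HPA": 9}
N_CANDIDATES = 48

# Share of each database's own hits that passed every filter (precision).
accuracy = {db: 100 * passed[db] / identified[db] for db in identified}
# Share of the 48 final candidates that each database had found.
coverage = {db: 100 * passed[db] / N_CANDIDATES for db in identified}

ranking = sorted(accuracy, key=accuracy.get, reverse=True)
for db in ranking:
    print(f"{db:8s} accuracy {accuracy[db]:5.1f}%  coverage {coverage[db]:5.1f}%")
```

The two measures pull in different directions: TiGER has the highest coverage but a low accuracy, while BioGPS combines the highest accuracy with high coverage.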
None of the colon-specific proteins had been previously studied as serum colon cancer biomarkers. Surfactant proteins have been extensively studied in relation to various lung diseases [53], and surfactant protein A2 (SFTPA2), surfactant protein B (SFTPB), and surfactant protein D (SFTPD) have been studied as serum lung cancer/lung disease biomarkers [54-56]. Among the pancreas-specific proteins, elastases have been studied in pancreatic function and disease [57], islet amyloid polypeptide and pancreatic polypeptide are normally secreted [58,59], and glucagon and insulin are involved in normal physiology in healthy individuals. Eight of the pancreas-specific proteins had been previously studied as serum pancreatic cancer/pancreatitis biomarkers [33,60-65]. Four of the prostate-specific proteins had been previously studied as serum prostate cancer biomarkers [66-68] (Table 4).
Protein Overlap with Proteomic Datasets
Of the tissue-specific proteins that had not been studied as serum tissue cancer biomarkers, 18 of the 26 proteins were identified in proteomic datasets (Tables 5-8). Nine proteins were exclusively identified in datasets of corresponding tissues. Of the colon-specific proteins, only glycoprotein A33 (GPA33) was identified exclusively in colon datasets. GPA33 was identified in the CM of three colon cancer cell lines (LS174T, LS180, and Colo205) [Karagiannis et al., unpublished, 52] (Table 5). None of the lung-specific proteins were identified in lung datasets (Table 6). Seven pancreas-specific proteins were exclusively identified in pancreas datasets: in the pancreatic cancer ascites [32], pancreatic juice [33], and/or normal and/or cancerous pancreatic tissue [Kosanam et al., unpublished] (Table 7). None were identified in the CM of pancreatic cancer cell lines. Neuropeptide Y (NPY) was the only prostate-specific protein identified exclusively in prostate datasets. NPY was identified in the CM of the prostate cancer cell line VCaP [Saraon et al., unpublished] and the seminal plasma proteome [25] (Table 8).
A strategy to identify tissue-specific biomarkers using publicly available gene and protein databases is described. Since serological biomarkers are protein-based, using only protein expression databases for the initial identification of candidate biomarkers might seem more relevant. However, while the HPA has characterized more than 50% of human protein-encoding genes (11200 unique proteins to date), it has not completely characterized the proteome [51]. Therefore, proteins that have not been characterized by the HPA but fulfill the desired criteria would be missed by searching only the HPA. There are also important limitations in using gene expression databases, since there is considerable variation between mRNA and protein expression [69,70] and gene expression does not account for post-translational modification events [71]. Mining both gene and protein expression databases therefore minimizes the limitations of each platform. To the best of our knowledge, no previous studies have used both gene and protein databases for the initial identification of candidate cancer biomarkers.
Initially, the databases were searched for proteins that were highly specific to or strongly expressed in one tissue. The search criteria were tailored to the design of the databases, which did not allow simultaneous searching with both criteria; identifying proteins that were both highly specific to and strongly expressed in one tissue was therefore deferred to a later step. In the verification of the expression profiles (see Experimental Procedures), only 34% (48 of 143) of the proteins were found to meet both criteria. The number of databases mined in the initial identification can be varied at the discretion of the investigator. Adding databases can only keep constant or increase the number of proteins identified in ≧2 databases.
In the gene expression databases, the criteria were set for maximum stringency of protein identification, to obtain a manageable number of candidates. A more exhaustive search can be conducted using lower-stringency criteria. The stringency could be varied in the correlation analysis using the BioGPS database plugin and in the C-It database. The correlation cutoff of 0.9 used to identify similarly expressed genes in the BioGPS database plugin could be reduced to as low as 0.75. The SymAtlas z-score cutoff of ≧|1.96| could be reduced to ≧|1.15|, corresponding to a 75% confidence level of enrichment. The literature information parameters used in the C-It database (fewer than five publications in PubMed and fewer than three publications with the MeSH term of the selected tissue) could be relaxed to allow identification of well-studied proteins. Because C-It does not consider the content of publications in PubMed, it filters out proteins that have been studied even if they have not been studied in relation to cancer.
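The confidence levels quoted for the z-score cutoffs correspond to two-sided standard-normal probabilities. This sketch, using only Python's `math.erf`, gives roughly 95% for 1.96 and 75% for 1.15; treating the cutoffs as two-sided is an assumption that is consistent with those figures.

```python
import math


def two_sided_confidence(z):
    """Probability that a standard normal variable lies within +/-|z|."""
    # P(|Z| <= z) = 2 * Phi(z) - 1 = erf(z / sqrt(2))
    return math.erf(abs(z) / math.sqrt(2))


for z in (1.96, 1.15):
    print(f"|z| >= {z}: {two_sided_confidence(z):.1%} confidence")
```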
Although proteins that have been well studied, but not as cancer biomarkers, represent potential candidates, the emphasis in this study was on identifying novel candidates that have been minimally studied overall. The mRNA level and the protein expression of a gene can differ substantially. Therefore, if lower-stringency criteria had been used when identifying proteins from gene expression databases, a greater number of proteins would have been identified in at least two of the databases, potentially leading to a greater number of candidate protein biomarkers after application of the remaining filtering criteria.
The HPA was searched for proteins strongly expressed in one normal tissue with annotated IHC expression. Annotated IHC expression was selected because it uses paired antibodies to validate the staining pattern, providing the most reliable estimation of protein expression. Approximately 2020 of the 10100 proteins in version 7.0 of the HPA have annotated protein expression [51]. Makawita et al. [33] included the criterion of annotated protein expression when searching for proteins with ‘strong’ pancreatic exocrine cell staining for prioritization of pancreatic cancer biomarkers. A more exhaustive search could be conducted by searching the HPA without requiring annotated IHC expression.
Secreted or shed proteins have the highest chance of entering circulation and being detected in serum. Many groups, including the Diamandis group [23-25, 27-33], use Gene Ontology (GO) [72] cellular localization annotations of ‘extracellular space’ and ‘plasma membrane’ to identify a protein as secreted or shed. However, GO cellular annotations do not completely describe all proteins and do not always reflect whether a protein is secreted or shed. An in-house secretome algorithm [Karagiannis et al., unpublished data] designates a protein as secreted or shed if it is predicted to be secreted based on the presence of a signal peptide, predicted to undergo non-classical secretion, or predicted to be a membrane protein based on amino-acid sequences corresponding to transmembrane helices. Because it defines secreted or shed proteins more robustly, it was used in this study.
An evaluation of which databases had initially identified the 48 tissue-specific proteins that passed the filtering criteria showed that the gene expression databases had identified more of these proteins than the protein expression database. The HPA had initially identified only nine of the 48 tissue-specific proteins. This low initial identification was due to the stringent search criterion requiring annotated IHC expression. Of the 48 tissue-specific proteins, 20 had protein expression data available in the HPA, and the 11 of these that the HPA had not initially identified lacked annotated IHC expression. The expression profiles of those 11 proteins would have passed the ‘Verification of In Silico Expression Profiles’ filtering criteria; searching without the annotation requirement would therefore have increased the number of tissue-specific proteins initially identified by the HPA.
The HPA has characterized 11200 unique proteins, which is more than 50% of human protein-encoding genes [51]. Of the 48 tissue-specific proteins that met the selection criteria, only nine were initially identified by mining the HPA, and only 20 have been characterized by the HPA at all. This demonstrates the importance of combining gene and protein databases to identify candidate cancer serum biomarkers. If only the HPA were searched for tissue-specific proteins, even with lowered stringency, the remaining 28 proteins that met the filtering criteria and represent candidate biomarkers would not have been identified.
The TiGER, UniGene, and C-It databases are based on ESTs and collectively identified 46 of the 48 proteins. Of those, only 41% (19 of the 46) were identified in ≧2 of those databases. The BioGPS and VeryGene databases are based on microarray data and also collectively identified 46 of the 48 proteins. Of those, 56% (26 of the 46) were identified by only one of BioGPS and VeryGene. Clearly, even databases based on similar sources of data still identified unique proteins. This demonstrates the validity of the initial approach of using databases that mine the same data source differently. The TiGER, BioGPS, and VeryGene databases collectively identified all 48 of the tissue-specific proteins. Of these, 88% (42 of the 48) were identified in ≧2 of those three databases, demonstrating the validity of selecting proteins identified in more than one database.
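The split reported for BioGPS and VeryGene can be recovered from the per-database totals by inclusion-exclusion, assuming "identified uniquely" means identified by only one of the two. Each database identified 33 of the 48 proteins and together they identified 46, which gives 20 in both and 26 in only one (26/46 is about 56.5%). The helper name is hypothetical, and the approach does not extend to the three EST databases, whose totals alone do not determine the split.

```python
def pairwise_overlap(n_a, n_b, n_union):
    """Split the union of two databases' hits by inclusion-exclusion."""
    in_both = n_a + n_b - n_union  # |A & B| = |A| + |B| - |A | B|
    in_only_one = n_union - in_both
    return in_both, in_only_one


in_both, in_only_one = pairwise_overlap(33, 33, 46)  # BioGPS, VeryGene
print(in_both, in_only_one)  # 20 26
```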
The accuracy of each database's initial protein identification reflects how explicitly the database could be searched for proteins both highly specific to and strongly expressed in one tissue. The BioGPS database had the highest accuracy (26%), as it was searched for proteins with expression similar to that of a protein of known tissue specificity and strong expression. The UniGene database (20% accuracy) could be searched only for proteins with tissue-restricted expression, without the ability to also require strong expression in the tissue. The VeryGene database (9%) was searched for tissue-selective proteins and the TiGER database (6%) for proteins preferentially expressed in a tissue; their lower accuracies reflect that they could not be searched explicitly for proteins highly specific to only one tissue. The C-It database (4%) was searched for tissue-enriched proteins and the HPA (0.4%) for proteins with strong tissue staining. These very low accuracies reflect that these searches selected for strong expression in a tissue but could not select for proteins highly specific to only one tissue.
The low identification of tissue-specific proteins by the C-It database is not unexpected. Because the literature search parameters initially used filtered out any protein with ≧5 publications in PubMed, regardless of whether those publications were related to cancer, C-It identified only proteins enriched in a selected tissue that have been minimally studied, if at all. Of the nine tissue-specific proteins initially identified by C-It, eight had not been previously studied as candidate serum cancer biomarkers; the ninth, syncollin (SYCN), has only very recently been shown to be elevated in the serum of pancreatic cancer patients [33]. The eight remaining proteins represent especially interesting candidate biomarkers because they fulfill the filtering criteria but have not been well studied.
A PubMed search revealed that 14 of the 48 tissue-specific proteins had been previously studied or suggested as serum markers of cancer or benign disease, lending credence to the approach. The most widely used biomarkers currently lack sensitivity and specificity because they are not tissue-specific. CEA is a widely used colon and lung cancer biomarker. It was identified by the BioGPS and TiGER databases and the HPA as highly specific to or strongly expressed in the colon, but by none of the databases for the lung. CEA was eliminated upon in silico evaluation of its protein expression profile, since it is not tissue-specific: high levels of CEA protein expression were seen in normal tissues of the digestive tract, such as the esophagus, small intestine, appendix, colon, and rectum, as well as in bone marrow, and medium levels were seen in the tonsil, nasopharynx, lung, and vagina. PSA is an established, clinically relevant biomarker for prostate cancer with demonstrated tissue specificity, and the strategy identified it as a prostate-specific protein that passed all the filtering criteria. The re-identification of this known tissue-specific clinical biomarker, together with the elimination of CEA for lack of tissue specificity, lends credence to the approach.
From the list of candidate proteins that have not been studied as serum cancer or benign disease biomarkers, 18 of the 26 proteins were identified in proteomic datasets. The proteomic datasets primarily contain the CM proteomes of various cancer cell lines, as well as other relevant fluids, enriched for the secretome. For proteins that have not been characterized by the HPA, it is possible that the transcripts are not translated, in which case they would be unviable candidates. If the transcripts are translated and the protein enters circulation, it must do so at a level detectable by current proteomic techniques. Proteins that have been characterized by the HPA do not necessarily enter circulation. Identification in the proteomic datasets verifies the presence of a protein in the cancer secretome at a detectable level, so such proteins represent viable candidates. Since cancer is a highly heterogeneous disease, integrating multiple cancer cell lines and relevant biological fluids likely provides a more complete, though not necessarily comprehensive, picture of the cancer proteome.
Relaxin 1 (RLN1) is a candidate protein which was not identified in any of the proteomes but its expression was confirmed by semi-quantitative RT-PCR in prostate carcinomas [73]. Therefore, if a protein was not identified in any of the proteomic datasets it does not necessarily imply that the protein is not expressed in cancer.
The proposed strategy seeks to identify candidate tissue-specific biomarkers for further experimental studies. Using colon, lung, pancreas, and prostate cancer as case examples, a total of 26 tissue-specific candidate biomarkers were identified. Using this strategy, investigators can rapidly screen for candidate tissue-specific serum biomarkers and prioritize candidates for further study based on overlap with proteomic datasets. This strategy can be used to identify candidate biomarkers for any tissue, contingent on the data availability in the mined databases, and incorporate various proteomic datasets, at the discretion of the investigator.
Pancreatic cancer is the fourth leading cause of cancer-related deaths and one of the most highly aggressive and lethal of all solid malignancies [50]. Because of the asymptomatic nature of its early stages, coupled with inadequate methods for early detection, the majority of patients (>75%) present with locally advanced and inoperable disease at the time of diagnosis [50]. At these advanced stages, the benefits of chemotherapy, radiation, and combinatorial therapies are largely anecdotal, and less than 5% of patients survive five years post-diagnosis [50, 75].
One way to aid in the clinical management of cancer patients is through the use of serum biomarkers. Currently, the most widely used biomarker for pancreatic cancer is carbohydrate antigen 19.9 (CA19.9), a sialylated Lewis A antigen found on the surface of proteins [5, 76]. Although CA19.9 is elevated mainly in late-stage pancreatic cancer, it is also elevated in benign diseases of the pancreas and in other malignancies of the gastrointestinal tract [77]. Other tumor markers, such as members of the carcinoembryonic antigen (CEA) [78, 79] and mucin (MUC) [80-82] families, have also been associated with pancreatic cancer. When used in combination, with or without CA19.9, some of these markers have shown enhanced sensitivity and specificity; however, none has become established in routine clinical use. The lack of a single highly specific and sensitive marker has led to a growing consensus in the field toward the development of multiparametric panels of biomarkers, whereby the combinatorial assessment of multiple molecules can likely achieve increased sensitivity and specificity for disease detection and management [83-85].
CUZD1 [Swiss-Prot: Q86UP6] is a protein of unknown function with homologs in chimpanzee, dog, mouse, rat, and chicken. CUZD1 has previously been identified by immunohistochemistry in normal ovarian and ovarian tumor cells [86]. These findings suggest that CUZD1 has a role in cell motility, cell-cell interactions, and/or interactions with the extracellular matrix [86].
Five gene databases and one protein database were mined to identify proteins highly specific to or strongly expressed in the pancreas. The C-It [9, 10], Tissue-specific Gene Expression and Regulation (TiGER) [11, 12], and UniGene [13, 14] databases are based on expressed sequence tags (ESTs). The BioGPS [15-17] and VeryGene [18, 19] databases are based on microarray data. The Human Protein Atlas (HPA) [20, 21] is based on immunohistochemistry (IHC) data.
The C-It database [10] was searched for proteins enriched in the pancreas. Literature information search parameters of fewer than five publications in PubMed and fewer than three publications with the Medical Subject Headings (MeSH) term of the pancreas were used. The option of adding z-scores of the corresponding SymAtlas microarray probe sets to the protein list was included [16]. Only proteins with a corresponding SymAtlas z-score of ≧|1.96|, corresponding to a 95% confidence level of enrichment, were included in our lists. Proteins without a SymAtlas z-score were ignored. The TiGER database [12] was searched for proteins preferentially expressed in the pancreas based on ESTs, using ‘Tissue View’. The UniGene database [14] was searched for pancreas-restricted genes using the following search criteria: [pancreas][restricted]+“Homo sapiens”. The BioGPS database (v. 2.0.4.9037) [17] plugin ‘Gene expression/activity chart’ using the default human data set ‘GeneAtlas U133A, gcrma’ [16] was searched with a protein whose gene expression profile, viewed with the BioGPS plugin, showed it to be specific to, and strongly expressed in, the pancreas. Pancreatic lipase (PNLIP) was selected. A correlation cutoff of 0.9 was used to generate a list of proteins with an expression pattern similar to that of the initial protein. The VeryGene database [19] was searched for pancreas-selective proteins using ‘Tissue View’. The HPA [21] was searched for proteins strongly expressed in the normal pancreas with annotated expression. Annotated protein expression is a manually curated score based on IHC staining patterns in normal tissues from two or more paired antibodies binding to different epitopes of the same protein, which describes the distribution and strength of expression of each protein in cells [51].
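The correlation-based search and the z-score filter can be illustrated with a small sketch. BioGPS computes its correlations internally; here a Pearson correlation is assumed, the expression profiles and protein names are toy placeholders, and the z-score test uses the absolute value, following the ≧|1.96| notation above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # undefined for flat profiles; treated as uncorrelated
    return cov / (sd_x * sd_y)


def similar_to_reference(reference, profiles, cutoff=0.9):
    """Names of profiles whose correlation with `reference` meets the cutoff."""
    return [name for name, prof in profiles.items()
            if pearson(reference, prof) >= cutoff]


def passes_zscore(z, cutoff=1.96):
    """Drop proteins without a z-score; keep those with |z| >= cutoff."""
    return z is not None and abs(z) >= cutoff


# Hypothetical profiles over five tissues; index 3 plays the role of pancreas.
reference = [1, 1, 1, 50, 1]
profiles = {"PROT_A": [2, 1, 2, 40, 1], "PROT_B": [30, 1, 1, 2, 1]}
print(similar_to_reference(reference, profiles))     # ['PROT_A']
print([passes_zscore(z) for z in (2.5, 1.2, None)])  # [True, False, False]
```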
An in-house developed Microsoft Excel macro was utilized to evaluate the number of times a protein was identified in the pancreas and which database had identified it. Proteins identified in only one database were eliminated. Proteins identified in ≧2 databases could represent more promising candidates at this stage, since databases based on varying sources of data identified the protein as being highly specific to or strongly expressed in one tissue.
The list of proteins identified in ≧2 databases was exported into a comma-delimited Microsoft Excel file. An in-house secretome algorithm [Karagiannis G S et al., unpublished] was applied to identify proteins that are either secreted or shed. The secretome algorithm designates a protein as secreted or shed if it is predicted to be secreted, either through the classical pathway (based on the presence of a signal peptide) or through non-classical secretion pathways, or if it is predicted to be a membrane protein based on amino-acid sequences corresponding to transmembrane helices. Proteins that were not designated as secreted or shed were eliminated.
The BioGPS and HPA databases were used to manually verify the expression profiles of the proteins identified as being secreted or shed, for strength and specificity of expression. The BioGPS database was chosen above the other gene databases as it offers a gene expression chart and the ability to batch search for a list of proteins, which allowed efficient searching and verification of protein lists. If expression profiles were not available in the BioGPS database, the protein was eliminated.
The BioGPS database plugin ‘Gene expression/activity chart’ using the default human data set ‘GeneAtlas U133A, gcrma’ was searched for each protein. Proteins whose gene expression profiles showed similar expression values, or strong expression, in tissues other than the pancreas were eliminated (strong expression is defined as ≧10 times the median expression value across all tissues). In BioGPS, the color of the bars in the ‘Gene expression/activity chart’ reflects a grouping of similar samples based on global hierarchical clustering. If strong expression was also seen in other tissues, but only in tissues with the same bar color as the pancreas, the protein was not eliminated.
The HPA was searched for each protein, and the ‘Normal Tissue’ expression page was evaluated, with tissues presented in order by organ. The protein's expression in normal tissue was evaluated preferentially from the level of annotated protein expression; if annotated expression was not available, the evaluation was based on the level of antibody staining. The levels of annotated protein expression are none, low, medium, and high, and the levels of antibody staining are negative, weak, moderate, and strong. Proteins with high/strong expression in the pancreas and medium/moderate expression in more than two other tissues were eliminated. Proteins with high/strong or medium/moderate expression in any tissue other than the pancreas were eliminated. Proteins with low/weak or none/negative expression in the pancreas were eliminated. However, if high/strong and/or medium/moderate expression outside the pancreas was seen only in tissues of the same organ, with low/weak and/or none/negative expression in all other tissues, the protein was included.
Proteins with pending HPA data were evaluated based on their gene expression profiles. Proteins whose HPA protein expression profiles fit the criteria for elimination, but whose gene expression profiles did not, were eliminated.
The PubMed database was manually searched for each of the proteins whose expression profile was verified in silico. Proteins that had been previously studied as candidate pancreatic cancer or benign disease serum biomarkers were identified and excluded. Proteins with high abundance in serum (>5 μg/mL) or known physiology and expression were also eliminated. The remaining subset is presented in Tables 5-8.
An in-house developed Microsoft Excel macro was utilized to compare the remaining protein lists against previously characterized in-house proteomes of the culture medium (CM) from 44 cancer cell lines and three near-normal cell lines, and 11 relevant biological fluids [22-33, our unpublished data]. Proteomes were characterized using multi-dimensional liquid chromatography tandem mass spectrometry on a linear ion trap (LTQ) Orbitrap mass spectrometer. For details, see our previous publications [22-33]. The cancer cell lines were from six cancer types (breast, colon, lung, ovarian, pancreatic, and prostate cancer). The relevant biological fluids included amniotic fluid (normal, with Down Syndrome), nipple aspirate fluid, non-malignant peritoneal fluid, ovarian ascites, pancreatic ascites, pancreatic juice, pancreas tissue (normal and malignant), and seminal plasma.
Proteome data from the CM of 23 cancer cell lines (from 11 cancer types), as recently published by Wu et al. [52], were also integrated. Proteomes were characterized using one-dimensional SDS-PAGE and nano-liquid chromatography tandem mass spectrometry on an LTQ-Orbitrap mass spectrometer. The 11 cancer types include breast, bladder, cervical, colorectal, epidermoid, liver, lung, nasopharyngeal, oral, and pancreatic cancer, and T cell lymphoma [52]. If a protein was identified in a proteomic dataset, the proteome in which it was identified was noted.
Both CA19.9 and CUZD1 were quantified in serum with commercially available ELISA kits (Roche and USCN, respectively) according to the manufacturers' recommendations.
Validation of CUZD1, with CA19.9 as a comparator, was performed using serum samples from 20 patients with benign pancreatic cysts and 20 patients with pancreatic cancer of mixed stages. At a cutoff of 37 IU/mL, CA19.9 showed 70% specificity and 80% sensitivity (six false positives, shown as squares in Table 9, and four false negatives, shown as circles). At a cutoff of 3.1 ng/mL, CUZD1 showed 85% specificity and 85% sensitivity (three false positives, shown as squares in Table 9, and three false negatives, shown as circles). CUZD1 had a similar area under the curve (AUC) value to CA19.9 (Table 10,
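The reported figures follow directly from the misclassification counts: with 20 cancer and 20 benign samples, sensitivity is the fraction of cancers called positive at the cutoff and specificity is the fraction of benign samples called negative. The short Python check below reproduces that arithmetic; the helper names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of cancer cases correctly called positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of benign cases correctly called negative."""
    return true_neg / (true_neg + false_pos)


# 20 cancer and 20 benign samples; misclassification counts from the text.
n_cancer, n_benign = 20, 20
for marker, fp, fn in [("CA19.9", 6, 4), ("CUZD1", 3, 3)]:
    sens = sensitivity(n_cancer - fn, fn)
    spec = specificity(n_benign - fp, fp)
    print(f"{marker}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```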
In the previous dataset (Example 3), CA19-9 and CUZD1 performed very similarly (slightly better for CA19-9) in discriminating between benign and cancer patients. The performance of these two markers was next tested in a larger dataset consisting of 50 normal, 50 benign (chronic pancreatitis, pancreatic cyst, pancreatic duct (PD) dilation) and 50 cancer (of unknown stage) serum samples.
The scatter plot analysis for CUZD1 and CA19-9 (
Differential label-free semi-quantitative proteome mining of pancreatic ductal adenocarcinoma (PDAC) tissues and their adjacent benign tissues is a convenient approach for biomarker discovery. Here, offline multi-dimensional chromatography/Orbitrap® mass spectrometry proteomic analysis was performed on four PDAC tissues and their closest benign tissues, identifying 2190 non-redundant proteins. Using a systematic scoring algorithm based on pancreatic cancer-specific mRNA overexpression, identification in malignant ascitic fluid, PDAC label-free quantitative value and cellular localization, 16 potential candidates were selected.
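The scoring algorithm is specified here only by its four inputs. Purely as an illustration, the Python sketch below assumes a simple additive score in which each criterion met contributes one point and candidates are ranked by total; the equal weighting, the 2.0 label-free ratio threshold, the field names and the candidate names are all hypothetical and are not the algorithm used in this work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical per-protein evidence for the four stated criteria."""
    name: str
    mrna_overexpressed: bool    # pancreatic cancer-specific mRNA overexpression
    in_malignant_ascites: bool  # identified in malignant ascitic fluid
    label_free_ratio: float     # PDAC / benign label-free quantitative ratio
    secreted_or_membrane: bool  # cellular localization compatible with release


def score(c: Candidate, min_ratio: float = 2.0) -> int:
    """Assumed equal-weight score: one point for each criterion met."""
    return (
        int(c.mrna_overexpressed)
        + int(c.in_malignant_ascites)
        + int(c.label_free_ratio >= min_ratio)
        + int(c.secreted_or_membrane)
    )


def rank(candidates: List[Candidate]) -> List[str]:
    """Candidate names by descending score; ties keep their input order."""
    return [c.name for c in sorted(candidates, key=score, reverse=True)]


# Placeholder candidates, not results from this study.
cands = [
    Candidate("CAND_1", True, False, 1.5, True),
    Candidate("CAND_2", True, True, 3.0, True),
    Candidate("CAND_3", False, False, 0.8, False),
]
print(rank(cands))
```

A weighted variant would simply multiply each indicator by its own weight; the additive form is shown only because it is the simplest way to combine heterogeneous yes/no evidence.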
Preliminary serological verification of the top four candidates (DSP, LAMC2, GP73 and DSG2) in 20 patients diagnosed with pancreatic cancer and 20 with benign pancreatic cysts showed a significant (p<0.05) elevation of LAMC2 and DSG2 in pancreatic cancer serum. To validate these initial findings, the two markers were tested in a larger dataset consisting of 50 normal, 50 benign (chronic pancreatitis, pancreatic cyst, pancreatic duct (PD) dilation) and 50 cancer (of unknown stage) serum samples. Based on these results, DSG2 was not analyzed further.
The scatter plot analyses for LAMC2, DSG2 and CA19-9 (
Given the results from the dataset of Example 4, consisting of 50 benign and 50 cancer (mixed stage) serum samples, it was investigated whether CUZD1 is also elevated in earlier stages of pancreatic cancer (stages I and II). To assess the performance of CUZD1 in early stages, serum levels of CUZD1 were measured in a second dataset consisting of 50 normal, 50 benign, 50 stage II cancer and 50 stage IV cancer samples. CUZD1 was significantly elevated in the serum of pancreatic cancer patients even at stage II. As before, CUZD1 and CA19-9 showed significant complementarity when used together.
In this sample set, levels of CUZD1 were significantly elevated in patients with stage II and stage IV PDAC compared with patients with benign disease (stage II PDAC: median 2.83 ng/mL, IQR 1.43-7.42, P<0.0001; stage IV PDAC: median 3.46 ng/mL, IQR 1.40-11.48, P<0.0001), as were levels of CA19-9. ROC curve analysis (
In the blinded sample set from Pittsburgh, Pa., USA, serum levels of CUZD1 were similar in patients with benign disease and healthy controls (P=0.2961). Levels of CUZD1 were significantly elevated in patients with stage IIB PDAC compared with patients with benign disease (stage IIB PDAC median 5.93 ng/mL, IQR 2.85-14.47; P=0.0321). Levels of CUZD1 were also significantly elevated in patients with stage IV PDAC compared with those with stage IIB PDAC (stage IV PDAC median 54.40 ng/mL, IQR 20.33-79.02; P=0.0002),
Pancreatic cancer (pancreatic ductal adenocarcinoma, PDAC) is the tenth most commonly diagnosed cancer but ranks fourth in cancer-related deaths in North America [101, 102]. In contrast to other major human malignancies (lung, breast, colon and prostate), which have shown notable reductions in mortality rate attributed to earlier diagnosis and advances in management and treatment, pancreatic cancer has shown minimal improvement in patient survival over the past 30 years [101].
At the time of diagnosis, approximately 80% of patients present with aggressive and metastatic tumours that are not suitable for surgical resection [103]. The 5-year survival rate is 23% when the disease is diagnosed at a localized stage, compared with 2% at a distant metastatic stage [104]. Failure of therapeutic response in advanced disease is mainly attributed to the intense stromal effect in pancreatic cancer [105, 106], and randomized clinical trials have suggested that adjuvant chemotherapy significantly enhances survival in patients who have undergone surgical resection [107, 108], emphasizing the importance of early detection. The late presentation of disease-specific symptoms often leads to missed or delayed diagnosis and hence decreased survival, underscoring the urgent clinical need to detect pancreatic cancer before it progresses to an advanced stage.
Sensitive and specific screening tests for the early detection of pancreatic cancer would be useful for diagnosis. Conventional imaging tools, including computerized tomography (CT), magnetic resonance imaging (MRI), endoscopic ultrasonography (EUS) and endoscopic retrograde cholangiopancreatography (ERCP), are powerful for tumour staging and for confirming a suspected pancreatic mass [109, 110], but they are relatively costly, time-consuming and, in some cases, invasive. In contrast, serum biomarkers are inexpensive and easily accessible, making them an ideal means of early diagnosis [111]. The current gold-standard serum biomarker, CA19.9, is used in the clinic mainly for disease monitoring and prognosis [102, 112, 113]. CA19.9 has limited sensitivity for pancreatic cancer detection because it is absent in Lewis (a−b−) individuals (5-10% of the Caucasian population), even at advanced disease stages, and because it is barely detectable in early premalignant disease. CA19.9 is also not specific, because it is elevated in benign conditions and in multiple cancer types. It is therefore critical to discover novel biomarkers that complement CA19.9 and improve both the sensitivity and the specificity of detection.
CUB and zona pellucida-like domains 1 (CUZD1) and laminin, gamma 2 (LAMC2) were identified using bioinformatics [117] and tissue proteomics [116] approaches, respectively, and were recently validated, as described above, in three large independent sample sets totalling 425 samples [116, 119].
Prior to our discovery and validation studies, very few studies had examined either of these markers in pancreatic cancer. In our validation results, CUZD1 and LAMC2 demonstrated robust diagnostic performance in distinguishing pancreatic cancer from benign disease, and they appear to have significant complementarity with CA19.9 [116, 119]. A large blinded validation study of these markers using 400 patient plasma samples is described below; it evaluates their individual performance as well as their performance in a panel complementing CA19.9 in the diagnosis of early pancreatic cancer.
Patients and control subjects were recruited on a consecutive basis from participating investigators in two major hospitals.
Subjects with a histologically or CT scan-confirmed diagnosis of PDAC, or with an abnormal abdominal imaging study (CT, MRI, MRCP or EUS), were eligible for the study. Control subjects with a clinical diagnosis of a pancreatic, liver or intestinal condition, or those being evaluated for non-pancreatic malignancies, were included in the study. Subjects under 18 years of age and those who did not provide informed consent were excluded. Patients with a history of any other malignancy, except non-melanoma skin cancer, within the previous ten years were not included. Healthy controls were eligible volunteers without any pancreatic condition or malignant disease. A subset of patients was selected from the available subject pool based on desired characteristics (retrospective sample collection, prospective patient recruitment).
A total of 400 blinded plasma samples were obtained, comprising a training set (n=186) and an independent validation set (n=214). Overall, the 400 samples comprised 20 healthy individuals, 130 patients with benign conditions, and 51 stage IA/IB, 150 stage IIB and 49 stage IV pancreatic cancer patients. Details of the sample population are shown in Table 11. All samples were collected prior to any treatment, following informed consent, under an Institutional Review Board-approved protocol.
Blood was collected in ACD (acid citrate dextrose anticoagulant) vacutainer tubes, and plasma was processed within 24 hours of blood draw. Blood samples were centrifuged at room temperature for 10 minutes at 1000×g to pellet the cells. Immediately after centrifugation, plasma was aliquoted into 1 mL cryotubes and stored at −80° C. until analysis.
The levels of CUZD1 and LAMC2 were measured in duplicate using commercially available sandwich enzyme-linked immunosorbent assay (ELISA) kits purchased from USCN Life Sciences (Missouri City, Tex., USA), according to the manufacturer's protocols. CA19.9 levels were measured using the Abbott Architect XR CA19.9 immunoassay.
Prior to all validation assays, the CUZD1 and LAMC2 ELISAs were tested to optimize analytical performance, to select appropriate internal controls (low, medium and high) and to determine the dilution factor for each kit. Internal controls were used to assess inter-plate variability.
Samples were diluted in assay buffer as follows: 1 in 5 for CUZD1 and 1 in 100 for LAMC2. Diluted sample (100 μL) was incubated in pre-coated 96-well ELISA plates, along with the standards, for 2 hours at 37° C. After the strips were washed, 100 μL of biotin-labeled polyclonal secondary antibody (detection reagent A) was added and incubated for another hour at 37° C. After washing, 100 μL of avidin-conjugated horseradish peroxidase (detection reagent B) was added and incubated for 30 minutes at 37° C. After a final wash, 90 μL of tetramethylbenzidine (TMB) substrate was added to each well and incubated for approximately 10-15 minutes in the dark at 37° C., until the second-lowest standard could be distinguished from the blank by a change of colour. Stopping solution (50 μL of sulphuric acid solution) was then added, and absorbance was measured at 450 nm using a Perkin-Elmer Envision 2103 Multilabel Reader, corrected for background absorbance at 540 nm.
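Because samples were diluted 1 in 5 (CUZD1) and 1 in 100 (LAMC2) before measurement, a concentration read from the standard curve has to be multiplied by the dilution factor to give the plasma level. A minimal Python sketch, assuming the duplicate readings have already been interpolated from the standard curve and are averaged before correction; the function and constant names are illustrative.

```python
from statistics import mean
from typing import Sequence

# Dilution factors from the protocol above (1 in 5 and 1 in 100).
DILUTION_FACTOR = {"CUZD1": 5, "LAMC2": 100}


def plasma_concentration(marker: str, duplicate_readings: Sequence[float]) -> float:
    """Average duplicate readings (ng/mL, as diluted) and correct for the dilution factor."""
    return mean(duplicate_readings) * DILUTION_FACTOR[marker]


print(plasma_concentration("LAMC2", [2.0, 4.0]))  # 300.0
```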
The validation study was conducted according to the “Standards for the reporting of diagnostic accuracy studies (STARD) initiative” [120] (Table 15). Table 15 depicts an overall summary of the performance of CUZD1 and LAMC2 in comparison with CA19-9 in the healthy, benign and cancer populations.
Marker levels were compared between groups using the Mann-Whitney-Wilcoxon test. Mean levels were compared using a t-test and/or ANOVA.
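The Mann-Whitney-Wilcoxon test compares two groups through the U statistic, which also connects to the ROC analysis described next: U divided by the number of case-control pairs equals the empirical AUC. The Python sketch below computes U by direct pairwise comparison, counting ties as one half; it does not compute a p-value (in R, which was used for this analysis, `wilcox.test` performs the test itself). Function names and data are illustrative.

```python
from typing import Sequence


def mann_whitney_u(cases: Sequence[float], controls: Sequence[float]) -> float:
    """U statistic for `cases`: number of (case, control) pairs in which the case
    value is higher, with tied pairs counted as 0.5."""
    u = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def empirical_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC = U / (n_cases * n_controls)."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))


# Hypothetical marker levels.
pdac = [3.0, 5.0, 7.0]
benign = [1.0, 3.0, 4.0]
print(mann_whitney_u(pdac, benign), empirical_auc(pdac, benign))
```

The pairwise loop is O(n·m), which is fine for a few hundred samples; rank-based formulas give the same U more efficiently for large groups.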
The discriminative ability of the biomarkers was assessed by building receiver operating characteristic (ROC) curves for individual markers and combined predictors. The diagnostic value of the markers was evaluated based on the area under the curve (AUC) and on sensitivity at predetermined specificity thresholds of 80% and 90%. Confidence intervals (95%) for the AUCs, and p-values for comparisons between two correlated ROC curves, were computed using the method described by DeLong [130]. An optimized cutoff for each marker was obtained by minimizing the total prediction error, defined as the distance between the ROC point and perfect classification: $\sqrt{(1-\text{sensitivity})^2 + (1-\text{specificity})^2}$.
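The cutoff rule above selects the point on the empirical ROC curve closest to perfect classification (sensitivity = specificity = 1). A minimal Python sketch, assuming a sample is called positive when its level is at or above the cutoff and that every observed value is tried as a candidate cutoff; it does not reproduce the DeLong confidence intervals. In R, pROC's `coords()` with `best.method = "closest.topleft"` should target the same criterion. Function names and data are illustrative.

```python
import math
from typing import List, Sequence, Tuple

RocPoint = Tuple[float, float, float]  # (cutoff, sensitivity, specificity)


def roc_points(cases: Sequence[float], controls: Sequence[float]) -> List[RocPoint]:
    """Sensitivity and specificity at each observed value used as a cutoff;
    a sample is called positive when its level is >= the cutoff."""
    points = []
    for cutoff in sorted(set(cases) | set(controls)):
        sens = sum(v >= cutoff for v in cases) / len(cases)
        spec = sum(v < cutoff for v in controls) / len(controls)
        points.append((cutoff, sens, spec))
    return points


def optimal_cutoff(cases: Sequence[float], controls: Sequence[float]) -> RocPoint:
    """ROC point minimizing sqrt((1 - sensitivity)^2 + (1 - specificity)^2)."""
    return min(
        roc_points(cases, controls),
        key=lambda p: math.hypot(1 - p[1], 1 - p[2]),
    )


# Hypothetical marker levels.
pdac = [4.0, 6.0, 8.0, 9.0]
benign = [1.0, 2.0, 3.0, 5.0, 7.0]
print(optimal_cutoff(pdac, benign))
```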
Multi-parametric models for combinations of markers were constructed by fitting logistic regression models using the marker concentrations as predictors. The estimated coefficients of the model were used to construct a combined score for each observation which was then used for the evaluation of the multi-parametric model. The resulting 3 linear models evaluated for diagnostic performance are: (1) CA19.9+11.84·CUZD1, (2) CA19.9+0.202·LAMC2, (3) CA19.9+12.41·CUZD1+0.14·LAMC2.
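Each model reduces the panel to a single weighted sum, which is then analysed like any single marker. The form of the reported models suggests that the logistic regression coefficients were rescaled so that CA19.9 has a weight of 1 and that the intercept was dropped; that is an assumption, but neither change would affect the ROC curve, because a positive rescaling plus a constant shift preserves the ranking of samples. The Python sketch below applies the three reported models; the input units are assumed to match those used when the models were fitted, and the example levels are hypothetical.

```python
from typing import Dict

# Coefficients of the three reported models (CA19.9 has a weight of 1).
MODELS: Dict[str, Dict[str, float]] = {
    "CA19.9+CUZD1": {"CA19.9": 1.0, "CUZD1": 11.84},
    "CA19.9+LAMC2": {"CA19.9": 1.0, "LAMC2": 0.202},
    "CA19.9+CUZD1+LAMC2": {"CA19.9": 1.0, "CUZD1": 12.41, "LAMC2": 0.14},
}


def combined_score(model: str, levels: Dict[str, float]) -> float:
    """Weighted sum of marker concentrations for the named model.

    Units are assumed to match those used when the model was fitted.
    """
    return sum(coef * levels[marker] for marker, coef in MODELS[model].items())


# Hypothetical patient levels.
patient = {"CA19.9": 20.0, "CUZD1": 2.0, "LAMC2": 50.0}
for name in MODELS:
    print(name, round(combined_score(name, patient), 2))
```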
Statistical analysis of the training set was performed while blinded to the clinical annotations of the validation set. After the multi-parametric prediction models were built on the training set samples, clinical information for the validation samples was unblinded and model predictions were evaluated. Hypothesis testing was two-tailed, and p-values below 0.05 were considered significant. Statistical analysis was performed in the R environment (version 2.15.2), available from http://www.R-project.org. ROC curve analysis and comparisons between ROC curves were performed using the pROC package [121].
Inter-plate assay imprecision was assessed across the 12 plates used for each marker using three internal controls (low, medium and high), and the coefficient of variation (CV) was calculated for each marker (Table 12). Overall, the CUZD1 and LAMC2 assays demonstrated acceptable reproducibility across the 12 plates, with CVs <20% for all three internal controls. As an additional quality control step, all samples were analyzed in duplicate to assess intra-plate variation. The mean and median CVs among duplicate samples ranged from 5% to 12% for all markers, indicating good intra-plate performance of the assays.
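The imprecision figures above are coefficients of variation. A minimal Python sketch, assuming the sample (n - 1) standard deviation, which the text does not specify; the control and duplicate values below are hypothetical, and the function names are illustrative.

```python
from statistics import mean, median, stdev
from typing import List, Sequence, Tuple


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100


def duplicate_cv_summary(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean and median CV (%) across samples measured in duplicate."""
    cvs = [cv_percent(pair) for pair in pairs]
    return mean(cvs), median(cvs)


# Hypothetical low internal control measured on four plates (ng/mL).
low_control = [10.0, 12.0, 11.0, 9.0]
print(f"inter-plate CV: {cv_percent(low_control):.1f}%")

# Hypothetical duplicate wells for three samples.
print(duplicate_cv_summary([(5.0, 5.0), (4.0, 5.0), (8.0, 9.0)]))
```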
All samples (n=400) were analyzed using ELISA assays on the same day for each candidate. Researchers I.P. and A.C. performed this step while being blinded to the clinical information of each sample.
As individual markers, the performances of the candidates were compared to CA19.9 in discriminating benign patients versus PDAC patients in both training and validation cohorts (
CA19.9 is not a reliable test for detecting early-stage pancreatic cancer. The ability of CUZD1 and LAMC2 to complement CA19.9 in early-stage pancreatic cancer (stages IA, IB and IIA), when tumours are generally still resectable, was therefore assessed. Because CA19.9 is often elevated in chronic pancreatitis, it lacks specificity in distinguishing inflammatory from malignant masses, with important therapeutic consequences such as unnecessary surgery and undetected pancreatic malignancy. Accordingly, the accuracy of CUZD1 and LAMC2 in differentiating chronic pancreatitis from early PDAC was also assessed.
Multi-parametric models combining CA19.9, CUZD1 and LAMC2 as two- or three-marker panels were constructed on the training set and applied to the blinded validation set. ROC curves show the performance of the three models in the training and validation sets, respectively (
Performance of Candidates in PDAC Patients with CA19.9 Values Below 37 IU/mL
At its clinical cutoff value of 37 IU/mL, CA19.9 has a reported sensitivity of 79-81% and specificity of 82-90% for diagnosing pancreatic cancer [2]. Consequently, many PDAC cases are missed by CA19.9. The levels of CUZD1 and LAMC2 were therefore evaluated specifically in PDAC cases with CA19.9 levels <37 IU/mL. Of the 250 PDAC cases in the training and validation sets combined, 75 (30%) had CA19.9 levels <37 IU/mL. In these CA19.9-negative PDAC patients, CUZD1 and LAMC2 retained significant ability to differentiate PDAC from benign conditions in both the training and validation cohorts (p<0.05; Table 14). Notably, both CUZD1 and LAMC2 also significantly differentiated resectable CA19.9-negative PDAC patients from patients with benign conditions (Table 14), demonstrating their potential to complement CA19.9.
High-throughput discovery studies have generated thousands of potential diagnostic candidates; however, subsequent verification and validation studies are lacking in the biomarker field [122]. As a result, true biomarkers remain masked [123]. To the best of our knowledge, there is currently no marker that can substitute for CA19.9 in the clinic. CA19.9, however, is elevated in benign conditions and in other cancer types, and it can be undetectable in patients with early, resectable PDAC.
The present study is an extensive blinded validation that examines the ability of CUZD1 and LAMC2 to complement CA19.9, for example in detecting early-stage PDAC and in differentiating patients with benign conditions from PDAC patients. To avoid possible biases, the validation study was conducted according to the “Standards for the reporting of diagnostic accuracy studies (STARD) initiative” [120] (Table 15) [3]. CUZD1 and LAMC2 showed consistent and robust diagnostic performance throughout the validation studies described in other Examples (n=425 samples) [116, 119] and retained good diagnostic performance in the current set of 400 blinded samples. CUZD1 and LAMC2 demonstrated strong diagnostic ability individually and retained diagnostic accuracy in CA19.9-negative PDAC cases, and multi-parametric models demonstrated remarkable complementarity of CUZD1 and LAMC2 with CA19.9, especially in distinguishing early-stage PDAC (stages IA, IB, IIA and IIB) from benign conditions.
Recent research has suggested that it takes up to a decade for the initial tumour to acquire metastatic ability, offering a long window of opportunity for the early detection of pancreatic cancer [124, 125]. Because no single marker possesses sufficient sensitivity and specificity for the early diagnosis of pancreatic cancer, research interest has shifted toward the development of biomarker panels [111, 126, 127]. A biomarker panel consisting of CA19.9, CUZD1 and LAMC2 can achieve better diagnostic performance in detecting PDAC than CA19.9 alone. This improvement is most notable at early disease stages, when the disease may be treatable.
Monitoring pancreatic cancer patients is challenging. The only marker currently used is CA19-9, and almost 10% of the general population is genetically negative for CA19-9. There is therefore a need to identify novel markers that can complement CA19-9 for monitoring the disease. Based on the data disclosed herein, CUZD1 and LAMC2 could both also be used as monitoring markers for pancreatic cancer. Serum and tumour samples are currently being collected from patients prior to surgery and/or during cycles of post-surgery chemotherapy. Samples will be assessed for CUZD1, compared with samples obtained earlier and later, and correlated with disease progression. Prospective collection of serum from pancreatic cancer patients will follow.
Five highly colon-specific proteins were identified by the bioinformatics strategy for identifying candidate colon cancer biomarkers. In particular, the proteins CLCA1 (HGNC: 2015, Entrez Gene: 1179, OMIM: 603906), GPA33 (HGNC: 4445, Entrez Gene: 10223, OMIM: 602171), LEFTY1 (HGNC: 6552, Entrez Gene: 10637, OMIM: 603037), ZG16 (Entrez: 16p11.2, HGNC: 16p11.2) and CEACAM7 (HGNC: 1819, Entrez Gene: 1087, Ensembl: ENSG00000007306, UniProtKB: Q14002) appear to fulfil the criteria that characterize a promising biomarker candidate: their expression is highly restricted to the colon, they are secreted or membrane-bound proteins, and they have not previously been tested as colon cancer serum markers. Serum samples are being collected from colon cancer patients to assess their performance in diagnosing colon cancer. Based on these results, in-house immunoassays (ELISAs) will be developed.
(a) All proteins identified in ≥1 database; the total number of proteins identified in ≥2 databases is enclosed in brackets.
(b) Pertains to proteins identified using a Secretome Algorithm.
(a) Tissue-specific proteins, as it applies to this table, indicates that protein expression was manually verified in the BioGPS and/or HPA databases. For database full names, see “Non-Standard Abbreviations”.
(a) CM (conditioned media) proteome of colon cancer cell lines [Karagiannis G et al., unpublished].
(a) Proteome of normal and cancer pancreas tissue [Kosanam H et al., unpublished].
(a) Proteome of normal and cancer pancreas tissue [Kosanam H et al., unpublished].
(b) CM proteome of breast cancer cell lines [Pavlou M et al., unpublished].
(a) CM proteome from prostate cancer cell line [Saraon P et al., unpublished].
(b) Proteome of normal and cancer pancreas tissue [Kosanam H et al., unpublished].
Samples characterized by acute pancreatitis, chronic pancreatitis, common bile duct (CBD) stones or other benign conditions are identified as “Benign”; samples characterized by PDAC stage IA, IB or IIA are identified as “Resectable”; samples characterized by PDAC stage IIB are identified as “Maybe resectable”; and samples characterized by PDAC stage IV are identified as “Non-resectable”.
Concentrations (ng/mL) prior to correcting for dilution factor are listed for all five candidates. Blank cells were not shown.
While the present application has been described with reference to what are presently considered to be the preferred examples, it is to be understood that the application is not limited to the disclosed examples. To the contrary, the application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
All publications, patents and patent applications as well as sequences corresponding to the accession numbers listed in the Tables, are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent, patent application or sequence was specifically and individually indicated to be incorporated by reference in its entirety. Specifically, the sequence associated with each accession number provided herein is incorporated by reference in its entirety.
This application is a PCT application which claims priority from U.S. provisional application 61/611,955, filed March 16, 2012, which is herein incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CA2013/000248 | 3/15/2013 | WO | 00 |
Number | Date | Country |
---|---|---|
61611955 | Mar 2012 | US |