Gene expression profiling of colon cancer with DNA arrays

SEQUENCE LISTING

The instant application contains a “lengthy” Sequence Listing which has been submitted via CD-R in lieu of a printed paper copy, and is hereby incorporated by reference in its entirety. Said CD-Rs, recorded on May 5, 2005, are labeled CRF, “Copy 1” and “Copy 2”, respectively, and each contains only one identical 3.63 Mb file named 1423R03.APP.

FIELD OF THE INVENTION

The present invention relates to polynucleotide analysis and, in particular, to polynucleotide expression profiling of colorectal carcinomas using arrays of polynucleotides.

BACKGROUND

Colorectal carcinoma (CRC) is a frequent and deadly disease. Different groups of tumors have been defined according to aggressiveness, anatomical localization and putative genetic instability based on conventional histopathological and immunohistopathological analysis. However, these diagnostic tools are not sufficient to accurately diagnose disease and predict survival. Gene expression microarrays improve these classifications and bring new insights into the underlying molecular mechanisms involved throughout colorectal tumorigenic progression.

Despite global scientific efforts to effectively treat colon cancer, little progress has been made during the last decade and colorectal cancer (CRC) remains one of the most frequent and deadly neoplasias in western countries. Current prognostic models based on histoclinical parameters inadequately describe the heterogeneity of CRC, and are not sufficient to predict prognosis and guide clinical treatment in individual patients. Tumors with different genetic alterations but similar clinical presentation follow different courses. One goal of molecular analysis is to identify, among complex networks of genes involved in tumorigenic progression, markers that could differentiate subgroups of tumors with distinct prognoses, hence providing physicians with a clinically useful diagnostic tool to treat individual patients based on molecular gene sets as previously described.

Previous studies have largely focused on individual candidate genes of disease, in contrast with the molecular complexity of cancer. The multi-step progression of CRC is accompanied by a number of genetic alterations [KRAS, APC, P53 and mismatch repair (MMR) genes, WNT and TGF-alpha pathways] that accumulate and interact in heterogeneous, complex ways to exert their tumor-promoting effects (Vogelstein, 1988; Fearon, 1990). Despite the large number of published studies, the clinical utility of these disparate observations and reports remains limited for CRC patients. For example, little is known about molecular alterations associated with the prognostic heterogeneity of disease or the microsatellite instability (MSI) phenotype, and no single molecular marker has been validated to accurately predict prognosis in clinical practice. New models based on a precise molecular understanding of disease are required to improve screening, diagnosis, treatment, and ultimately survival of patients.

DNA microarray technology allows measurement of the mRNA expression level of thousands of genes simultaneously in a single assay, thus providing a molecular definition of a sample adapted to address the combinatory and complex nature of cancers (Bertucci, 2001; Ramaswamy, 2002; Mohr, 2002). Gene expression profiling may reveal biologically and/or clinically relevant subgroups of tumors (Alizadeh, 2000; Garber, 2001; Kihara, 2001; Beer, 2002; Bertucci, 2002; Devilard, 2002; Singh, 2002) and significantly improve current mechanistic understanding of oncogenesis.

Gene expression profiling-based studies of CRC have so far compared normal to tumor tissue samples, or described the molecular heterogeneity in different stages of colorectal disease (Alon, 1999; Notterman, 2001; Lin, 2002; Backert, 1999; Zou, 2002; Agrawal, 2002; Kitahara, 2001; Williams, 2003; Tureci, 2003; Birkenkamp-Demtroder, 2002; Frederiksen, 2003), but none has directly addressed the issue of prognosis or MSI phenotype.

SUMMARY OF THE INVENTION

DNA microarrays may be utilized to elucidate discrete gene sets to improve the prognostic classification of CRC, identify novel potential therapeutic targets of carcinogenesis, describe new diagnostic and/or prognostic markers, and guide physician decisions on appropriate patient care.

The invention thus provides a method for analyzing differential gene expression associated with histopathologic features of colorectal disease, comprising the detection of the overexpression or underexpression of a pool of polynucleotide sequences in colon tissues, said pool comprising all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets 1 through 644 set forth in Table 1.

The invention further provides a method for prognosis or diagnosis of colon cancer, or for monitoring the treatment of a subject with a colon cancer. This method comprises the steps of 1) obtaining colon tissue nucleic acids from a patient; and 2) detecting the overexpression or underexpression of a pool of polynucleotide sequences in colon tissues. The pool of polynucleotide sequences comprises all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets 1 through 644, as set forth in Table 1.

The invention further provides a polynucleotide library, comprising a pool of polynucleotide sequences either overexpressed or underexpressed in colon tissue, said pool corresponding to all or part of the polynucleotide sequences of SEQ ID Nos. 1 through 1596.

The invention still further provides a method of detecting differential gene expression, comprising 1) obtaining a polynucleotide sample from a subject; 2) reacting said polynucleotide sample obtained in step (1) with a polynucleotide library of the invention; and 3) detecting the reaction product of step (2).

The invention still further provides a method of assigning a therapeutic regimen to a subject with histopathological features of colorectal disease, comprising 1) classifying the subject as having a “poor prognosis” or a “good prognosis” on the basis of the method of differential gene expression analysis according to the invention, and 2) assigning the subject a therapeutic regimen. The therapeutic regimen will either (i) comprise no adjuvant chemotherapy if the subject is lymph node negative and is classified as having a good prognosis, or (ii) comprise chemotherapy if the subject has any other combination of lymph node status and expression profile.
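The regimen assignment described in the preceding paragraph can be illustrated with a minimal Python sketch. The `Subject` class, the `assign_regimen` function and the string labels are hypothetical names introduced only for illustration; the sketch assumes the prognosis class has already been obtained from the expression analysis and does not model that analysis itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical record: lymph node status plus expression-based prognosis class."""
    lymph_node_positive: bool
    prognosis: str  # "good" or "poor", as assigned by the expression analysis


def assign_regimen(subject: Subject) -> str:
    """Apply the two-branch rule stated in the text.

    (i)  lymph node negative AND good prognosis -> no adjuvant chemotherapy
    (ii) any other combination                  -> chemotherapy
    """
    if subject.prognosis not in ("good", "poor"):
        raise ValueError("prognosis must be 'good' or 'poor'")
    if not subject.lymph_node_positive and subject.prognosis == "good":
        return "no adjuvant chemotherapy"
    return "chemotherapy"


if __name__ == "__main__":
    # Enumerate all four combinations of lymph node status and prognosis class.
    for node_positive in (False, True):
        for prognosis in ("good", "poor"):
            s = Subject(lymph_node_positive=node_positive, prognosis=prognosis)
            print(node_positive, prognosis, "->", assign_regimen(s))
```

Only one of the four combinations leads to withholding adjuvant chemotherapy, which is why the rule is written as a single positive test followed by a default.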

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A-1C show global gene expression profiles in colorectal cancer and non-cancerous samples.

FIGS. 2A-2B show hierarchical classifications of tissue samples using genes which discriminate between normal and cancer samples.

FIGS. 3A-3C show hierarchical classifications of CRC tissue samples using genes that discriminate metastatic from non-metastatic samples, correlated with survival.

FIGS. 4A-4C show hierarchical classifications of CRC tissue samples using discriminator genes selected by supervised analyses based on lymph node status, MSI phenotype and location of tumors.

FIGS. 5A-5C show the analysis of NM23 protein expression in colorectal tissue samples using tissue microarrays.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to DNA array technology which can be used to analyse the expression of numerous (e.g., ~8,000) genes in cancerous and non-cancerous colon tissue or cell samples. Unsupervised hierarchical clustering can be used to identify putative gene expression patterns that are precisely correlated to subgroups of tumors; and these subgroups are notably correlated to patient prognosis, disease aggressiveness, and survival. Supervised analysis can be used to identify several genes differentially expressed between normal and cancer samples, and delineated subgroups of colon cancer can be defined by histoclinical parameters, including clinical outcome (i.e., 5-year survival of 100% in one group and 40% in the other group, p<0.005), lymph node invasion, tumors from the right or left colon, and MSI phenotype. Discriminator genes are associated with various cellular processes. The most significant discriminatory genes and/or potential markers identified by the present invention were further validated at the protein level using immunohistochemistry (IHC) on sections of tissue microarrays (TMA) on 190 tumor and normal samples (see Examples below).

The invention thus provides a method for analyzing differential gene expression associated with histopathologic features of colorectal disease, e.g., colon tumors, in particular colon cancer. The method of the invention comprises the detection of the overexpression or underexpression of a pool of polynucleotide sequences in colon tissues. The pool of polynucleotide sequences corresponds to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of the predefined polynucleotide sequence sets set forth in Table 1 below.
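As an illustration of what calling a sequence overexpressed or underexpressed can mean in practice, the following Python sketch compares a test intensity with a reference intensity on a log2 scale. It is a sketch under stated assumptions only: the function name, the two-fold default threshold and the example intensities are hypothetical and are not taken from the present description, which does not fix a particular cutoff here.

```python
import math


def classify_expression(test_signal: float,
                        reference_signal: float,
                        fold_threshold: float = 2.0) -> str:
    """Label a gene from two positive intensity values.

    The 2-fold default threshold is an illustrative assumption,
    not a value specified in the source text.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")

    log_ratio = math.log2(test_signal / reference_signal)
    cutoff = math.log2(fold_threshold)

    if log_ratio >= cutoff:
        return "overexpressed"
    if log_ratio <= -cutoff:
        return "underexpressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical intensities: (test sample, reference sample).
    examples = {"gene_a": (400.0, 100.0),
                "gene_b": (100.0, 400.0),
                "gene_c": (120.0, 100.0)}
    for name, (test, ref) in examples.items():
        print(name, classify_expression(test, ref))
```

Working on the log scale treats a four-fold increase and a four-fold decrease symmetrically, which is the usual reason for expressing array ratios this way.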

TABLE 1GeneSetsymbolNo.ImageNameSeq3′Seq5′RefCAPG11012666capping protein (actin filament),SEQ ID No: 1SEQ ID No: 2gelsolin-likeDEK21016390dek oncogene (dna binding)SEQ ID No: 3SEQ ID No: 4DVL131030065dishevelled, dsh homolog 1 (drosophila)SEQ ID No: 5SEQ ID No: 6NOV41046837nephroblastoma overexpressed geneSEQ ID No: 7SEQ ID No: 8CD79A51056782cd79a antigen (immunoglobulin-SEQ ID No: 9SEQ ID No: 10associated alpha)MGC270766108249hypothetical protein mgc27076SEQ ID No: 11SEQ ID No: 12SEQ ID No: 137108274SEQ ID No: 148108292SEQ ID No: 15C1ORF289108305chromosome 1 open reading frame 28SEQ ID No: 16SEQ ID No: 17SEQ ID No: 18MAP2K210108370mitogen-activated protein kinase kinase 2SEQ ID No: 19SEQ ID No: 20SEQ ID No: 21LOC22011511108374hypothetical protein loc220115SEQ ID No: 2212108399SEQ ID No: 23HRB13108490hiv-1 rev binding proteinSEQ ID No: 24SEQ ID No: 2514110385hypothetical gene supported bySEQ ID No: 26SEQ ID No: 27ak026041LOC9290615110486hypothetical protein bc008217SEQ ID No: 28SEQ ID No: 29SEQ ID No: 30SOX416111461sry (sex determining region y)-box 4SEQ ID No: 31SEQ ID No: 32SEQ ID No: 33GSTA217113932glutathione s-transferase a2SEQ ID No: 34SEQ ID No: 35SEQ ID No: 36MLLT3181144752myeloid/lymphoid or mixed-lineageSEQ ID No: 37SEQ ID No: 38leukemia (trithorax homolog,drosophila); translocated to, 3TCF319114639transcription factor 3 (e2aSEQ ID No: 39SEQ ID No: 40SEQ ID No: 41immunoglobulin enhancer bindingfactors e12/e47)PMS220116906pms2 postmeiotic segregation increasedSEQ ID No: 42SEQ ID No: 43SEQ ID No: 442 (s. 
cerevisiae)LPP21117240lim domain containing preferredSEQ ID No: 45SEQ ID No: 46SEQ ID No: 47translocation partner in lipomaPTPRC22117755protein tyrosine phosphatase, receptorSEQ ID No: 48SEQ ID No: 49type, c23117811similar to [human ig rearranged gammaSEQ ID No: 50SEQ ID No: 51chain mrna, v-j-c region and completecds.], gene productC6ORF53241184178chromosome 6 open reading frame 53SEQ ID No: 52SEQ ID No: 53PDPK12511856503-phosphoinositide dependent proteinSEQ ID No: 54SEQ ID No: 55kinase-126118634similar to [human ig rearranged gammaSEQ ID No: 56SEQ ID No: 57chain mrna, v-j-c region and completecds.], gene productKCNJ1527119530potassium inwardly-rectifying channel,SEQ ID No: 58SEQ ID No: 59SEQ ID No: 60subfamily j, member 1528119772loc284066SEQ IDNo: 61USP9X29120009ubiquitin specific protease 9, xSEQ ID No: 62SEQ ID No: 63SEQ ID No: 64chromosome (fat facets-like drosophila)HELZ30120572helicase with zinc finger domainSEQ ID No: 65SEQ ID No: 66ADD131120783adducin 1 (alpha)SEQ ID No: 67SEQ ID No: 68ATP5L32121076atp synthase, h+ transporting,SEQ ID No: 69SEQ ID No: 70mitochondrial f0 complex, subunit gIFNAR133121265interferon (alpha, beta and omega)SEQ ID No: 71SEQ ID No: 72SEQ ID No: 73receptor 1ELAVL134121366elav (embryonic lethal, abnormalSEQ ID No: 74SEQ ID No: 75vision, drosophila)-like 1 (hu antigen r)35122004loc143724SEQ ID No: 76DSG136122743desmoglein 1SEQ ID No: 77SEQ ID No: 78SEQ ID No: 79OLFM137122756olfactomedin 1SEQ ID No: 80SEQ ID No: 81C338123379complement component 3SEQ ID No: 82SEQ ID No: 83C4BPA39123664complement component 4 bindingSEQ ID No: 84SEQ ID No: 85SEQ ID No: 86protein, alphaDMPK40123916dystrophia myotonica-protein kinaseSEQ ID No: 87SEQ ID No: 88SEQ ID No: 89RPL641123948ribosomal protein 16SEQ ID No: 90SEQ ID No: 91SEQ ID No: 92HLA-DQB142123953major histocompatibility complex, classSEQ ID No: 93SEQ ID No: 94SEQ ID No: 95ii, dq beta 1CENPF43124345centromere protein f, 350/400 kaSEQ ID No: 96SEQ ID No: 97SEQ ID No: 
98(mitosin)CSF144124554colony stimulating factor 1SEQ ID No: 99SEQ ID No: 100(macrophage)NDST345125806n-deacetylase/n-sulfotransferaseSEQ ID No: 101SEQ ID No: 102SEQ ID No: 103(heparan glucosaminyl) 3SPI146127394spleen focus forming virus (sffv)SEQ ID No: 104SEQ ID No: 105SEQ ID No: 106proviral integration oncogene spi1ATP5C147127950atp synthase, h+ transporting,SEQ ID No: 107SEQ ID No: 108SEQ ID No: 109mitochondrial f1 complex, gammapolypeptide 1TNFSF1048128413tumor necrosis factor (ligand)SEQ ID No: 110SEQ ID No: 111SEQ ID No: 112superfamily, member 10ASBABP249129112aspecific bcl2 are-binding protein 2SEQ ID No: 113SEQ ID No: 114COX7A2L50129146cytochrome c oxidase subunit viiaSEQ ID No: 115SEQ ID No: 116SEQ ID No: 117polypeptide 2 likeXTP551129227minor histocompatibility antigen ha-8SEQ ID No: 118SEQ ID No: 119SEQ ID No: 120GATA352129757gata binding protein 3SEQ ID No: 121SEQ ID No: 122STK653129865serine/threonine kinase 6SEQ ID No: 123SEQ ID No: 124FLJ1429754130173hypothetical protein flj14297SEQ ID No: 125SEQ ID No: 126SEQ ID No: 127HEYL55132307hairy/enhancer-of-split related withSEQ ID No: 128SEQ ID No: 129SEQ ID No: 130yrpw motif-likeCD2561326652cd2 antigen (p50), sheep red blood cellSEQ ID No: 131SEQ ID No: 132receptorGRF257133334guanine nucleotide-releasing factor 2SEQ ID No: 133SEQ ID No: 134(specific for crk proto-oncogene)ITGAL581338831integrin, alpha 1 (antigen cd11a (p180),SEQ ID No: 135SEQ ID No: 136lymphocyte function-associated antigen1; alpha polypeptide)SPIB591350545spi-b transcription factor (spi-1/pu.1SEQ ID No: 137SEQ ID No: 138related)S100P60135221s100 calcium binding protein pSEQ ID No: 139SEQ ID No: 140SEQ ID No: 141PVRL361135302poliovirus receptor-related 3SEQ ID No: 142SEQ ID No: 143SEQ ID No: 14462136361SEQ ID No: 145SEQ ID No: 146COX6A163139069cytochrome c oxidase subunit viaSEQ ID No: 147SEQ ID No: 148SEQ ID No: 149polypeptide 1IL2RB64139073interleukin 2 receptor, betaSEQ ID No: 150SEQ ID No: 151SEQ ID No: 
152CDK2651391584cyclin-dependent kinase 2SEQ ID No: 153SEQ ID No: 154GPR166139304g protein-coupled receptor 1SEQ ID No: 155SEQ ID No: 156SEQ ID No: 157PSG667139392pregnancy specific beta-1-glycoprotein 6SEQ ID No: 158SEQ ID No: 159SEQ ID No: 160EPS1568139789epidermal growth factor receptorSEQ ID No: 161SEQ ID No: 162SEQ ID No: 163pathway substrate 15APRT69141998adenine phosphoribosyltransferaseSEQ ID No: 164SEQ ID No: 165SEQ ID No: 166TGFB1I1701423050transforming growth factor beta 1SEQ ID No: 167SEQ ID No: 168induced transcript 1FKBP271143519fk506 binding protein 2, 13 kdaSEQ ID No: 169SEQ ID No: 170SEQ ID No: 17172144853SEQ ID No: 172BLVRA73145269biliverdin reductase aSEQ ID No: 173SEQ ID No: 174SEQ ID No: 175SLC30A574145286solute carrier family 30 (zincSEQ ID No: 176SEQ ID No: 177SEQ ID No: 178transporter), member 5AZGP1751456160alpha-2-glycoprotein 1, zincSEQ ID No: 179SEQ ID No: 180761456315homo sapiens cdna flj30452 fis, cloneSEQ ID No: 181brace2009293.KLRD177145696killer cell lectin-like receptor subfamilySEQ ID No: 182SEQ ID No: 183d, member 1FOLR278146494folate receptor 2 (fetal)SEQ ID No: 184SEQ ID No: 185SEQ ID No: 18679146922SEQ ID No: 187SEQ ID No: 188PTGS280147050prostaglandin-endoperoxide synthase 2SEQ ID No: 189SEQ ID No: 190SEQ ID No: 191(prostaglandin g/h synthase andcyclooxygenase)PECAM181147341platelet/endothelial cell adhesionSEQ ID No: 192SEQ ID No: 193molecule (cd31 antigen)PSEN182147495presenilin 1 (alzheimer disease 3)SEQ ID No: 194SEQ ID No: 195SEQ ID No: 196831493187homo sapiens, clone image: 4831215,SEQ ID No: 197mrnaGATA284149809gata binding protein 2SEQ ID No: 198SEQ ID No: 199SEQ ID No: 200CHST13851500894carbohydrate (chondroitin 4)SEQ ID No: 201SEQ ID No: 202sulfotransferase 13IGF1R86150361insulin-like growth factor 1 receptorSEQ ID No: 203SEQ ID No: 204SEQ ID No: 205SOCS287150644suppressor of cytokine signaling 2SEQ ID No: 206SEQ ID No: 207SEQ ID No: 208INSR88151149insulin receptorSEQ ID No: 209SEQ ID No: 
210TFDP189151495transcription factor dp-1SEQ ID No: 211SEQ ID No: 212SEQ ID No: 213IL10RA90151740interleukin 10 receptor, alphaSEQ ID No: 214SEQ ID No: 215SEQ ID No: 216LYK591152467protein kinase lyk5SEQ ID No: 217SEQ ID No: 218SEQ ID No: 219MYBL1921526789v-myb myeloblastosis viral oncogeneSEQ ID No: 220homolog (avian)-like 1LIF93153025leukemia inhibitory factor (cholinergicSEQ ID No: 221SEQ ID No: 222SEQ ID No: 223differentiation factor)EIF4G394153141eukaryotic translation initiation factor 4SEQ ID No: 224SEQ ID No: 225SEQ ID No: 226gamma, 3TGFB1I195153461transforming growth factor beta 1SEQ ID No: 227SEQ ID No: 228SEQ ID No: 168induced transcript 1TJP396153474tight junction protein 3 (zona occludensSEQ ID No: 229SEQ ID No: 230SEQ ID No: 2313)STC197153589stanniocalcin 1SEQ ID No: 232SEQ ID No: 233SEQ ID No: 234DES98153854desminSEQ ID No: 235SEQ ID No: 236SEQ ID No: 237FCGBP99154172fc fragment of igg binding proteinSEQ ID No: 238SEQ ID No: 239PMSCL2100154335polymyositis/scleroderma autoantigenSEQ ID No: 240SEQ ID No: 241SEQ ID No: 2422, 100 kdaPLCD1101154600phospholipase c, delta 1SEQ ID No: 243SEQ ID No: 244SEQ ID No: 245CRIP1102155219cysteine-rich protein 1 (intestinal)SEQ ID No: 246SEQ ID No: 247BCKDK103155774branched chain alpha-ketoacidSEQ ID No: 248SEQ ID No: 249SEQ ID No: 250dehydrogenase kinaseTCF3104156505transcription factor 3 (e2aSEQ ID No: 251SEQ ID No: 41immunoglobulin enhancer bindingfactors e12/e47)ZNF463105156718zinc finger protein 463SEQ ID No: 252SEQ ID No: 253MCP106158233membrane cofactor protein (cd46,SEQ ID No: 254SEQ ID No: 255SEQ ID No: 256trophoblast-lymphocyte cross-reactiveantigen)LTBP4107158239latent transforming growth factor betaSEQ ID No: 257SEQ ID No: 258SEQ ID No: 259binding protein 4MEIS11081591384meis1, myeloid ecotropic viralSEQ ID No: 260SEQ ID No: 261integration site 1 homolog (mouse)ACE109159885angiotensin i converting enzymeSEQ ID No: 262SEQ ID No: 263(peptidyl-dipeptidase a) 1CD3E110159903cd3e antigen, epsilon polypeptide 
(tit3SEQ ID No: 264SEQ ID No: 265complex)MGC39325111165818hypothetical protein mgc39325SEQ ID No: 266SEQ ID No: 267SEQ ID No: 268PRKACA112166052protein kinase, camp-dependent,SEQ ID No: 269SEQ ID No: 270catalytic, alphaSERPINB51131662274serine (or cysteine) proteinase inhibitor,SEQ ID No: 271SEQ ID No: 272clade b (ovalbumin), member 5HSF41141667886heat shock transcription factor 4SEQ ID No: 273SEQ ID No: 274DOK21151671188docking protein 2, 56 kdaSEQ ID No: 275SEQ ID No: 276EEF1A11161683100eukaryotic translation elongation factorSEQ ID No: 277SEQ ID No: 2781 alpha 1S100A121171705397s100 calcium binding protein a12SEQ ID No: 279SEQ ID No: 280(calgranulin c)CAMK2B118172444calcium/calmodulin-dependent proteinSEQ ID No: 281SEQ ID No: 282SEQ ID No: 283kinase (cam kinase) ii betaPLCG21191731982phospholipase c, gamma 2SEQ ID No: 284SEQ ID No: 285(phosphatidylinositol-specific)NME1120174388non-metastatic cells 1, protein (nm23a)SEQ ID No: 286SEQ ID No: 287SEQ ID No: 288expressed inPTGDS121178305prostaglandin d2 synthase 21 kda (brain)SEQ ID No: 289SEQ ID No: 290SEQ ID No: 291PP122179232pyrophosphatase (inorganic)SEQ ID No: 292SEQ ID No: 293PPP2R2C123179264protein phosphatase 2 (formerly 2a),SEQ ID No: 294regulatory subunit b (pr 52), gammaisoform124179776SEQ ID No: 295125181827SEQ ID No: 296TP531261847162tumor protein p53 (li-fraumeniSEQ ID No: 297SEQ ID No: 298syndrome)DARS127186331aspartyl-trna synthetaseSEQ ID No: 299SEQ ID No: 300SEQ ID No: 301EGF1281869652epidermal growth factor (beta-SEQ ID No: 302SEQ ID No: 303urogastrone)RPL29P2129190103ribosomal protein 129 pseudogene 2SEQ ID No: 304SEQ ID No: 305EEF1B21301902297eukaryotic translation elongation factorSEQ ID No: 306SEQ ID No: 3071 beta 2STK61311912132serine/threonine kinase 6SEQ ID No: 308SEQ ID No: 124TAL1132191548t-cell acute lymphocytic leukemia 1SEQ ID No: 309RPS15A133191714ribosomal protein s15aSEQ ID No: 310SEQ ID No: 311RPS19134192242ribosomal protein s19SEQ ID No: 312SEQ ID No: 313HRD1135192515hrd1 
proteinSEQ ID No: 314SEQ ID No: 315PTPN21136192581protein tyrosine phosphatase, non-SEQ ID No: 316SEQ ID No: 317receptor type 21NDUFA4137193672nadh dehydrogenase (ubiquinone) 1SEQ ID No: 318SEQ ID No: 319SEQ ID No: 320alpha subcomplex, 4, 9 kdaTSG101138194350tumor susceptibility gene 101SEQ ID No: 321SEQ ID No: 322SEQ ID No: 323SDHD139195013succinate dehydrogenase complex,SEQ ID No: 324SEQ ID No: 325SEQ ID No: 326subunit d, integral membrane proteinDAP3140195702death associated protein 3SEQ ID No: 327SEQ ID No: 328SEQ ID No: 329BTF3141195889basic transcription factor 3SEQ ID No: 330SEQ ID No: 331BUB3142198903bub3 budding uninhibited bySEQ ID No: 332SEQ ID No: 333SEQ ID No: 334benzimidazoles 3 homolog (yeast)143199837homo sapiens transcribed sequence withSEQ ID No: 335strong similarity to protein sp: p08865(h. sapiens) rsp4_human 40s ribosomalprotein sa (p40) (34/67 kda lamininreceptor) (colon carcinoma laminin-binding protein) (nem/1chd4)OAS11442005212′,5′-oligoadenylate synthetase 1,SEQ ID No: 336SEQ ID No: 337SEQ ID No: 33840/46 kdaCD209L145200714cd209 antigen-likeSEQ ID No: 339SEQ ID No: 340SEQ ID No: 341FGB146201352fibrinogen, b beta polypeptideSEQ ID No: 342SEQ ID No: 343MYL1147201925myosin, light polypeptide 1, alkali;SEQ ID No: 344SEQ ID No: 345SEQ ID No: 346skeletal, fastPRPF4B148202609prp4 pre-mrna processing factor 4SEQ ID No: 347SEQ ID No: 348SEQ ID No: 349homolog b (yeast)ARGBP2149203264arg/abl-interacting protein argbp2SEQ ID No: 350SEQ ID No: 351SEQ ID No: 352RFC4150203275replication factor c (activator 1) 4,SEQ ID No: 353SEQ ID No: 354SEQ ID No: 35537 kdaCSF1R151204653colony stimulating factor 1 receptor,SEQ ID No: 356SEQ ID No: 357SEQ ID No: 358formerly mcdonough feline sarcomaviral (v-fms) oncogene homolog152204740SEQ ID No: 3591532048801homo sapiens mrna full length insertSEQ ID No: 360cdna clone euroimage 1630957TP53154205314tumor protein p53 (li-fraumeniSEQ ID No: 361SEQ ID No: 298syndrome)LRP21552055272low density lipoprotein-related protein 
2SEQ ID No: 362SEQ ID No: 363SP110156205612sp110 nuclear body proteinSEQ ID No: 364SEQ ID No: 365SEQ ID No: 366CCNF157206323cyclin fSEQ ID No: 367SEQ ID No: 368CAPN12158206522calpain 12SEQ ID No: 369SEQ ID No: 370GRB141592067776growth factor receptor-bound protein 14SEQ ID No: 371SEQ ID No: 372DDX24160207491dead (asp-glu-ala-asp) box polypeptideSEQ ID No: 373SEQ ID No: 374SEQ ID No: 37524161208357SEQ ID No: 376SEQ ID No: 377HPN162208413hepsin (transmembrane protease, serineSEQ ID No: 378SEQ ID No: 379SEQ ID No: 3801)MGP163209710matrix gla proteinSEQ ID No: 381SEQ ID No: 3821642106469similar to riken cdna 4933405110SEQ ID No: 383EPB41L4B165210698erythrocyte membrane protein band 4.1SEQ ID No: 384SEQ ID No: 385SEQ ID No: 386like 4bRPS4X166211433ribosomal protein s4, x-linkedSEQ ID No: 387SEQ ID No: 388IGF2167211445insulin-like growth factor 2SEQ ID No: 389SEQ ID No: 390(somatomedin a)UBA52168211920ubiquitin a-52 residue ribosomal proteinSEQ ID No: 391SEQ ID No: 392SEQ ID No: 393fusion product 1AKR1C3169211995aldo-keto reductase family 1, memberSEQ ID No: 394SEQ ID No: 395c3 (3-alpha hydroxysteroiddehydrogenase, type ii)RARB170212414retinoic acid receptor, betaSEQ ID No: 396SEQ ID No: 397SEQ ID No: 398MGLL17121626monoglyceride lipaseSEQ ID No: 399SEQ ID No: 400CRK17222295v-crk sarcoma virus ct10 oncogeneSEQ ID No: 401SEQ ID No: 402homolog (avian)LAMA31732266576laminin, alpha 3SEQ ID No: 403SEQ ID No: 404ZDHHC11742272404zinc finger, dhhc domain containing 1SEQ ID No: 405SEQ ID No: 406BCL2175232714b-cell cll/lymphoma 2SEQ ID No: 407SEQ ID No: 408VPREB31762349125pre-b lymphocyte gene 3SEQ ID No: 409SEQ ID No: 410PFC177235934properdin p factor, complementSEQ ID No: 411SEQ ID No: 412SEQ ID No: 413BAK1178235938bcl2-antagonist/killer 1SEQ ID No: 414SEQ ID No: 415SEQ ID No: 416MGC13071179236008hypothetical protein mgc13071SEQ ID No: 417SEQ ID No: 418SEQ ID No: 419TP53180236338tumor protein p53 (li-fraumeniSEQ ID No: 420SEQ ID No: 421SEQ ID No: 298syndrome)CAPN218123643calpain 
2, (m/ii) large subunitSEQ ID No: 422SEQ ID No: 423SEQ ID No: 424ARAF118223692v-raf murine sarcoma 3611 viralSEQ ID No: 425SEQ ID No: 426SEQ ID No: 427oncogene homolog 1QDPR18323776quinoid dihydropteridine reductaseSEQ ID No: 428SEQ ID No: 429SEQ ID No: 430SLC12A2184238612solute carrier family 12SEQ ID No: 431SEQ ID No: 432SEQ ID No: 433(sodium/potassium/chloridetransporters), member 2MGC5395185238840hypothetical protein mgc5395SEQ ID No: 434SEQ ID No: 435SEQ ID No: 436GCSH186239937glycine cleavage system protein hSEQ ID No: 437SEQ ID No: 438(aminomethyl carrier)EPHB218724067ephb2SEQ ID No: 439SEQ ID No: 440188240753SEQ ID No: 441SEQ ID No: 442TPP218924085tripeptidyl peptidase iiSEQ ID No: 443SEQ ID No: 444SEQ ID No: 445TPP2190241151tripeptidyl peptidase iiSEQ ID No: 446SEQ ID No: 447SEQ ID No: 445IQGAP119124125iq motif containing gtpase activatingSEQ ID No: 448SEQ ID No: 449SEQ ID No: 450protein 1FGB192241788fibrinogen, b beta polypeptideSEQ ID No: 451SEQ ID No: 452SEQ ID No: 343FGA193244810fibrinogen, a alpha polypeptideSEQ ID No: 453SEQ ID No: 454CTSS194245614cathepsin sSEQ ID No: 455SEQ ID No: 456SEQ ID No: 457FAM3A19524609family with sequence similarity 3,SEQ ID No: 458SEQ ID No: 459SEQ ID No: 460member aGSN196246170gelsolin (amyloidosis, finnish type)SEQ ID No: 461SEQ ID No: 462SEQ ID No: 463IDE197246290insulin-degrading enzymeSEQ ID No: 464SEQ ID No: 465ADH4198246860alcohol dehydrogenase 4 (class ii), piSEQ ID No: 466SEQ ID No: 467SEQ ID No: 468polypeptideDSC2199247055desmocollin 2SEQ ID No: 469SEQ ID No: 470SEQ ID No: 471K-ALPHA-1200247905tubulin, alpha, ubiquitousSEQ ID No: 472SEQ ID No: 473ATP6V1H201247909atpase, h+ transporting, lysosomalSEQ ID No: 474SEQ ID No: 47550/57 kda, v1 subunit hCOX5B202248263cytochrome c oxidase subunit vbSEQ ID No: 476SEQ ID No: 477SEQ ID No: 478DLK1203248701delta-like 1 homolog (drosophila)SEQ ID No: 479SEQ ID No: 480CNTN120424884contactin 1SEQ ID No: 481SEQ ID No: 482SEQ ID No: 483CDC42205251772cell division cycle 42 (gtp 
bindingSEQ ID No: 484SEQ ID No: 485protein, 25 kda)SCO120625222sco cytochrome oxidase deficientSEQ ID No: 486SEQ ID No: 487homolog 1 (yeast)LOC5105820725285hypothetical protein loc51058SEQ ID No: 488SEQ ID No: 489RALB20825392v-ral simian leukemia viral oncogeneSEQ ID No: 490SEQ ID No: 491SEQ ID No: 492homolog b (ras related; gtp bindingprotein)RPL3209254505ribosomal protein 13SEQ ID No: 493SEQ ID No: 494SLPI210255348secretory leukocyte protease inhibitorSEQ ID No: 495SEQ ID No: 496(antileukoproteinase)HIPK3211256846homeodomain interacting protein kinase 3SEQ ID No: 497SEQ ID No: 498SEQ ID No: 499NIT1212257170nitrilase 1SEQ ID No: 500SEQ ID No: 501SEQ ID No: 502RPL39213257284ribosomal protein 139SEQ ID No: 503SEQ ID No: 504UCHL3214257445ubiquitin carboxyl-terminal esterase 13SEQ ID No: 505SEQ ID No: 506SEQ ID No: 507(ubiquitin thiolesterase)MAD215257519max dimerization protein 1SEQ ID No: 508SEQ ID No: 509DUSP1216257708dual specificity phosphatase 1SEQ ID No: 510SEQ ID No: 511COX7B217258313cytochrome c oxidase subunit viibSEQ ID No: 512SEQ ID No: 513KRT6B21825831keratin 6bSEQ ID No: 514SEQ ID No: 515SEQ ID No: 516CYP19A1219258870cytochrome p450, family 19, subfamilySEQ ID No: 517SEQ ID No: 518SEQ ID No: 519a, polypeptide 1HPSE220260138heparanaseSEQ ID No: 520SEQ ID No: 521SEQ ID No: 522CTCF22126029ccctc-binding factor (zinc fingerSEQ ID No: 523SEQ ID No: 524SEQ ID No: 525protein)HMGA2222261204high mobility group at-hook 2SEQ ID No: 526SEQ ID No: 527CTSB223261517cathepsin bSEQ ID No: 528SEQ ID No: 529GK224262425glycerol kinaseSEQ ID No: 530SEQ ID No: 531IL6ST225263262interleukin 6 signal transducer (gp 130,SEQ ID No: 532SEQ ID No: 533oncostatin m receptor)C5ORF5226264183chromosome 5 open reading frame 5SEQ ID No: 534SEQ ID No: 535SEQ ID No: 536LOC57209227264186kruppel-type zinc finger proteinSEQ ID No: 537SEQ ID No: 538CRYAB228264331crystallin, alpha bSEQ ID No: 539SEQ ID No: 540SEQ ID No: 541MGC985022926584hypothetical protein mgc9850SEQ ID No: 542SEQ ID No: 
543CCT423026710chaperonin containing tcpl, subunit 4SEQ ID No: 544SEQ ID No: 545SEQ ID No: 546(delta)LIAS231267123lipoic acid synthetaseSEQ ID No: 547SEQ ID No: 548SEQ ID No: 549HMGB2232267145high-mobility group box 2SEQ ID No: 550SEQ ID No: 551SEQ ID No: 552MAGEH1233267657apr-1 proteinSEQ ID No: 553SEQ ID No: 554SEQ ID No: 555MADH1234268150mad, mothers against decapentaplegicSEQ ID No: 556SEQ ID No: 557SEQ ID No: 558homolog 1 (drosophila)ACADVL235269388acyl-coenzyme a dehydrogenase, verySEQ ID No: 559SEQ ID No: 560long chainRENT123626945regulator of nonsense transcripts 1SEQ ID No: 561SEQ ID No: 562SEQ ID No: 563PWP123726964nuclear phosphoprotein similar toSEQ ID No: 564SEQ ID No: 565SEQ ID No: 566s. cerevisiae pwp1PTD004238270794hypothetical protein ptd004SEQ ID No: 567SEQ ID No: 568SEQ ID No: 56923927100SEQ ID No: 570SEQ ID No: 571ASNS24027208asparagine synthetaseSEQ ID No: 572SEQ ID No: 573SEQ ID No: 574NRAS241272189neuroblastoma ras viral (v-ras)SEQ ID No: 575SEQ ID No: 576SEQ ID No: 577oncogene homologMORF4L124227237mortality factor 4 like 1SEQ ID No: 578SEQ ID No: 579CCT4243272502chaperonin containing tcp1, subunit 4SEQ ID No: 580SEQ ID No: 546(delta)WBSCR2224427326williams beuren syndrome chromosomeSEQ ID No: 581SEQ ID No: 582SEQ ID No: 583region 22GNS245274315glucosamine (n-acetyl)-6-sulfataseSEQ ID No: 584SEQ ID No: 585SEQ ID No: 586(sanfilippo disease iiid)SLC17A724627506solute carrier family 17 (sodium-SEQ ID No: 587SEQ ID No: 588dependent inorganic phosphatecotransporter), member 7ARHT224727599ras homolog gene family, member t2SEQ ID No: 589SEQ ID No: 590SEQ ID No: 591TP53BP2248277339tumor protein p53 binding protein, 2SEQ ID No: 592SEQ ID No: 593SEQ ID No: 594CCBL1249277740cysteine conjugate-beta lyase;SEQ ID No: 595SEQ ID No: 596SEQ ID No: 597cytoplasmic (glutamine transaminase k,kyneurenine aminotransferase)ID42502783684inhibitor of dna binding 4, dominantSEQ ID No: 598SEQ ID No: 599SEQ ID No: 600negative helix-loop-helix 
proteinTUBE1251279460tubulin, epsilon 1SEQ ID No: 601SEQ ID No: 602SEQ ID No: 603MPDZ25228019multiple pdz domain proteinSEQ ID No: 604SEQ ID No: 605SEQ ID No: 606CACNA1I253283375calcium channel, voltage-dependent,SEQ ID No: 607SEQ ID No: 608SEQ ID No: 609alpha 1i subunitGFER254283601growth factor, augmenter of liverSEQ ID No: 610SEQ ID No: 611SEQ ID No: 612regeneration (erv1 homolog, s. cerevisiaeSNRPB2255284256small nuclear ribonucleoproteinSEQ ID No: 613SEQ ID No: 614polypeptide b″CHI3L2256284640chitinase 3-like 2SEQ ID No: 615SEQ ID No: 616ABCA8257284828atp-binding cassette, sub-family aSEQ ID No: 617SEQ ID No: 618(abc1), member 8BTBD125828577btb (poz) domain containing 1SEQ ID No: 619SEQ ID No: 620SEQ ID No: 621MMP13259285780matrix metalloproteinase 13SEQ ID No: 622SEQ ID No: 623(collagenase 3)GART26028596phosphoribosylglycinamideSEQ ID No: 624SEQ ID No: 625SEQ ID No: 626formyltransferase,phosphoribosylglycinamide synthetase,phosphoribosylaminoimidazolesynthetaseCUL2261286287cullin 2SEQ ID No: 627SEQ ID No: 628GRM3262287843glutamate receptor, metabotropic 3SEQ ID No: 629SEQ ID No: 630CA7263288874carbonic anhydrase viiSEQ ID No: 631SEQ ID No: 632SEQ ID No: 633PNMT264289857phenylethanolamine n-SEQ ID No: 634SEQ ID No: 635methyltransferaseSILV265291448silver homolog (mouse)SEQ ID No: 636SEQ ID No: 637SEQ ID No: 638ANK1266292321ankyrin 1, erythrocyticSEQ ID No: 639SEQ ID No: 640SEQ ID No: 641XRCC126729451x-ray repair complementing defectiveSEQ ID No: 642SEQ ID No: 643SEQ ID No: 644repair in chinese hamster cells 1CSE1L26829933cse1 chromosome segregation 1-likeSEQ ID No: 645SEQ ID No: 646SEQ ID No: 647(yeast)DXS1283E269300163gs2 geneSEQ ID No: 648SEQ ID No: 649TAF1027030066taf10 rna polymerase ii, tata boxSEQ ID No: 650SEQ ID No: 651binding protein (tbp)-associated factor,30 kdaCKMT2271301119creatine kinase, mitochondrial 2SEQ ID No: 652SEQ ID No: 653SEQ ID No: 654(sarcomeric)TNNC1272301128troponin c, slowSEQ ID No: 655SEQ ID No: 
656DKFZP434J0617273301258hypothetical protein dkfzp434j0617SEQ ID No: 657274302310homo sapiens cdna flj36340 fis, cloneSEQ ID No: 658SEQ ID No: 659thymu2006468.GUK1275302453guanylate kinase 1SEQ ID No: 660SEQ ID No: 661HSPA9B276305045heat shock 70 kda protein 9b (mortalin-SEQ ID No: 662SEQ ID No: 663SEQ ID No: 6642)NDUFA6277306510nadh dehydrogenase (ubiquinone) 1SEQ ID No: 665SEQ ID No: 666SEQ ID No: 667alpha subcomplex, 6, 14 kdaIFNGR2278306555interferon gamma receptor 2 (interferonSEQ ID No: 668SEQ ID No: 669SEQ ID No: 670gamma transducer 1)HRIHFB2206279306697hrihfb2206 proteinSEQ ID No: 671SEQ ID No: 672GCAT280307094glycine c-acetyltransferase (2-amino-3-SEQ ID No: 673SEQ ID No: 674SEQ ID No: 675ketobutyrate coenzyme a ligase)CD9281307352cd9 antigen (p24)SEQ ID No: 676SEQ ID No: 677SEQ ID No: 678ESD282310057esterase d/formylglutathione hydrolaseSEQ ID No: 679SEQ ID No: 680ZNF183283310088zinc finger protein 183 (ring finger,SEQ ID No: 681SEQ ID No: 682SEQ ID No: 683c3hc4 type)HSPA828431027heat shock 70 kda protein 8SEQ ID No: 684SEQ ID No: 685SEQ ID No: 686RPL35285310774ribosomal protein 135SEQ ID No: 687SEQ ID No: 688SEQ ID No: 689NUDT5286310860nudix (nucleoside diphosphate linkedSEQ ID No: 690SEQ ID No: 691SEQ ID No: 692moiety x)-type motif 5PFDN4287320143prefoldin 4SEQ ID No: 693SEQ ID No: 694SEQ ID No: 695RPL37288320151ribosomal protein 137SEQ ID No: 696SEQ ID No: 697SEQ ID No: 698SPR289320457sepiapterin reductase (7,8-SEQ ID No: 699SEQ ID No: 700SEQ ID No: 701dihydrobiopterin:nadp +oxidoreductase)LOC56267290320775hypothetical protein 669SEQ ID No: 702SEQ ID No: 703SEQ ID No: 704RPL31291321259ribosomal protein 131SEQ ID No: 705SEQ ID No: 706SEQ ID No: 707SRP72292321510signal recognition particle 72 kdaSEQ ID No: 708SEQ ID No: 709SEQ ID No: 710RPS6293321733ribosomal protein s6SEQ ID No: 711SEQ ID No: 712SEQ ID No: 713PHKG1294321783phosphorylase kinase, gamma 1SEQ ID No: 714SEQ ID No: 715SEQ ID No: 716(muscle)TACSTD1295321907tumor-associated calcium signalSEQ 
ID No: 717SEQ ID No: 718SEQ ID No: 719transducer 1RPS27L296321973ribosomal protein s27-likeSEQ ID No: 720SEQ ID No: 721SEQ ID No: 722297321981loc151103SEQ ID No: 723SEQ ID No: 724CHGA298322452chromogranin a (parathyroid secretorySEQ ID No: 725SEQ ID No: 726SEQ ID No: 727protein 1)SNRPC299322471small nuclear ribonucleoproteinSEQ ID No: 728SEQ ID No: 729SEQ ID No: 730polypeptide cAIP300322495aryl hydrocarbon receptor interactingSEQ ID No: 731SEQ ID No: 732SEQ ID No: 733proteinIRF1301323001interferon regulatory factor 1SEQ ID No: 734SEQ ID No: 735SEQ ID No: 736COX7A2302323650cytochrome c oxidase subunit viiaSEQ ID No: 737SEQ ID No: 738SEQ ID No: 739polypeptide 2 (liver)LOC51255303323681hypothetical protein loc51255SEQ ID No: 740SEQ ID No: 741SEQ ID No: 742COPZ2304323753coatomer protein complex, subunit zeta 2SEQ ID No: 743SEQ ID No: 744SEQ ID No: 745CKAP1305323766cytoskeleton-associated protein 1SEQ ID No: 746SEQ ID No: 747RPS3A306323863ribosomal protein s3aSEQ ID No: 748SEQ ID No: 749SEQ ID No: 750SOX9307323948sry (sex determining region y)-box 9SEQ ID No: 751SEQ ID No: 752(campomelic dysplasia, autosomal sex-reversal)DSCR1308324006down syndrome critical region gene 1SEQ ID No: 753SEQ ID No: 754SEQ ID No: 755KRAS2309324257v-ki-ras2 kirsten rat sarcoma 2 viralSEQ ID No: 756SEQ ID No: 757SEQ ID No: 758oncogene homologCTBS310324369chitobiase, di-n-acetyl-SEQ ID No: 759SEQ ID No: 760PPP1R15A311324684protein phosphatase 1, regulatorySEQ ID No: 761SEQ ID No: 762SEQ ID No: 763(inhibitor) subunit 15aRPS15A312324757ribosomal protein s15aSEQ ID No: 764SEQ ID No: 765SEQ ID No: 311SAT313324930spermidine/spermine n1-SEQ ID No: 766SEQ ID No: 767SEQ ID No: 768acetyltransferaseGRSF1314325058g-rich rna sequence binding factor 1SEQ ID No: 769SEQ ID No: 770SEQ ID No: 771PSG5315325641pregnancy specific beta-1-glycoprotein 5SEQ ID No: 772SEQ ID No: 773SEQ ID No: 774STMN431632698stathmin-like 4SEQ ID No: 775SEQ ID No: 776SEQ ID No: 777CDH15317327684cadherin 15, m-cadherin (myotubule)SEQ 
ID No: 778SEQ ID No: 779SEQ ID No: 780NDUFA4318327740nadh dehydrogenase (ubiquinone) 1SEQ ID No: 781SEQ ID No: 782SEQ ID No: 320alpha subcomplex, 4, 9 kdaRAN319328245ran, member ras oncogene familySEQ ID No: 783SEQ ID No: 784SEQ ID No: 785PNLIPRP1320328591pancreatic lipase-related protein 1SEQ ID No: 786SEQ ID No: 787SEQ ID No: 788CAP232133005cap, adenylate cyclase-associatedSEQ ID No: 789SEQ ID No: 790SEQ ID No: 791protein, 2 (yeast)NDFIP232233722nedd4 family interacting protein 2SEQ ID No: 792ATP5C132333794atp synthase, h+ transporting,SEQ ID No: 793SEQ ID No: 794SEQ ID No: 109mitochondrial f1 complex, gammapolypeptide 1ATP7A324340995atpase, cu++ transporting, alphaSEQ ID No: 795SEQ ID No: 796SEQ ID No: 797polypeptide (menkes syndrome)ATP6V0B325341121atpase, h+ transporting, lysosomalSEQ ID No: 798SEQ ID No: 799SEQ ID No: 80021 kda, v0 subunit c″DAD1326341699defender against cell death 1SEQ ID No: 801SEQ ID No: 802SEQ ID No: 803327341834loc349507SEQ ID No: 804SEQ ID No: 805328341984SEQ ID No: 806SEQ ID No: 807CXORF6329342054chromosome x open reading frame 6SEQ ID No: 808SEQ ID No: 809SEQ ID No: 810B2M330342416beta-2-microglobulinSEQ ID No: 811SEQ ID No: 812SEQ ID No: 813CLIC533134260chloride intracellular channel 5SEQ ID No: 814SEQ ID No: 815SEQ ID No: 816NDN332343578necdin homolog (mouse)SEQ ID No: 817SEQ ID No: 818SEQ ID No: 819OSBPL1A333344037oxysterol binding protein-like 1aSEQ ID No: 820SEQ ID No: 821SEQ ID No: 822COL6A1334344326collagen, type vi, alpha 1SEQ ID No: 823SEQ ID No: 824SEQ ID No: 825MRPS23335344792mitochondrial ribosomal protein s23SEQ ID No: 826SEQ ID No: 827SEQ ID No: 828PIK3CA336345430phosphoinositide-3-kinase, catalytic,SEQ ID No: 829SEQ ID No: 830SEQ ID No: 831alpha polypeptideC6ORF9337345437chromosome 6 open reading frame 9SEQ ID No: 832SEQ ID No: 833SEQ ID No: 834FLJ20813338345648hypothetical protein flj20813SEQ ID No: 835SEQ ID No: 836SEQ ID No: 837RPS21339345676ribosomal protein s21SEQ ID No: 838SEQ ID No: 839SEQ ID No: 840340345694SEQ 
ID No: 841SEQ ID No: 842CA3341345706carbonic anhydrase iii, muscle specificSEQ ID No: 843SEQ ID No: 844SEQ ID No: 845P4HA1342346016procollagen-proline, 2-oxoglutarate 4-SEQ ID No: 846SEQ ID No: 847SEQ ID No: 848dioxygenase (proline 4-hydroxylase),alpha polypeptide iCOL6A2343346269collagen, type vi, alpha 2SEQ ID No: 849SEQ ID No: 850SEQ ID No: 851SFN344346610StratifinSEQ ID No: 852SEQ ID No: 853SEQ ID No: 854TCEB1345347373transcription elongation factor b (siii),SEQ ID No: 855SEQ ID No: 856SEQ ID No: 857polypeptide 1 (15 kda, elongin c)RELN34634888ReelinSEQ ID No: 858SEQ ID No: 859SEQ ID No: 860SKP1A34734917s-phase kinase-associated protein 1aSEQ ID No: 861SEQ ID No: 862SEQ ID No: 863(p19a)AQP134835072aquaporin 1 (channel-forming integralSEQ ID No: 864SEQ ID No: 865SEQ ID No: 866protein, 28 kda)IRF234935262interferon regulatory factor 2SEQ ID No: 867SEQ ID No: 868SEQ ID No: 869NGB35035483NeuroglobinSEQ ID No: 870SEQ ID No: 871SEQ ID No: 872TM4SF5351356783transmembrane 4 superfamily member 5SEQ ID No: 873SEQ ID No: 874SEQ ID No: 875TGFB3352356980transforming growth factor, beta 3SEQ ID No: 876SEQ ID No: 877SEQ ID No: 878RPA3353357239replication protein a3, 14 kdaSEQ ID No: 879SEQ ID No: 880SEQ ID No: 881SEMA3C354357820sema domain, immunoglobulin domainSEQ ID No: 882SEQ ID No: 883SEQ ID No: 884(ig), short basic domain, secreted,(semaphorin) 3cCNOT2355357893ccr4-not transcription complex, subunit 2SEQ ID No: 885SEQ ID No: 886CDW52356358041cdw52 antigen (campath-1 antigen)SEQ ID No: 887SEQ ID No: 888SEQ ID No: 889SOX9357358117sry (sex determining region y)-box 9SEQ ID No: 890SEQ ID No: 891SEQ ID No: 752(campomelic dysplasia, autosomal sex-reversal)HSU79266358358162protein predicted by clone 23627SEQ ID No: 892SEQ ID No: 893SEQ ID No: 894PFDN2359358267prefoldin 2SEQ ID No: 895SEQ ID No: 896SEQ ID No: 897TPM1360358683tropomyosin 1 (alpha)SEQ ID No: 898SEQ ID No: 899SEQ ID No: 900FLJ21272361358943hypothetical protein flj21272SEQ ID No: 901SEQ ID No: 902SEQ ID No: 
903PSMC2362358993proteasome (prosome, macropain) 26sSEQ ID No: 904SEQ ID No: 905subunit, atpase, 2CKS2363359119cdc28 protein kinase regulatory subunit 2SEQ ID No: 906SEQ ID No: 907NDUFA9364359147nadh dehydrogenase (ubiquinone) 1SEQ ID No: 908SEQ ID No: 909alpha subcomplex, 9, 39 kdaH11365359191protein kinase h11SEQ ID No: 910SEQ ID No: 911CA4366359250carbonic anhydrase ivSEQ ID No: 912SEQ ID No: 913SEQ ID No: 914PRSS3367359254protease, serine, 3 (mesotrypsin)SEQ ID No: 915SEQ ID No: 916SEQ ID No: 917368360588homo sapiens transcribed sequence withSEQ ID No: 918moderate similarity to proteinref: np_036199.1 (h. sapiens) aldo-ketoreductase family 7, member a3(aflatoxin aldehyde reductase) [homosapiens]HIG1369361108likely ortholog of mouse hypoxiaSEQ ID No: 919SEQ ID No: 920SEQ ID No: 921induced gene 1370363273SEQ ID No: 922SEQ ID No: 923ADD1371363991adducin 1 (alpha)SEQ ID No: 924SEQ ID No: 925SEQ ID No: 68LAMB1372364012laminin, beta 1SEQ ID No: 926SEQ ID No: 927SEQ ID No: 928CD5373364687cd5 antigen (p56-62)SEQ ID No: 929SEQ ID No: 930SEQ ID No: 931UQCR37436607ubiquinol-cytochrome c reductaseSEQ ID No: 932SEQ ID No: 933SEQ ID No: 934(6.4 kd) subunitRAP2A37536684rap2a, member of ras oncogene familySEQ ID No: 935SEQ ID No: 936SEQ ID No: 937RGS637636710regulator of g-protein signalling 6SEQ ID No: 938SEQ ID No: 939SEQ ID No: 940IL1RN37736844interleukin 1 receptor antagonistSEQ ID No: 941SEQ ID No: 942SEQ ID No: 943LRP137837345low density lipoprotein-related proteinSEQ ID No: 944SEQ ID No: 945SEQ ID No: 9461 (alpha-2-macroglobulin receptor)DJ1042K10.237937496hypothetical protein dj1042k10.2SEQ ID No: 947SEQ ID No: 948SEQ ID No: 949PTPRN238037506protein tyrosine phosphatase, receptorSEQ ID No: 950SEQ ID No: 951SEQ ID No: 952type, n polypeptide 2CCNB2381375781cyclin b2SEQ ID No: 953SEQ ID No: 954SEQ ID No: 955TCTEL1382376284t-complex-associated-testis-expressedSEQ ID No: 956SEQ ID No: 957SEQ ID No: 9581-like 1TUBB38337630tubulin, beta polypeptideSEQ ID No: 959SEQ ID No: 
960RHEB384376473ras homolog enriched in brainSEQ ID No: 961SEQ ID No: 962SEQ ID No: 963VCP385376547valosin-containing proteinSEQ ID No: 964SEQ ID No: 965IL2RB386376696interleukin 2 receptor, betaSEQ ID No: 966SEQ ID No: 967SEQ ID No: 152TAZ387376755transcriptional co-activator with pdz-SEQ ID No: 968SEQ ID No: 969SEQ ID No: 970binding motif (taz)HSPC150388376769hspc150 protein similar to ubiquitin-SEQ ID No: 971SEQ ID No: 972SEQ ID No: 973conjugating enzymePLCD4389376802phospholipase c, delta 4SEQ ID No: 974SEQ ID No: 975SEQ ID No: 976NR2F6390377020nuclear receptor subfamily 2, group f,SEQ ID No: 977SEQ ID No: 978member 6MTPN391377545MyotrophinSEQ ID No: 979SEQ ID No: 980SLPI392378813secretory leukocyte protease inhibitorSEQ ID No: 981SEQ ID No: 496(antileukoproteinase)KPNA139338056karyopherin alpha 1 (importin alpha 5)SEQ ID No: 982SEQ ID No: 983SEQ ID No: 984LAMR1394383433laminin receptor 1 (ribosomal proteinSEQ ID No: 985SEQ ID No: 986SEQ ID No: 987sa, 67 kda)SST39539593SomatostatinSEQ ID No: 988SEQ ID No: 989ABCA539639821atp-binding cassette, sub-family aSEQ ID No: 990SEQ ID No: 991SEQ ID No: 992(abc1), member 5NME139739961non-metastatic cells 1, protein (nm23a)SEQ ID No: 993SEQ ID No: 994SEQ ID No: 288expressed inADAM2339839972a disintegrin and metalloproteinaseSEQ ID No: 995SEQ ID No: 996SEQ ID No: 997domain 23CYCS39940017cytochrome c, somaticSEQ ID No: 998SEQ ID No: 999SEQ ID No: 1000GCNIL140040567gcn1 general control of amino-acidSEQ ID No: 1001SEQ ID No: 1002synthesis 1-like 1 (yeast)RBBP140140721retinoblastoma binding protein 1SEQ ID No: 1003SEQ ID No: 1004SEQ ID No: 1005CNN340241099calponin 3, acidicSEQ ID No: 1006SEQ ID No: 1007SEQ ID No: 1008RPL2440341411ribosomal protein 124SEQ ID No: 1009SEQ ID No: 1010SEQ ID No: 1011SAT40441452spermidine/spermine n1-SEQ ID No: 1012SEQ ID No: 1013SEQ ID No: 768acetyltransferaseSNRPE405415389small nuclear ribonucleoproteinSEQ ID No: 1014SEQ ID No: 1015SEQ ID No: 1016polypeptide eARG1406416060arginase, liverSEQ ID No: 
1017SEQ ID No: 1018SEQ ID No: 1019IL13RA240741648interleukin 13 receptor, alpha 2SEQ ID No: 1020SEQ ID No: 1021SEQ ID No: 1022TXN408416946ThioredoxinSEQ ID No: 1023SEQ ID No: 1024SEQ ID No: 1025TFR2409417861transferrin receptor 2SEQ ID No: 1026SEQ ID No: 1027SEQ ID No: 1028NUTF241041857nuclear transport factor 2SEQ ID No: 1029SEQ ID No: 1030P2RX441142118purinergic receptor p2x, ligand-gatedSEQ ID No: 1031SEQ ID No: 1032SEQ ID No: 1033ion channel, 4SYK41242214spleen tyrosine kinaseSEQ ID No: 1034SEQ ID No: 1035SEQ ID No: 1036GPC6413427858glypican 6SEQ ID No: 1037SEQ ID No: 1038SEQ ID No: 1039CD1C414428103cd1c antigen, c polypeptideSEQ ID No: 1040SEQ ID No: 1041SEQ ID No: 1042CYCS415429544cytochrome c, somaticSEQ ID No: 1043SEQ ID No: 1044SEQ ID No: 1000TNFRSF7416430090tumor necrosis factor receptorSEQ ID No: 1045SEQ ID No: 1046SEQ ID No: 1047superfamily, member 741743207homo sapiens transcribed sequence withSEQ ID No: 1048SEQ ID No: 1049strong similarity to protein sp: o00451(h. sapiens) nrtr_human neurturinreceptor alpha precursor (ntnr-alpha)(nrtnr-alpha) (tgf-beta relatedneurotrophic factor receptor 2) (gdnfreceptor beta) (gdnfr-beta) (ret ligand 2)(gfr-alpha 2)GALNACT-241843276chondroitin sulfate galnact-2SEQ ID No: 1050SEQ ID No: 1051F5419433155coagulation factor v (proaccelerin,SEQ ID No: 1052SEQ ID No: 1053labile factor)42043338homo sapiens transcribed sequence withSEQ ID No: 1054moderate similarity to proteinref: np_004491.1 (h. 
sapiens)heterogeneous nuclearribonucleoprotein c, isoform b; nuclearribonucleoprotein particle c1 protein;nuclear ribonucleoprotein particle c2protein [homo sapiens]RPL1542143442ribosomal protein 115SEQ ID No: 1055SEQ ID No: 1056RPS2842243493ribosomal protein s28SEQ ID No: 1057SEQ ID No: 1058SEQ ID No: 1059LDHA42343550lactate dehydrogenase aSEQ ID No: 1060SEQ ID No: 1061RAN42443638ran, member ras oncogene familySEQ ID No: 1062SEQ ID No: 1063SEQ ID No: 785PPP2CA42543760protein phosphatase 2 (formerly 2a),SEQ ID No: 1064SEQ ID No: 1065SEQ ID No: 1066catalytic subunit, alpha isoformCSNK2A142643941casein kinase 2, alpha 1 polypeptideSEQ ID No: 1067SEQ ID No: 1068SEQ ID No: 1069CCT342744152chaperonin containing tcp1, subunit 3SEQ ID No: 1070SEQ ID No: 1071SEQ ID No: 1072(gamma)LOC11528642845021hypothetical protein loc115286SEQ ID No: 1073SEQ ID No: 1074SEQ ID No: 1075SNCA42945086synuclein, alpha (non a4 component ofSEQ ID No: 1076SEQ ID No: 1077SEQ ID No: 1078amyloid precursor)MORF4L243045706mortality factor 4 like 2SEQ ID No: 1079SEQ ID No: 1080YWHAB43145831tyrosine 3-monooxygenase/tryptophanSEQ ID No: 1081SEQ ID No: 1082SEQ ID No: 10835-monooxygenase activation protein,beta polypeptidePCSK743245900proprotein convertase subtilisin/kexinSEQ ID No: 1084SEQ ID No: 1085type 7COX7A2L43346147cytochrome c oxidase subunit viiaSEQ ID No: 1086SEQ ID No: 1087SEQ ID No: 117polypeptide 2 likeDTNA43446518dystrobrevin, alphaSEQ ID No: 1088SEQ ID No: 1089SEQ ID No: 1090PPP1R743546888protein phosphatase 1, regulatorySEQ ID No: 1091SEQ ID No: 1092SEQ ID No: 1093subunit 7KCNMB1436470122potassium large conductance calcium-SEQ ID No: 1094SEQ ID No: 1095SEQ ID No: 1096activated channel, subfamily m, betamember 1MTCP1437470175mature t-cell proliferation 1SEQ ID No: 1097SEQ ID No: 1098SEQ ID No: 1099CNTNAP1438470279contactin associated protein 1SEQ ID No: 1100SEQ ID No: 1101LOC90139439470819tetraspanin similiar to uroplakin 1SEQ ID No: 1102SEQ ID No: 1103MRE11A440471256mre11 meiotic 
recombination 11SEQ ID No: 1104SEQ ID No: 1105SEQ ID No: 1106homolog a (s. cerevisiae)ICAM2441471918intercellular adhesion molecule 2SEQ ID No: 1107SEQ ID No: 1108BZRP442472021benzodiazapine receptor (peripheral)SEQ ID No: 1109SEQ ID No: 1110SEQ ID No: 111144347986SEQ ID No: 1112ITGB3444484874integrin, beta 3 (platelet glycoproteinSEQ ID No: 1113SEQ ID No: 1114iiia, antigen cd61)445485742similar to hypothetical proteinSEQ ID No: 1115SEQ ID No: 1116bc015353CABC1446486151chaperone, abc1 activity of bc1SEQ ID No: 1117SEQ ID No: 1118SEQ ID No: 1119complex like (s. pombe)RY1447486400putative nucleic acid binding protein ry-1SEQ ID No: 1120SEQ ID No: 1121SEQ ID No: 1122CDH13448486510cadherin 13, h-cadherin (heart)SEQ ID No: 1123SEQ ID No: 1124SEQ ID No: 1125SRP19449486702signal recognition particle 19 kdaSEQ ID No: 1126SEQ ID No: 1127SEQ ID No: 1128MIF450488144macrophage migration inhibitory factorSEQ ID No: 1129SEQ ID No: 1130(glycosylation-inhibiting factor)LTBP1451488316latent transforming growth factor betaSEQ ID No: 1131SEQ ID No: 1132SEQ ID No: 1133binding protein 1ZNF354A452488412zinc finger protein 354aSEQ ID No: 1134SEQ ID No: 1135SEQ ID No: 1136TLE2453488430transducin-like enhancer of split 2SEQ ID No: 1137SEQ ID No: 1138SEQ ID No: 1139(e(sp1) homolog, drosophila)MYH11454488526myosin, heavy polypeptide 11, smoothSEQ ID No: 1140SEQ ID No: 1141SEQ ID No: 1142musclePIP5K1A455488875phosphatidylinositol-4-phosphate 5-SEQ ID No: 1143SEQ ID No: 1144SEQ ID No: 1145kinase, type i, alphaMFAP3456488913microfibrillar-associated protein 3SEQ ID No: 1146SEQ ID No: 1147SEQ ID No: 1148GTF2H4457489497general transcription factor iih,SEQ ID No: 1149SEQ ID No: 1150SEQ ID No: 1151polypeptide 4, 52 kdaLRPPRC458489772leucine-rich ppr-motif containingSEQ ID No: 1152SEQ ID No: 1153SEQ ID No: 1154KIAA0232459489950kiaa0232 gene productSEQ ID No: 1155SEQ ID No: 1156GTF2F1460489961general transcription factor iif,SEQ ID No: 1157SEQ ID No: 1158SEQ ID No: 1159polypeptide 1, 74 
kdaPSMD3461490174proteasome (prosome, macropain) 26sSEQ ID No: 1160SEQ ID No: 1161SEQ ID No: 1162subunit, non-atpase, 3DF462491284d component of complement (adipsin)SEQ ID No: 1163SEQ ID No: 1164PRNP46349691prion protein (p27-30) (creutzfeld-jakobSEQ ID No: 1165SEQ ID No: 1166SEQ ID No: 1167disease, gerstmann-strausler-scheinkersyndrome, fatal familial insomnia)464501939homo sapiens transcribed sequence withSEQ ID No: 1168SEQ ID No: 1169strong similarity to proteinref: np_057457.1 (h. sapiens) wwdomain-containing oxidoreductase,isoform 1; ww domain-containingprotein wwox; fragile site fra16doxidoreductase; fragile 16d oxidoreductase [homo sapiens]CCL11465502658chemokine (c—c motif) ligand 11SEQ ID No: 1170SEQ ID No: 1171SEQ ID No: 1172ARHA466503820ras homolog gene family, member aSEQ ID No: 1173SEQ ID No: 1174SEQ ID No: 1175ETFB467504184electron-transfer-flavoprotein, betaSEQ ID No: 1176SEQ ID No: 1177polypeptideZNF3468504811zinc finger protein 3 (a8-51)SEQ ID No: 1178SEQ ID No: 1179PYGL469505573phosphorylase, glycogen; liver (hersSEQ ID No: 1180SEQ ID No: 1181disease, glycogen storage disease typevi)PRKCB147050561protein kinase c, beta 1SEQ ID No: 1182SEQ ID No: 1183SEQ ID No: 1184FNBP3471509515formin binding protein 3SEQ ID No: 1185SEQ ID No: 1186SEQ ID No: 1187GNG12472509584guanine nucleotide binding protein (gSEQ ID No: 1188SEQ ID No: 1189protein), gamma 12TAF12473509588taf12 rna polymerase ii, tata boxSEQ ID No: 1190SEQ ID No: 1191SEQ ID No: 1192binding protein (tbp)-associated factor,20 kdaRPL27A474509719ribosomal protein l27aSEQ ID No: 1193SEQ ID No: 1194SEQ ID No: 1195PHB475509735prohibitinSEQ ID No: 1196SEQ ID No: 1197SEQ ID No: 1198SFRS9476509751splicing factor, arginine/serine-rich 9SEQ ID No: 1199SEQ ID No: 1200NONO477509887non-pou domain containing, octamer-SEQ ID No: 1201SEQ ID No: 1202SEQ ID No: 1203bindingCDH17478510130cadherin 17, li cadherin (liver-intestine)SEQ ID No: 1204SEQ ID No: 1205SEQ ID No: 1206CCT5479510161chaperonin containing tcp1, 
subunit 5SEQ ID No: 1207SEQ ID No: 1208(epsilon)RRM2480510231ribonucleotide reductase m2SEQ ID No: 1209SEQ ID No: 1210SEQ ID No: 1211polypeptideENO1481510235enolase 1, (alpha)SEQ ID No: 1212SEQ ID No: 1213SEQ ID No: 1214DKFZP564B1023482510354hypothetical protein dkfzp564b1023SEQ ID No: 1215SEQ ID No: 1216SEQ ID No: 1217PPEF148351064protein phosphatase, ef hand calcium-SEQ ID No: 1218SEQ ID No: 1219SEQ ID No: 1220binding domain 1CKB484510977creatine kinase, brainSEQ ID No: 1221SEQ ID No: 1222SEQ ID No: 1223TM4SF1485511778transmembrane 4 superfamily member 1SEQ ID No: 1224SEQ ID No: 1225SEQ ID No: 1226UBE2D3486512000ubiquitin-conjugating enzyme e2d 3SEQ ID No: 1227SEQ ID No: 1228SEQ ID No: 1229(ubc4/5 homolog, yeast)MRG2487512333likely ortholog of mouse myeloidSEQ ID No: 1230ecotropic viral integration site-relatedgene 2AK5488512824adenylate kinase 5SEQ ID No: 1231SEQ ID No: 1232489512924SEQ ID No: 1233SEQ ID No: 1234490513189SEQ ID No: 1235GADD45A49152065growth arrest and dna-damage-SEQ ID No: 1236SEQ ID No: 1237inducible, alphaGRIA149252228glutamate receptor, ionotropic, ampa 1SEQ ID No: 1238SEQ ID No: 1239SEQ ID No: 1240IDH1493525983isocitrate dehydrogenase 1 (nadp+),SEQ ID No: 1241SEQ ID No: 1242SEQ ID No: 1243soluble494526038SEQ ID No: 1244SEQ ID No: 1245PTK249552982ptk2 protein tyrosine kinase 2SEQ ID No: 1246SEQ ID No: 1247SEQ ID No: 1248CBR3496529844carbonyl reductase 3SEQ ID No: 1249SEQ ID No: 1250SEQ ID No: 1251COX7A2497529882cytochrome c oxidase subunit viiaSEQ ID No: 1252SEQ ID No: 1253SEQ ID No: 739polypeptide 2 (liver)498530034SEQ ID No: 1254SEQ ID No: 1255499530037SEQ ID No: 1256SEQ ID No: 1257UBA52500530069ubiquitin a-52 residue ribosomal proteinSEQ ID No: 1258SEQ ID No: 1259SEQ ID No: 393fusion product 1COX7C501530338cytochrome c oxidase subunit viicSEQ ID No: 1260SEQ ID No: 1261SEQ ID No: 1262RPL5502530368ribosomal protein 15SEQ ID No: 1263SEQ ID No: 1264SEQ ID No: 1265FLIPT150353061fly-like putative organic ion transporter 1SEQ ID No: 1266SEQ ID 
No: 1267SEQ ID No: 1268504530744homo sapiens cyclophilin mrna,SEQ ID No: 1269SEQ ID No: 1270complete cdsRPL13A505530773ribosomal protein l13aSEQ ID No: 1271SEQ ID No: 1272SEQ ID No: 1273506531366SEQ ID No: 1274SEQ ID No: 1275EPS15R507531496epidermal growth factor receptorSEQ ID No: 1276SEQ ID No: 1277SEQ ID No: 1278substrate eps15rSTMN150853227stathmin 1/oncoprotein 18SEQ ID No: 1279SEQ ID No: 1280SEQ ID No: 1281MDH150953316malate dehydrogenase 1, nad (soluble)SEQ ID No: 1282SEQ ID No: 128351053331loc350717SEQ ID No: 1284HCNGP511544680transcriptional regulator proteinSEQ ID No: 1285SEQ ID No: 1286SEQ ID No: 1287512544767SEQ ID No: 1288SEQ ID No: 1289513544806SEQ ID No: 1290SEQ ID No: 1291TMSB4X514544841thymosin, beta 4, x chromosomeSEQ ID No: 1292SEQ ID No: 1293SEQ ID No: 1294515544875SEQ ID No: 1295SEQ ID No: 1296RPL5516544885ribosomal protein l5SEQ ID No: 1297SEQ ID No: 1298SEQ ID No: 1265517545000SEQ ID No: 1299SEQ ID No: 1300518545236SEQ ID No: 1301SEQ ID No: 1302LOC92906519545423hypothetical protein bc008217SEQ ID No: 1303SEQ ID No: 1304SEQ ID No: 30RPL29520545580ribosomal protein l29SEQ ID No: 1305SEQ ID No: 1306SEQ ID No: 1307TM9SF2521546351transmembrane 9 superfamily member 2SEQ ID No: 1308SEQ ID No: 1309GNB2L1522546439guanine nucleotide binding protein (gSEQ ID No: 1310SEQ ID No: 1311SEQ ID No: 1312protein), beta polypeptide 2-like 1WASF3523546460was protein family, member 3SEQ ID No: 1313SEQ ID No: 1314SEQ ID No: 1315RAB7524546545rab7, member ras oncogene familySEQ ID No: 1316SEQ ID No: 1317SEQ ID No: 1318RPS8525546664ribosomal protein s8SEQ ID No: 1319SEQ ID No: 1320SEQ ID No: 1321526546935SEQ ID No: 1322SEQ ID No: 1323527547224SEQ ID No: 1324SEQ ID No: 1325528547334SEQ ID No: 1326SEQ ID No: 1327WASL529547443wiskott-aldrich syndrome-likeSEQ ID No: 1328SEQ ID No: 1329RPL10A530548702ribosomal protein l10aSEQ ID No: 1330SEQ ID No: 1331SEQ ID No: 1332BOP1531548777block of proliferation 1SEQ ID No: 1333SEQ ID No: 1334SEQ ID No: 1335G22P1532549065thyroid 
autoantigen 70 kda (ku antigen)SEQ ID No: 1336SEQ ID No: 1337SEQ ID No: 1338ARSD533549139arylsulfatase dSEQ ID No: 1339SEQ ID No: 1340SEQ ID No: 1341RPS8534549152ribosomal protein s8SEQ ID No: 1342SEQ ID No: 1343SEQ ID No: 1321EIF3S2535549173eukaryotic translation initiation factor 3,SEQ ID No: 1344SEQ ID No: 1345SEQ ID No: 1346subunit 2 beta, 36 kdaYWHAQ536549178tyrosine 3-monooxygenase/tryptophanSEQ ID No: 1347SEQ ID No: 13485-monooxygenase activation protein,theta polypeptideRPL5537549200ribosomal protein 15SEQ ID No: 1349SEQ ID No: 1350SEQ ID No: 1265NPM1538549212nucleophosmin (nucleolarSEQ ID No: 1351SEQ ID No: 1352phosphoprotein b23, numatrin)COX5B539549361cytochrome c oxidase subunit vbSEQ ID No: 1353SEQ ID No: 478PPP2CA540550315protein phosphatase 2 (formerly 2a),SEQ ID No: 1354SEQ ID No: 1355SEQ ID No: 1066catalytic subunit, alpha isoformMYH1541561922myosin, heavy polypeptide 1, skeletalSEQ ID No: 1356SEQ ID No: 1357SEQ ID No: 1358muscle, adultACTA1542561948actin, alpha 1, skeletal muscleSEQ ID No: 1359SEQ ID No: 1360SEQ ID No: 1361TTN543562021titinSEQ ID No: 1362SEQ ID No: 1363SEQ ID No: 1364XRCC5544563112x-ray repair complementing defectiveSEQ ID No: 1365SEQ ID No: 1366repair in chinese hamster cells 5(double-strand-break rejoining; kuautoantigen, 80 kda)CCNB1545563130cyclin b1SEQ ID No: 1367SEQ ID No: 1368SEQ ID No: 1369HSPD1546563819heat shock 60 kda protein 1 (chaperonin)SEQ ID No: 1370SEQ ID No: 1371SEQ ID No: 1372HMGB1547564501high-mobility group box 1SEQ ID No: 1373SEQ ID No: 1374SP3548564535sp3 transcription factorSEQ ID No: 1375SEQ ID No: 1376GSTT2549564547glutathione s-transferase theta 2SEQ ID No: 1377SEQ ID No: 1378SEQ ID No: 1379XRCC5550587547x-ray repair complementing defectiveSEQ ID No: 1380SEQ ID No: 1381SEQ ID No: 1366repair in chinese hamster cells 5(double-strand-break rejoining; kuautoantigen, 80 kda)CRNKL1551590592crn, crooked neck-like 1 (drosophila)SEQ ID No: 1382SEQ ID No: 1383SEQ ID No: 1384UBE2C552592041ubiquitin-conjugating 
enzyme e2cSEQ ID No: 1385SEQ ID No: 1386PPP4R2553592521protein phosphatase 4, regulatorySEQ ID No: 1387SEQ ID No: 1388subunit 2PDK4554594120pyruvate dehydrogenase kinase,SEQ ID No: 1389SEQ ID No: 1390isoenzyme 4555594540similar to metallothionein-ie (mt-1e)SEQ ID No: 1391BPHL556595600biphenyl hydrolase-like (serineSEQ ID No: 1392SEQ ID No: 1393SEQ ID No: 1394hydrolase; breast epithelial mucin-associated antigen)ZNF20455760204zinc finger protein 204SEQ ID No: 1395SEQ ID No: 1396HOXA1558611075homeo box a1SEQ ID No: 1397SEQ ID No: 1398SEQ ID No: 1399C22ORF19559611123chromosome 22 open reading frame 19SEQ ID No: 1400SEQ ID No: 1401SEQ ID No: 1402MYF6560611255myogenic factor 6 (herculin)SEQ ID No: 1403SEQ ID No: 1404SEQ ID No: 1405KIAA1181561611623kiaa1181 proteinSEQ ID No: 1406SEQ ID No: 1407AMPD1562611660adenosine monophosphate deaminase 1SEQ ID No: 1408SEQ ID No: 1409(isoform m)TNNT3563611783troponin t3, skeletal, fastSEQ ID No: 1410SEQ ID No: 1411NEDD5564611946neural precursor cell expressed,SEQ ID No: 1412SEQ ID No: 1413SEQ ID No: 1414developmentally down-regulated 5HSPA9B565612365heat shock 70 kda protein 9b (mortalin-SEQ ID No: 1415SEQ ID No: 1416SEQ ID No: 6642)56662429SEQ ID No: 1417SEQ ID No: 1418567624513homo sapiens transcribed sequence withSEQ ID No: 1419SEQ ID No: 1420strong similarity to protein pir: s29331(h. 
sapiens) s29331 glutamatedehydrogenase - humanGNB2L1568625541guanine nucleotide binding protein (gSEQ ID No: 1421SEQ ID No: 1422SEQ ID No: 1312protein), beta polypeptide 2-like 1GNB2L1569625574guanine nucleotide binding protein (gSEQ ID No: 1423SEQ ID No: 1424SEQ ID No: 1312protein), beta polypeptide 2-like 1MYL3570628602myosin, light polypeptide 3, alkali;SEQ ID No: 1425SEQ ID No: 1426SEQ ID No: 1427ventricular, skeletal, slowCOX6B571632026cytochrome c oxidase subunit vibSEQ ID No: 1428SEQ ID No: 1429SEQ ID No: 1430DNAJD1572664980dnaj (hsp40) homolog, subfamily d,SEQ ID No: 1431SEQ ID No: 1432member 1AKR1A1573665117aldo-keto reductase family 1, memberSEQ ID No: 1433SEQ ID No: 1434SEQ ID No: 1435a1 (aldehyde reductase)MAP2K7574665682mitogen-activated protein kinase kinase 7SEQ ID No: 1436SEQ ID No: 1437SEQ ID No: 1438SLC7A6575665778solute carrier family 7 (cationic aminoSEQ ID No: 1439SEQ ID No: 1440SEQ ID No: 1441acid transporter, y+ system), member 6ANXA6576665818annexin a6SEQ ID No: 1442SEQ ID No: 1443SEQ ID No: 1444HIST1H4C577667303histone 1, h4cSEQ ID No: 1445SEQ ID No: 1446SEQ ID No: 144757866800SEQ ID No: 1448CPSF557966820cleavage and polyadenylation specificSEQ ID No: 1449SEQ ID No: 1450factor 5, 25 kda58066832SEQ ID No: 145158166836SEQ ID No: 1452GTF2E1582668494general transcription factor iie,SEQ ID No: 1453SEQ ID No: 1454SEQ ID No: 1455polypeptide 1, alpha 56 kda58366895homo sapiens transcribed sequencesSEQ ID No: 1456RPS1458467721ribosomal protein s14SEQ ID No: 1457SEQ ID No: 1458SEQ ID No: 1459KRT2358567740keratin 23 (histone deacetylaseSEQ ID No: 1460SEQ ID No: 1461SEQ ID No: 1462inducible)58667776SEQ ID No: 146358768140SEQ ID No: 1464SEQ ID No: 146558868141SEQ ID No: 1466FLJ1091658968176hypothetical protein flj10916SEQ ID No: 1467SEQ ID No: 1468SEQ ID No: 1469ERCC4590682268excision repair cross-complementingSEQ ID No: 1470SEQ ID No: 1471SEQ ID No: 1472rodent repair deficiency,complementation group 459168227SEQ ID No: 1473SEQ ID No: 
1474COL5A159268276collagen, type v, alpha 1SEQ ID No: 1475SEQ ID No: 1476MYOM159368351myomesin 1 (skelemin) 185 kdaSEQ ID No: 1477SEQ ID No: 1478NEK659469584nima (never in mitosis gene a)-relatedSEQ ID No: 1479SEQ ID No: 1480kinase 6RPS2359570825ribosomal protein s23SEQ ID No: 1481SEQ ID No: 1482SEQ ID No: 1483RPL559671096ribosomal protein 15SEQ ID No: 1484SEQ ID No: 1485SEQ ID No: 1265HSF1597712675heat shock transcription factor 1SEQ ID No: 1486SEQ ID No: 1487SEQ ID No: 1488FRAP1598713218fk506 binding protein 12-rapamycinSEQ ID No: 1489SEQ ID No: 1490SEQ ID No: 1491associated protein 1MGC27165599713459hypothetical protein mgc27165SEQ ID No: 1492SEQ ID No: 1493RPS2760072056ribosomal protein s27SEQ ID No: 1494SEQ ID No: 1495SEQ ID No: 1496(metallopanstimulin 1)RELA601723731v-rel reticuloendotheliosis viralSEQ ID No: 1497SEQ ID No: 1498oncogene homolog a, nuclear factor ofkappa light polypeptide gene enhancerin b-cells 3, p65 (avian)RYR360272497ryanodine receptor 3SEQ ID No: 1499SEQ ID No: 1500COL6A1603726342collagen, type vi, alpha 1SEQ ID No: 1501SEQ ID No: 1502SEQ ID No: 825CNN1604726779calponin 1, basic, smooth muscleSEQ ID No: 1503SEQ ID No: 1504ITIH160572694inter-alpha (globulin) inhibitor, h1SEQ ID No: 1505SEQ ID No: 1506polypeptidePDE1A606727792phosphodiesterase 1a, calmodulin-SEQ ID No: 1507SEQ ID No: 1508SEQ ID No: 1509dependentSSR260772789signal sequence receptor, betaSEQ ID No: 1510SEQ ID No: 1511SEQ ID No: 1512(translocon-associated protein beta)NFYA608730787nuclear transcription factor y, alphaSEQ ID No: 1513SEQ ID No: 1514SEQ ID No: 1515RPS760973590ribosomal protein s7SEQ ID No: 1516SEQ ID No: 1517SEQ ID No: 151861074834SEQ ID No: 1519SVIL611754018supervillinSEQ ID No: 1520SEQ ID No: 1521THPO612754034thrombopoietin (myeloproliferativeSEQ ID No: 1522SEQ ID No: 1523SEQ ID No: 1524leukemia virus oncogene ligand,megakaryocyte growth anddevelopment factor)C1ORF29613754479chromosome 1 open reading frame 29SEQ ID No: 1525SEQ ID No: 1526SEQ ID No: 
1527IFITM1614755599interferon induced transmembraneSEQ ID No: 1528SEQ ID No: 1529SEQ ID No: 1530protein 1 (9-27)RARB615755663retinoic acid receptor, betaSEQ ID No: 1531SEQ ID No: 1532SEQ ID No: 398BMP6616768168bone morphogenetic protein 6SEQ ID No: 1533SEQ ID No: 1534SEQ ID No: 1535RPS6KB1617773319ribosomal protein s6 kinase, 70 kda,SEQ ID No: 1536SEQ ID No: 1537SEQ ID No: 1538polypeptide 1R30953_1618782601hypothetical protein r30953_1SEQ ID No: 1539SEQ ID No: 1540SEQ ID No: 1541RNF13619785886ring finger protein 13SEQ ID No: 1542SEQ ID No: 1543SEQ ID No: 1544CGI-128620786662cgi-128 proteinSEQ ID No: 1545SEQ ID No: 1546SEQ ID No: 154762178879similar to complement component 3SEQ ID No: 1548CDH162279598cadherin 1, type 1, e-cadherinSEQ ID No: 1549SEQ ID No: 1550SEQ ID No: 1551(epithelial)FHL3623796475four and a half lim domains 3SEQ ID No: 1552SEQ ID No: 1553SEQ ID No: 155462479829homo sapiens transcribed sequencesSEQ ID No: 1555VAV162580384vav 1 oncogeneSEQ ID No: 1556SEQ ID No: 1557SEQ ID No: 1558PPP1R14A626809611protein phosphatase 1, regulatorySEQ ID No: 1559SEQ ID No: 1560(inhibitor) subunit 14aETV4627809959ets variant gene 4 (e1a enhancerSEQ ID No: 1561SEQ ID No: 1562SEQ ID No: 1563binding protein, e1af)S100A2628810813s100 calcium binding protein a2SEQ ID No: 1564SEQ ID No: 1565SEQ ID No: 1566ITGA2629811740integrin, alpha 2 (cd49b, alpha 2SEQ ID No: 1567SEQ ID No: 1568SEQ ID No: 1569subunit of vla-2 receptor)YWHAZ630811939tyrosine 3-monooxygenase/tryptophanSEQ ID No: 1570SEQ ID No: 1571SEQ ID No: 15725-monooxygenase activation protein,zeta polypeptidePCDH7631813384bh-protocadherin (brain-heart)SEQ ID No: 1573SEQ ID No: 1574632813755similar to zinc finger protein 7 (zincSEQ ID No: 1575SEQ ID No: 1576finger protein kox4) (zinc finger proteinhf. 
16)GJB2633823859gap junction protein, beta 2, 26 kdaSEQ ID No: 1577SEQ ID No: 1578SEQ ID No: 1579(connexin 26)VWF634840486von willebrand factorSEQ ID No: 1580SEQ ID No: 1581SEQ ID No: 1582NME1635845363non-metastatic cells 1, protein (nm23a)SEQ ID No: 1583SEQ ID No: 288expressed inEIF3S6636856961eukaryotic translation initiation factor 3,SEQ ID No: 1584SEQ ID No: 1585subunit 6 48 kda63786078SEQ ID No: 1586638869440SEQ ID No: 1587RPL30639878681ribosomal protein 130SEQ ID No: 1588SEQ ID No: 1589B2M640878798beta-2-microglobulinSEQ ID No: 1590SEQ ID No: 813HMGB2641884365high-mobility group box 2SEQ ID No: 1591SEQ ID No: 552LAMR1642884644laminin receptor 1 (ribosomal proteinSEQ ID No: 1592SEQ ID No: 987sa, 67 kda)PRAME643897956preferentially expressed antigen inSEQ ID No: 1593SEQ ID No: 1594melanomaNME2644951066non-metastatic cells 2, protein (nm23b)SEQ ID No: 1595SEQ ID No: 1596expressed in

Table 1 above identifies a library of polynucleotide sequences of SEQ ID NO. 1 to SEQ ID NO. 1556 and arranges them into sets. Table 1 indicates, wherever available, the name of the gene with its gene symbol, its Image Clone and, for each gene, the relevant SEQ ID NOS defining the set. The “3′” and “5′” columns represent ESTs and the “Ref.” column represents mRNAs of the named gene or Image Clone.

Thus, the nucleotide sequences of the present invention can be defined by the different sets, but can also be defined by the name of the gene or fragments thereof as recited in Table 1. Each polynucleotide sequence in Table 1 can therefore be considered as a marker of the corresponding gene. Each marker corresponds to a gene in the human genome; i.e., such marker is identifiable as all or a portion of a gene. The term “marker”, as used herein, is thus meant to refer to the complete gene nucleotide sequence or an EST nucleotide sequence derived from that gene (or a subsequence or complement thereof), the expression or level of which changes with certain conditions, disorders or diseases. Where the expression of the gene correlates with a certain condition, disorder or disease, the gene is a marker for that condition, disorder or disease. Any RNA transcribed from a marker gene (e.g., mRNAs), any cDNA or cRNA produced therefrom, and any nucleic acid derived therefrom, such as synthetic nucleic acid having a sequence derived from the gene corresponding to the marker gene, are also encompassed by the present invention.

Each mRNA sequence in the Ref. column represents one of the various mRNA splice forms of the gene that are known in the art; e.g., splice forms described in publicly available genomic databases. A skilled artisan is able to select, by routine experimentation, one or more appropriate splice form(s) by, e.g., determining those splice forms having a sequence that matches the sequence of the corresponding Image Clone with a predetermined level of homology.
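The selection of splice forms described above can be sketched as follows. This is a minimal illustration only, assuming that "homology" is approximated by ungapped percent identity between the Image Clone sequence and the best-matching window of each candidate mRNA; the function names, the toy sequences, and the 0.95 default threshold are hypothetical and are not taken from the specification. A gapped aligner would normally be used in practice.

```python
def best_window_identity(clone: str, transcript: str) -> float:
    """Highest fraction of matching bases when the clone is slid,
    without gaps, along the transcript (0.0 if the clone does not fit)."""
    clone, transcript = clone.upper(), transcript.upper()
    n = len(clone)
    if n == 0 or n > len(transcript):
        return 0.0
    best = 0.0
    for start in range(len(transcript) - n + 1):
        window = transcript[start:start + n]
        matches = sum(a == b for a, b in zip(clone, window))
        best = max(best, matches / n)
    return best


def select_splice_forms(clone: str, candidates: dict[str, str],
                        threshold: float = 0.95) -> list[str]:
    """Names of candidate splice forms whose best-window identity to the
    clone meets the (assumed) predetermined threshold."""
    return [name for name, seq in candidates.items()
            if best_window_identity(clone, seq) >= threshold]


# Toy data, invented for illustration only.
clone_seq = "ACGTACGTAC"
splice_forms = {
    "variant_a": "TTTACGTACGTACGGG",  # contains the clone exactly
    "variant_b": "TTTACGAACGTACGGG",  # one mismatch in the best window
    "variant_c": "GGGGGGGGGGGGGGGG",  # unrelated
}
print(select_splice_forms(clone_seq, splice_forms))
```

Lowering the threshold admits more divergent splice forms; the appropriate value is a design choice left to the skilled artisan.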

A disease, disorder, or condition “associated with” an aberrant expression of a nucleic acid refers to a disease, disorder, or condition in a subject which is caused by, contributed to by, or causative of an aberrant level of expression of a nucleic acid.

By “nucleic acids,” as used herein, is meant polynucleotides, e.g., isolated polynucleotides such as isolated deoxyribonucleic acid (DNA) and, where appropriate, isolated ribonucleic acid (RNA). The term is also understood to include, as equivalents, analogs of RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides. ESTs, chromosomes or genomic DNA, cDNAs, mRNAs, and rRNAs are representative examples of molecules that can be referred to as nucleic acids. DNA can be obtained from said nucleic acid sample and RNA can be obtained by transcription of said DNA. In addition, mRNA can be isolated from said nucleic acid sample and cDNA can be obtained by reverse transcription of said mRNA.

The term “subsequence”, as used herein, is meant to refer to any sequence corresponding to a part of said polynucleotide sequence, which would also be suitable to perform the method of analysis according to the invention. A person skilled in the art can choose the position and length of a subsequence of the invention by applying routine experiments. A subsequence can have at least about 80% homology with said polynucleotide sequence; e.g., at least about 85%, at least about 90%, at least about 95%, or at least about 99% homology.

The term “pool”, as used herein, is meant to refer to a group of nucleic acid sequences comprising one or more sequences, for example about: 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or 2000 sequences.

The number of sets may vary in the range of from 1 to the maximum number of sets described therein, e.g., 646 sets, for example about: 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500, 550, or 600 sets.

The over or under expression (also referred to, respectively, as “up regulation” and “down regulation”) can be determined by any known method within the skill in the art, such as disclosed in PCT patent application WO 02/103320, the entire disclosure of which is herein incorporated by reference. Such methods can comprise the detection of a difference in the expression of the polynucleotide sequences according to the present invention in relation to at least one control. Said control can comprise, for example, polynucleotide sequence(s) from a sample of the same patient or from a pool of patients exhibiting histopathologic features of colorectal disease, or can be selected from among reference sequence(s) which are already known to be over or under expressed. The expression level of said control can be an average or an absolute value of the expression of reference polynucleotide sequences. These values can be processed (e.g., statistically) in order to accentuate the difference relative to the expression of the polynucleotide sequences of the invention.
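The comparison against a control can be illustrated with a short sketch. It assumes, purely for illustration, that the control level is the arithmetic mean of reference measurements and that a two-fold change on a log2 scale separates over expression from under expression; neither the cutoff nor the function names come from the specification or from WO 02/103320.

```python
import math
from statistics import mean


def log2_ratio(sample_value: float, control_values: list[float]) -> float:
    """log2 of the sample signal over the mean control signal."""
    control = mean(control_values)
    if sample_value <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(sample_value / control)


def call_expression(sample_value: float, control_values: list[float],
                    fold: float = 2.0) -> str:
    """Label a measurement 'over', 'under' or 'unchanged' relative to the
    control, using an assumed symmetric fold-change cutoff."""
    ratio = log2_ratio(sample_value, control_values)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "over"
    if ratio <= -cutoff:
        return "under"
    return "unchanged"


# Hypothetical signal intensities for one polynucleotide sequence.
print(call_expression(400.0, [90.0, 110.0]))
print(call_expression(25.0, [90.0, 110.0]))
print(call_expression(120.0, [90.0, 110.0]))
```

Working on a log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is one common way to process the values so that differences stand out.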

The analysis of the over or under expression of polynucleotide sequences can be carried out on a sample, such as biological material derived from any mammalian cells, including cell lines, xenografts, and human tissues, preferably from colon tissue. The method according to the invention can be performed on a sample from a human subject or an animal (for example for veterinary application or preclinical trial).

By “over or underexpression” of a polynucleotide sequence, as used herein, is meant that overexpression of certain sequences is detected simultaneously with the underexpression of other sequences. “Simultaneously” means concurrent with or within a biologic or functionally relevant period of time during which the over expression of a sequence can be followed by the under expression of another sequence, or conversely, e.g., because both over and under expression are directly or indirectly correlated.

In one embodiment, the method according to the present invention is therefore directed to the analysis of differential gene expression associated with colon tumors wherein the pool of polynucleotide sequences corresponds to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

1; 4; 9; 10; 11; 13; 15; 16; 17; 18; 21; 27; 28; 30; 31; 34; 37; 39; 41; 43; 45; 46; 52; 53; 58; 59; 60; 65; 68; 69; 70; 75; 76; 78; 79; 80; 84; 85; 87; 88; 90; 95; 96; 98; 99; 101; 105; 108; 110; 111; 113; 114; 116; 119; 120; 122; 124; 125; 126; 127; 130; 131; 138; 139; 140; 141; 143; 150; 152; 153; 155; 159; 164; 171; 175; 176; 178; 181; 182; 184; 185; 189; 192; 196; 197; 198; 203; 205; 207; 208; 210; 213; 214; 215; 216; 218; 221; 223; 225; 227; 231; 235; 241; 243; 251; 256; 259; 261; 262; 263; 264; 266; 267; 268; 270; 279; 281; 286; 287; 288; 291; 298; 299; 301; 307; 310; 312; 313; 317; 319; 329; 331; 332; 337; 338; 339; 340; 341; 342; 344; 346; 352; 354; 357; 360; 361; 366; 368; 369; 377; 379; 381; 384; 385; 386; 390; 392; 394; 395; 397; 398; 400; 401; 405; 406; 409; 410; 413; 423; 427; 434; 436; 437; 438; 440; 442; 443; 444; 445; 448; 454; 459; 463; 464; 467; 469; 470; 488; 492; 495; 500; 503; 507; 508; 516; 518; 520; 522; 524; 538; 543; 547; 549; 552; 555; 557; 561; 567; 568; 569; 573; 574; 583; 586; 588; 592; 596; 597; 598; 599; 600; 601; 604; 609; 610; 611; 614; 616; 617; 621; 626; 627; 629; 630; 631; 632; 634; 635; 636; 638; 641; 642; and 644.

Said analysis can comprise at least one of the following steps:

- The detection of the overexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

1; 9; 10; 16; 18; 27; 28; 30; 39; 41; 43; 45; 53; 58; 60; 65; 69; 75; 76; 113; 116; 120; 122; 126; 127; 130; 131; 138; 139; 140; 141; 143; 150; 152; 153; 159; 181; 182; 184; 189; 192; 197; 198; 210; 213; 214; 216; 218; 225; 227; 243; 259; 261; 264; 266; 267; 268; 281; 286; 287; 288; 291; 299; 307; 312; 313; 317; 319; 332; 337; 338; 339; 340; 341; 342; 344; 354; 357; 360; 361; 368; 381; 384; 385; 392; 394; 397; 398; 405; 423; 427; 442; 444; 464; 467; 469; 488; 495; 500; 507; 508; 516; 520; 522; 524; 538; 543; 547; 549; 552; 561; 567; 568; 569; 573; 586; 588; 592; 596; 600; 609; 614; 627; 629; 630; 635; 636; 641; 642; and 644.

- The detection of the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

4; 11; 13; 15; 17; 21; 31; 34; 37; 46; 52; 59; 68; 70; 78; 79; 80; 84; 85; 87; 88; 90; 95; 96; 98; 99; 101; 105; 108; 110; 111; 114; 119; 124; 125; 155; 164; 171; 175; 176; 178; 185; 196; 203; 205; 207; 208; 215; 221; 223; 231; 235; 241; 251; 256; 262; 263; 270; 279; 298; 301; 310; 329; 331; 346; 352; 366; 369; 377; 379; 386; 390; 395; 400; 401; 406; 409; 410; 413; 434; 436; 437; 438; 440; 443; 445; 448; 454; 459; 463; 470; 492; 503; 518; 555; 557; 574; 583; 597; 598; 599; 601; 604; 610; 611; 616; 617; 621; 626; 631; 632; 634; and 638.

In a preferred embodiment, the sets for analyzing differential gene expression associated with colon tumors can, for example, consist of those mentioned in Table 2:

TABLE 2CloneidentifierGeneReferenceTitle of clusterSets(Image)Cluster (Unigene)Symbolsequences(Gene name)SEQ ID Numbers11012666ughs.82422:175capgnm_001747capping protein (actin filament),SEQ ID NO: 1597gelsolin-like41046837ughs.235935:175novnm_002514nephroblastoma overexpressed geneSEQ ID NO: 159815110486ughs.404336:175loc92906nm_138394hypothetical protein bc008217SEQ ID NO: 159921117240ughs.180398:175lppnm_005578lim domain containing preferredSEQ ID NO: 1600translocation partner in lipoma27119530ughs.17287:175kcnj15nm_002243,potassium inwardly-rectifyingSEQ ID NO: 1601nm_170736,channel, subfamily j, member 15SEQ ID NO: 1602nm_170737SEQ ID NO: 160358133883168139789ughs.79095:175eps15nm_001981epidermal growth factor receptorSEQ ID NO: 1604pathway substrate 15751456160ughs.531989:175azgp1nm_001185alpha-2-glycoprotein 1, zincSEQ ID NO: 16057914692295153461ughs.25511:175tgfb1i1nm_015927transforming growth factor beta 1SEQ ID NO: 1606induced transcript 198153854ughs.279604:175desnm_001927desminSEQ ID NO: 1607101154600ughs.80776:175plcd1nm_006225phospholipase c, delta 1SEQ ID NO: 16081141667886ughs.75486:175hsf4nm_001538heat shock transcription factor 4SEQ ID NO: 16091191731982ughs.271620:175plcg2nm_002661phospholipase c, gamma 2SEQ ID NO: 1610(phosphatidylinositol-specific)127186331ughs.32393:175darsnm_001349aspartyl-trna synthetaseSEQ ID NO: 16111311912132ughs.250822:175stk6nm_003600,serine/threonine kinase 6SEQ ID NO: 1612nm_198433,SEQ ID NO: 1613nm_198434,SEQ ID NO: 1614nm_198435,SEQ ID NO: 1615nm_198436,SEQ ID NO: 1616nm_198437SEQ ID NO: 1617140195702ughs.270920:175dap3nm_004632,death associated protein 3SEQ ID NO: 1618nm_033657SEQ ID NO: 16191552055272ughs.252938:175lrp2nm_004525low density lipoprotein-relatedSEQ ID NO: 1620protein 21762349125ughs.136713:175vpreb3nm_013378pre-b lymphocyte gene 3SEQ ID NO: 1621192241788ughs.300774:175fgbnm_005141fibrinogen, b beta polypeptideSEQ ID NO: 1622241272189ughs.260523:175nrasnm_002524neuroblastoma ras viral (v-ras)SEQ ID 
NO: 1623oncogene homolog243272502ughs.374334:175cct4nm_006430chaperonin containing tcp1, subunit 4SEQ ID NO: 1624(delta)259285780ughs.2936:175mmp13nm_002427matrix metalloproteinase 13SEQ ID NO: 1625(collagenase 3)263288874ughs.37014:175;ca7;nm_005182;carbonic anhydrase vii; zinc fingerSEQ ID NO: 1626ughs.48589:175znf228nm_013380protein 228SEQ ID NO: 162727030066ughs.89657:175ilknm_004517integrin-linked kinaseSEQ ID NO: 1628279306697ughs.82508:175thap11nm_020457thap domain containing 11SEQ ID NO: 1629286310860ughs.368481:175nudt5nm_014142nudix (nucleoside diphosphate linkedSEQ ID NO: 1630moiety x)-type motif 5298322452ughs.124411:175chganm_001275chromogranin a (parathyroidSEQ ID NO: 1631secretory protein 1)299322471ughs.1063:175snrpcnm_003093small nuclear ribonucleoproteinSEQ ID NO: 1632polypeptide c307323948ughs.2316:175sox9nm_000346sry (sex determining region y)-box 9SEQ ID NO: 1633(campomelic dysplasia, autosomalsex-reversal)310324369ughs.513557:175ctbsnm_004388chitobiase, di-n-acetyl-SEQ ID NO: 1634312324757ughs.370504:175rps15anm_001019ribosomal protein s15aSEQ ID NO: 1635313324930ughs.28491:175satnm_002970spermidine/spermine n1-SEQ ID NO: 1636acetyltransferase317327684ughs.148090:175cdh15nm_004933cadherin 15, m-cadherin (myotubule)SEQ ID NO: 1637329342054ughs.20136:175cxorf6nm_005491chromosome x open reading frame 6SEQ ID NO: 163834634888ughs.489521:175;reln;nm_005045,reelin; transcribed locusSEQ ID NO: 1639ughs.492257:175nm_173054;SEQ ID NO: 1640357358117ughs.2316:175sox9nm_000346sry (sex determining region y)-box 9(campomelic dysplasia, autosomalsex-reversal)360358683ughs.133892:175tpm1nm_000366tropomyosin 1 (alpha)SEQ ID NO: 1641361358943ughs.438837:175n2nnm_203458similar to notch2 proteinSEQ ID NO: 1642394383433ughs.356261:175similar to laminin receptor 139539593ughs.12409:175sstnm_001048somatostatinSEQ ID NO: 164339839972ughs.432317:175adam23nm_003812a disintegrin and metalloproteinaseSEQ ID NO: 1644domain 23405415389ughs.334612:175snrpenm_003094small 
nuclear ribonucleoproteinSEQ ID NO: 1645polypeptide e406416060ughs.440934:175arg1nm_000045arginase, liverSEQ ID NO: 1646413427858ughs.508411:175gpc6nm_005708glypican 6SEQ ID NO: 164742744152ughs.1708:175cct3nm_005998chaperonin containing tcp1, subunit 3SEQ ID NO: 1648(gamma)436470122ughs.93841:175kcnmb1nm_004137potassium large conductanceSEQ ID NO: 1649calcium-activated channel, subfamilym, beta member 1437470175ughs.3548:175mtcp1nm_014221mature t-cell proliferation 1SEQ ID NO: 1650438470279ughs.408730:175cntnap1nm_003632contactin associated protein 1SEQ ID NO: 165144347986ughs.149609:175itga5nm_002205integrin, alpha 5 (fibronectinSEQ ID NO: 1652receptor, alpha polypeptide)454488526ughs.78344:175myh11nm_002474,myosin, heavy polypeptide 11,SEQ ID NO: 1653nm_022844smooth muscleSEQ ID NO: 1654464501939ughs.21635:175;tubg1;nm_001070;tubulin, gamma 1; ww domainSEQ ID NO: 1655ughs.461453:175wwoxnm_016373,containing oxidoreductaseSEQ ID NO: 1656nm_018560,SEQ ID NO: 1657nm_130788,SEQ ID NO: 1658nm_130790,SEQ ID NO: 1659nm_130791,SEQ ID NO: 1660nm_130792,SEQ ID NO: 1661nm_130844SEQ ID NO: 1662507531496ughs.292072:175eps15l1nm_021235epidermal growth factor receptorSEQ ID NO: 1663pathway substrate 15-like 1522546439ughs.5662:175gnb2l1nm_006098guanine nucleotide binding protein (gSEQ ID NO: 1664protein), beta polypeptide 2-like 1547564501ughs.434102:175hmgb1nm_002128high-mobility group box 1SEQ ID NO: 1665552592041ughs.93002:175ube2cnm_007019,ubiquitin-conjugating enzyme e2cSEQ ID NO: 1666nm_181799,SEQ ID NO: 1667nm_181800,SEQ ID NO: 1668nm_181801,SEQ ID NO: 1669nm_181802,SEQ ID NO: 1670nm_181803SEQ ID NO: 1671555594540ughs.454253:175ptchnm_000264patched homolog (drosophila)SEQ ID NO: 1672568625541ughs.5662:175gnb2l1nm_006098guanine nucleotide binding protein (gprotein), beta polypeptide 2-like 1569625574ughs.5662:175gnb2l1nm_006098guanine nucleotide binding protein (gprotein), beta polypeptide 2-like 1614755599ughs.458414:175ifitm1nm_003641interferon induced transmembraneSEQ 
ID NO: 1673protein 1 (9-27)631813384ughs.443020:175pcdh7nm_002589,bh-protocadherin (brain-heart)SEQ ID NO: 1674nm_032456,SEQ ID NO: 1675nm_032457SEQ ID NO: 1676634840486ughs.440848:175vwfnm_000552von willebrand factorSEQ ID NO: 1677636856961ughs.405590:175eif3s6nm_001568eukaryotic translation initiationSEQ ID NO: 1678factor 3, subunit 6 48 kda641884365ughs.434953:175hmgb2nm_002129high-mobility group box 2SEQ ID NO: 1679644951066ughs.433416:175nme2nm_002512non-metastatic cells 2, proteinSEQ ID NO: 1680(nm23b) expressed in

In another embodiment, the method according to the present invention is directed to the analysis of differential gene expression associated with secondary metastatic events in patients with colorectal tumors, in particular visceral metastasis or lymph node metastasis. In the visceral metastasis embodiment, said analysis comprises the detection of the overexpression or the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

2; 3; 10; 22; 24; 25; 30; 32; 33; 35; 36; 39; 40; 41; 42; 47; 50; 54; 57; 67; 72; 86; 97; 102; 103; 104; 107; 117; 118; 120; 128; 130; 132; 133; 134; 137; 144; 145; 146; 147; 149; 153; 156; 158; 162; 163; 165; 169; 170; 173; 174; 179; 180; 188; 191; 193; 194; 195; 199; 200; 201; 202; 204; 206; 209; 210; 211; 212; 213; 214; 216; 217; 219; 222; 234; 238; 246; 248; 249; 250; 255; 271; 272; 273; 276; 277; 278; 282; 283; 284; 291; 292; 293; 294; 295; 296; 303; 304; 305; 306; 308; 312; 314; 318; 323; 324; 325; 326; 330; 336; 337; 338; 339; 340; 341; 342; 343; 344; 347; 349; 350; 351; 353; 356; 359; 360; 361; 362; 363; 364; 371; 372; 374; 378; 380; 381; 382; 383; 384; 387; 388; 393; 396; 397; 399; 402; 403; 408; 414; 415; 417; 418; 419; 420; 421; 422; 426; 428; 430; 432; 433; 441; 446; 449; 457; 458; 460; 465; 471; 472; 473; 475; 476; 478; 480; 481; 482; 484; 485; 486; 490; 493; 494; 497; 501; 502; 504; 505; 509; 510; 514; 516; 520; 525; 526; 527; 528; 529; 530; 537; 538; 539; 541; 545; 546; 550; 558; 559; 560; 561; 562; 564; 565; 566; 571; 576; 577; 578; 580; 581; 584; 585; 586; 590; 591; 593; 594; 595; 596; 602; 607; 609; 612; 613; 615; 623; 624; 625; 633; 635; 639; 640; 643; and 644.

The analysis can comprise at least one of the following steps:

- The detection of the overexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

36; 86; 104; 107; 117; 132; 144; 153; 156; 174; 191; 209; 248; 349; 350; 396; 417; 419; 432; 558; 566; 613; 623; 625; 633; and 643.

- The detection of the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

2; 3; 10; 22; 24; 25; 30; 32; 33; 35; 39; 40; 41; 42; 47; 50; 54; 57; 67; 72; 97; 102; 103; 118; 120; 128; 130; 133; 134; 137; 145; 146; 147; 149; 158; 162; 163; 165; 169; 170; 173; 179; 180; 188; 193; 194; 195; 199; 200; 201; 202; 204; 206; 210; 211; 212; 213; 214; 216; 217; 219; 222; 234; 238; 246; 249; 250; 255; 271; 272; 273; 276; 277; 278; 282; 283; 284; 291; 292; 293; 294; 295; 296; 303; 304; 305; 306; 308; 312; 314; 318; 323; 324; 325; 326; 330; 336; 337; 338; 339; 340; 341; 342; 343; 344; 347; 351; 353; 356; 359; 360; 361; 362; 363; 364; 371; 372; 374; 378; 380; 381; 382; 383; 384; 387; 388; 393; 397; 399; 402; 403; 408; 414; 415; 418; 420; 421; 422; 426; 428; 430; 433; 441; 446; 449; 457; 458; 460; 465; 471; 472; 473; 475; 476; 478; 480; 481; 482; 484; 485; 486; 490; 493; 494; 497; 501; 502; 504; 505; 509; 510; 514; 516; 520; 525; 526; 527; 528; 529; 530; 537; 538; 539; 541; 545; 546; 550; 559; 560; 561; 562; 564; 565; 571; 576; 577; 578; 580; 581; 584; 585; 586; 590; 591; 593; 594; 595; 596; 602; 607; 609; 612; 615; 624; 635; 639; 640; and 644.

In a preferred embodiment, the sets for analyzing differential gene expression associated with visceral metastasis can, for example, consist of those mentioned in Table 3:

TABLE 3
Set | Clone identifier | Gene cluster | Symbol | Reference sequences | Title of cluster | SEQ ID Numbers
32 | image: 121076 | ughs.107476:175; ughs.75275:175 | atp5l; ube4a | nm_006476; nm_004788 | atp synthase, h+ transporting, mitochondrial f0 complex, subunit g; ubiquitination factor e4a (ufd2 homolog, yeast) | SEQ ID NO: 1681; SEQ ID NO: 1682
33 | image: 121265 | ughs.181315:175 | ifnar1 | nm_000629 | interferon (alpha, beta and omega) receptor 1 | SEQ ID NO: 1683
50 | image: 129146 | ughs.423404:175 | cox7a2l | nm_004718 | cytochrome c oxidase subunit viia polypeptide 2 like | SEQ ID NO: 1684
133 | image: 191714 | ughs.370504:175; ughs.486908:175 | rps15a | nm_001019 | ribosomal protein s15a; transcribed locus, moderately similar to xp_212877.2 ribosomal protein s15a [rattus norvegicus] |
188 | image: 240753 | | | | |
217 | image: 258313 | ughs.432170:175 | cox7b | nm_001866 | cytochrome c oxidase subunit viib | SEQ ID NO: 1685
271 | image: 301119 | ughs.80691:175 | ckmt2 | nm_001825 | creatine kinase, mitochondrial 2 (sarcomeric) | SEQ ID NO: 1686
284 | image: 31027 | ughs.180414:175; ughs.52788:175 | hspa8; fxr2 | nm_006597, nm_153201; nm_004860 | heat shock 70 kda protein 8; fragile x mental retardation, autosomal homolog 2 | SEQ ID NO: 1687; SEQ ID NO: 1688; SEQ ID NO: 1689
296 | image: 321973 | ughs.108957:175 | rps27l | nm_015920 | ribosomal protein s27-like | SEQ ID NO: 1690
303 | image: 323681 | ughs.11156:175 | loc51255 | nm_016494 | hypothetical protein loc51255 | SEQ ID NO: 1691
312 | image: 324757 | ughs.370504:175 | rps15a | nm_001019 | ribosomal protein s15a |
323 | image: 33794 | ughs.155433:175 | atp5c1 | nm_001001973, nm_005174 | atp synthase, h+ transporting, mitochondrial f1 complex, gamma polypeptide 1 | SEQ ID NO: 1692; SEQ ID NO: 1693
340 | image: 345694 | ughs.156316:175 | dcn | nm_001920, nm_133503, nm_133504, nm_133505, nm_133506, nm_133507 | decorin | SEQ ID NO: 1694; SEQ ID NO: 1695; SEQ ID NO: 1696; SEQ ID NO: 1697; SEQ ID NO: 1698; SEQ ID NO: 1699
343 | image: 346269 | ughs.420269:175 | col6a2 | nm_001849, nm_058174, nm_058175 | collagen, type vi, alpha 2 | SEQ ID NO: 1700; SEQ ID NO: 1701; SEQ ID NO: 1702
361 | image: 358943 | ughs.438837:175 | n2n | nm_203458 | similar to notch2 protein | SEQ ID NO: 1703
403 | image: 41411 | ughs.184582:175; ughs.206520:175 | rpl24 | nm_000986 | ribosomal protein l24; transcribed locus | SEQ ID NO: 1704
408 | image: 416946 | ughs.395309:175 | txn | nm_003329 | thioredoxin | SEQ ID NO: 1705
473 | image: 509588 | ughs.421646:175 | taf12 | nm_005644 | taf12 rna polymerase ii, tata box binding protein (tbp)-associated factor, 20 kda | SEQ ID NO: 1706
484 | image: 510977 | ughs.173724:175 | ckb | nm_001823 | creatine kinase, brain | SEQ ID NO: 1707
494 | image: 526038 | ughs.536668:175 | | | transcribed locus |
502 | image: 530368 | ughs.469653:175 | rpl5 | nm_000969 | ribosomal protein l5 | SEQ ID NO: 1708
516 | image: 544885 | ughs.469653:175 | rpl5 | nm_000969 | ribosomal protein l5 | SEQ ID NO: 1708
624 | image: 79829 | ughs.7888:175 | erbb4 | nm_005235 | v-erb-a erythroblastic leukemia viral oncogene homolog 4 (avian) | SEQ ID NO: 1709

According to the lymph node metastasis embodiment, said analysis comprises the detection of the overexpression or the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

38; 55; 66; 91; 93; 102; 103; 133; 142; 144; 153; 163; 190; 210; 232; 254; 280; 296; 300; 304; 311; 321; 335; 378; 383; 384; 420; 425; 429; 432; 468; 473; 487; 516; 519; 544; 553; 573; 577; 578; 585; 587; 589; 592; 605; 608; and 644; preferably from sets 142; 144; 153; 190; 280; 468; 519; 553; and 589.

The analysis can comprise at least one of the following steps:

- The detection of the overexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

55; 66; 144; 153; 432; 553; and 608; preferably 144; 153; and 553.

- The detection of the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

38; 91; 93; 102; 103; 133; 142; 163; 190; 210; 232; 254; 280; 296; 300; 304; 311; 321; 335; 378; 383; 384; 420; 425; 429; 468; 473; 487; 516; 519; 544; 573; 577; 578; 585; 587; 589; 592; 605; and 644; preferably 142; 190; 280; 468; 519; and 589.
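As an illustration of how the two detection steps above could be combined, the sketch below scores one sample against the preferred lymph node metastasis sets (144, 153 and 553 overexpressed; 142, 190, 280, 468, 519 and 589 underexpressed). The scoring rule, the log2-ratio cutoff of 1.0, and the function name are assumptions made for this example; the specification does not prescribe a particular score.

```python
# Preferred set numbers taken from the lists above.
OVER_SETS = {144, 153, 553}
UNDER_SETS = {142, 190, 280, 468, 519, 589}


def signature_concordance(log_ratios: dict[int, float],
                          cutoff: float = 1.0) -> float:
    """Fraction of measured signature sets whose direction of change
    agrees with the signature. Keys are set numbers, values are log2
    ratios of sample versus control (an assumed input format)."""
    hits = 0
    measured = 0
    for set_no, value in log_ratios.items():
        if set_no in OVER_SETS:
            measured += 1
            hits += value >= cutoff
        elif set_no in UNDER_SETS:
            measured += 1
            hits += value <= -cutoff
        # Sets outside the signature are ignored.
    return hits / measured if measured else 0.0


# Hypothetical log2 ratios for one tumor sample.
sample = {144: 1.5, 153: 2.0, 553: 0.2, 142: -1.2, 190: -3.0, 280: 0.0, 999: 5.0}
print(round(signature_concordance(sample), 3))
```

A score near 1 would indicate that most measured sets change in the direction listed for lymph node metastasis; any decision threshold on this score would need to be established experimentally.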

In a further preferred embodiment, the sets for analyzing differential gene expression associated with lymph node metastasis can, for example, consist of those mentioned in Table 4:

TABLE 4
Set | Clone identifier | Gene cluster | Symbol | Reference sequences | Title of cluster | SEQ ID Numbers
142 | Image: 198903 | ughs.418533:175 | bub3 | nm_004725 | bub3 budding uninhibited by benzimidazoles 3 homolog (yeast) | SEQ ID NO: 1710
144 | Image: 200521 | ughs.442936:175 | oas1 | nm_002534, nm_016816 | 2′,5′-oligoadenylate synthetase 1, 40/46 kda | SEQ ID NO: 1711; SEQ ID NO: 1712
153 | Image: 2048801 | ughs.439109:175 | ntrk2 | nm_006180 | neurotrophic tyrosine kinase, receptor, type 2 | SEQ ID NO: 1713
190 | Image: 241151 | ughs.432424:175 | tpp2 | nm_003291 | tripeptidyl peptidase ii | SEQ ID NO: 1714
280 | Image: 307094 | ughs.54609:175 | gcat | nm_014291 | glycine c-acetyltransferase (2-amino-3-ketobutyrate coenzyme a ligase) | SEQ ID NO: 1715
468 | Image: 504811 | ughs.20082:175 | znf38 | nm_017715, nm_145914 | zinc finger protein 38 | SEQ ID NO: 1716; SEQ ID NO: 1717
553 | Image: 592521 | ughs.446590:175; ughs.534524:175 | ppp4r2; flj10213 | nm_174907; nm_018029 | protein phosphatase 4, regulatory subunit 2; hypothetical protein flj10213 | SEQ ID NO: 1718; SEQ ID NO: 1719
589 | Image: 68176 | ughs.179203:175 | flj10916 | nm_018271 | hypothetical protein flj10916 | SEQ ID NO: 1720

In a further embodiment, the method of the present invention is directed to the analysis of differential gene expression associated with MSI phenotype in colon cancer. In this embodiment, said analysis comprises the detection of the overexpression or the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

29; 48; 56; 62; 71; 77; 82; 109; 112; 135; 136; 154; 157; 166; 167; 186; 220; 226; 236; 237; 239; 240; 242; 244; 253; 260; 277; 290; 297; 348; 358; 375; 376; 404; 407; 412; 416; 424; 431; 450; 451; 452; 462; 474; 477; 479; 486; 498; 511; 521; 533; 534; 535; 542; 572; 619; and 622.

The analysis can comprise at least one of the following steps:

- The detection of the overexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

48; 56; 62; 157; 186; 220; 226; 253; 260; 376; 450; 452; 462; 498; and 511.

- The detection of the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

29; 71; 77; 82; 109; 112; 135; 136; 154; 166; 167; 236; 237; 239; 240; 242; 244; 277; 290; 297; 348; 358; 375; 404; 407; 412; 416; 424; 431; 451; 474; 477; 479; 486; 521; 533; 534; 535; 542; 572; 619; and 622.
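The three MSI phenotype lists above use the same "n; n; ...; and n." notation as the other embodiments. The following sketch, offered only as a convenience for readers who wish to handle these lists programmatically, parses that notation and checks that the overexpressed and underexpressed lists together make up the combined list without overlap. The function names are hypothetical.

```python
import re

# Set lists copied from the MSI phenotype embodiment above.
ALL_SETS = ("29; 48; 56; 62; 71; 77; 82; 109; 112; 135; 136; 154; 157; 166; "
            "167; 186; 220; 226; 236; 237; 239; 240; 242; 244; 253; 260; 277; "
            "290; 297; 348; 358; 375; 376; 404; 407; 412; 416; 424; 431; 450; "
            "451; 452; 462; 474; 477; 479; 486; 498; 511; 521; 533; 534; 535; "
            "542; 572; 619; and 622.")
OVER = ("48; 56; 62; 157; 186; 220; 226; 253; 260; 376; 450; 452; 462; 498; "
        "and 511.")
UNDER = ("29; 71; 77; 82; 109; 112; 135; 136; 154; 166; 167; 236; 237; 239; "
         "240; 242; 244; 277; 290; 297; 348; 358; 375; 404; 407; 412; 416; "
         "424; 431; 451; 474; 477; 479; 486; 521; 533; 534; 535; 542; 572; "
         "619; and 622.")


def parse_set_list(text: str) -> set[int]:
    """Extract every set number from a '1; 4; ...; and 644.' style list."""
    return {int(token) for token in re.findall(r"\d+", text)}


def is_partition(all_sets: set[int], over: set[int], under: set[int]) -> bool:
    """True if 'over' and 'under' are disjoint and together equal 'all_sets'."""
    return over.isdisjoint(under) and (over | under) == all_sets


all_sets = parse_set_list(ALL_SETS)
over = parse_set_list(OVER)
under = parse_set_list(UNDER)
print(len(all_sets), len(over), len(under), is_partition(all_sets, over, under))
```

The same two functions can be applied to the set lists of the other embodiments by substituting the corresponding strings.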

In a preferred embodiment, the sets for analyzing differential gene expression associated with MSI phenotype can, for example, consist of those mentioned in Table 5:

TABLE 5
Set | Clone identifier | Gene cluster | Symbol | Reference sequences | Title of cluster | SEQ ID Numbers
29 | image: 120009 | ughs.77578:175 | usp9x | nm_004652, nm_021906 | ubiquitin specific protease 9, x-linked (fat facets-like, drosophila) | SEQ ID NO: 1721; SEQ ID NO: 1722
62 | image: 136361 | ughs.519034:175; ughs.54673:175 | tnfsf13 | nm_003808, nm_003809, nm_153012, nm_172087, nm_172088, nm_172089 | transcribed locus; tumor necrosis factor (ligand) superfamily, member 12 | SEQ ID NO: 1723; SEQ ID NO: 1724; SEQ ID NO: 1725; SEQ ID NO: 1726; SEQ ID NO: 1727; SEQ ID NO: 1728
71 | image: 143519 | ughs.227729:175 | fkbp2 | nm_004470, nm_057092 | fk506 binding protein 2, 13 kda | SEQ ID NO: 1729; SEQ ID NO: 1730
109 | image: 159885 | ughs.298469:175 | ace | nm_000789, nm_152830, nm_152831 | angiotensin i converting enzyme (peptidyl-dipeptidase a) 1 | SEQ ID NO: 1731; SEQ ID NO: 1732; SEQ ID NO: 1733
136 | image: 192581 | ughs.437040:175 | ptpn21 | nm_007039 | protein tyrosine phosphatase, non-receptor type 21 | SEQ ID NO: 1734
154 | image: 205314 | ughs.408312:175 | tp53 | nm_000546 | tumor protein p53 (li-fraumeni syndrome) | SEQ ID NO: 1735
348 | image: 35072 | ughs.76152:175 | aqp1 | nm_000385, nm_198098 | aquaporin 1 (channel-forming integral protein, 28 kda) | SEQ ID NO: 1736; SEQ ID NO: 1737
404 | image: 41452 | ughs.28491:175 | sat | nm_002970 | spermidine/spermine n1-acetyltransferase | SEQ ID NO: 1636
412 | image: 42214 | ughs.192182:175 | syk | nm_003177 | spleen tyrosine kinase | SEQ ID NO: 1738
416 | image: 430090 | ughs.355307:175 | tnfrsf7 | nm_001242 | tumor necrosis factor receptor superfamily, member 7 | SEQ ID NO: 1739
431 | image: 45831 | ughs.279920:175 | ywhab | nm_003404, nm_139323 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, beta polypeptide | SEQ ID NO: 1740; SEQ ID NO: 1741
451 | image: 488316 | ughs.368256:175 | ltbp1 | nm_000627, nm_206943 | latent transforming growth factor beta binding protein 1 | SEQ ID NO: 1742; SEQ ID NO: 1743
479 | image: 510161 | ughs.1600:175 | cct5 | nm_012073 | chaperonin containing tcp1, subunit 5 (epsilon) | SEQ ID NO: 1744
486 | image: 512000 | ughs.411826:175 | ube2d3 | nm_003340, nm_181886, nm_181887, nm_181888, nm_181889, nm_181890, nm_181891, nm_181892, nm_181893 | ubiquitin-conjugating enzyme e2d 3 (ubc4/5 homolog, yeast) | SEQ ID NO: 1745; SEQ ID NO: 1746; SEQ ID NO: 1747; SEQ ID NO: 1748; SEQ ID NO: 1749; SEQ ID NO: 1750; SEQ ID NO: 1751; SEQ ID NO: 1752; SEQ ID NO: 1753
498 | image: 530034 | ughs.544630:175 | | | transcribed locus |
535 | image: 549173 | ughs.192023:175 | eif3s2 | nm_003757 | eukaryotic translation initiation factor 3, subunit 2 beta, 36 kda | SEQ ID NO: 1754
622 | image: 79598 | ughs.194657:175 | cdh1 | nm_004360 | cadherin 1, type 1, e-cadherin (epithelial) | SEQ ID NO: 1755

In a further preferred embodiment, the sets for analyzing differential gene expression associated with MSI phenotype can, for example, consist of those mentioned in Table 6:

TABLE 6
Set | Clone identifier | Gene cluster | Symbol | Reference sequences | Title of cluster | SEQ ID Numbers
109 | image: 159885 | ughs.298469:175 | ace | nm_000789, nm_152830, nm_152831 | angiotensin i converting enzyme (peptidyl-dipeptidase a) 1 | SEQ ID NO: 1731; SEQ ID NO: 1732; SEQ ID NO: 1733
154 | image: 205314 | ughs.408312:175 | tp53 | nm_000546 | tumor protein p53 (li-fraumeni syndrome) | SEQ ID NO: 1735
412 | image: 42214 | ughs.192182:175 | syk | nm_003177 | spleen tyrosine kinase | SEQ ID NO: 1738
486 | image: 512000 | ughs.411826:175 | ube2d3 | nm_003340, nm_181886, nm_181887, nm_181888, nm_181889, nm_181890, nm_181891, nm_181892, nm_181893 | ubiquitin-conjugating enzyme e2d 3 (ubc4/5 homolog, yeast) | SEQ ID NO: 1745; SEQ ID NO: 1746; SEQ ID NO: 1747; SEQ ID NO: 1748; SEQ ID NO: 1749; SEQ ID NO: 1750; SEQ ID NO: 1751; SEQ ID NO: 1752; SEQ ID NO: 1753
535 | image: 549173 | ughs.192023:175 | eif3s2 | nm_003757 | eukaryotic translation initiation factor 3, subunit 2 beta, 36 kda | SEQ ID NO: 1754
622 | image: 79598 | ughs.194657:175 | cdh1 | nm_004360 | cadherin 1, type 1, e-cadherin (epithelial) | SEQ ID NO: 1755

In a further embodiment, the method of the present invention is directed to the analysis of differential gene expression associated with survival and death of patients in colon cancer. In this embodiment, said analysis comprises the detection of the overexpression or the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

2; 3; 5; 7; 8; 10; 12; 14; 20; 22; 23; 26; 28; 32; 33; 35; 36; 41; 42; 44; 47; 50; 51; 60; 61; 63; 64; 70; 73; 74; 81; 92; 93; 95; 106; 115; 118; 120; 121; 123; 129; 130; 132; 133; 137; 145; 148; 149; 160; 161; 162; 163; 183; 187; 188; 195; 199; 200; 202; 206; 209; 211; 213; 214; 217; 219; 222; 228; 229; 230; 233; 234; 238; 245; 246; 247; 250; 257; 269; 271; 274; 275; 276; 282; 283; 284; 285; 289; 291; 292; 296; 302; 303; 304; 312; 314; 318; 323; 327; 333; 334; 335; 336; 337; 339; 340; 341; 342; 344; 345; 347; 350; 351; 356; 359; 361; 362; 363; 364; 367; 370; 373; 374; 378; 380; 381; 382; 383; 384; 387; 389; 402; 403; 408; 411; 414; 418; 420; 428; 430; 433; 435; 439; 444; 446; 447; 449; 456; 457; 458; 460; 461; 465; 473; 478; 482; 484; 489; 490; 491; 494; 497; 501; 502; 504; 510; 514; 516; 520; 523; 528; 529; 530; 536; 537; 538; 539; 540; 548; 551; 556; 561; 562; 570; 571; 580; 581; 582; 584; 586; 590; 591; 593; 594; 596; 603; 607; 609; 612; 615; 620; 624; 625; 628; 635; 639; and 640.

The analysis can comprise at least one of the following steps:

- The detection of the overexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof selected from each of predefined polynucleotide sequence sets consisting of sets:

5; 14; 36; 44; 61; 64; 70; 81; 95; 115; 121; 132; 183; 209; 228; 275; 333; 334; 350; 367; 373; 435; 439; 523; 570; 603; and 625.

- The detection of the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

2; 3; 7; 8; 10; 12; 20; 22; 23; 26; 28; 32; 33; 35; 41; 42; 47; 50; 51; 60; 63; 73; 74; 92; 93; 106; 118; 120; 123; 129; 130; 133; 137; 145; 148; 149; 160; 161; 162; 163; 187; 188; 195; 199; 200; 202; 206; 211; 213; 214; 217; 219; 222; 229; 230; 233; 234; 238; 245; 246; 247; 250; 257; 269; 271; 274; 276; 282; 283; 284; 285; 289; 291; 292; 296; 302; 303; 304; 312; 314; 318; 323; 327; 335; 336; 337; 339; 340; 341; 342; 344; 345; 347; 351; 356; 359; 361; 362; 363; 364; 370; 374; 378; 380; 381; 382; 383; 384; 387; 389; 402; 403; 408; 411; 414; 418; 420; 428; 430; 433; 444; 446; 447; 449; 456; 457; 458; 460; 461; 465; 473; 478; 482; 484; 489; 490; 491; 494; 497; 501; 502; 504; 510; 514; 516; 520; 528; 529; 530; 536; 537; 538; 539; 540; 548; 551; 556; 561; 562; 571; 580; 581; 582; 584; 586; 590; 591; 593; 594; 596; 607; 609; 612; 615; 620; 624; 628; 635; 639; and 640.

In a preferred embodiment, the sets for analyzing differential gene expression associated with the survival and death of patients may, for example, consist of those mentioned in Table 7:

TABLE 7

Set | Clone identifier | Cluster | Gene symbol | Reference sequences | Title of cluster | SEQ ID Numbers
10 | image: 108370 | ughs.366546:175 | map2k2 | nm_030662 | mitogen-activated protein kinase kinase 2 | SEQ ID NO: 1756
12 | image: 108399 | - | - | - | - | -
33 | image: 121265 | ughs.181315:175 | ifnar1 | nm_000629 | interferon (alpha, beta and omega) receptor 1 | SEQ ID NO: 1683
214 | image: 257445 | ughs.77917:175 | uchl3 | nm_006002 | ubiquitin carboxyl-terminal esterase l3 (ubiquitin thiolesterase) | SEQ ID NO: 1757
217 | image: 258313 | ughs.432170:175 | cox7b | nm_001866 | cytochrome c oxidase subunit viib | SEQ ID NO: 1685
271 | image: 301119 | ughs.80691:175 | ckmt2 | nm_001825 | creatine kinase, mitochondrial 2 (sarcomeric) | -
344 | image: 346610 | ughs.184510:175 | sfn | nm_006142 | stratifin | SEQ ID NO: 1758
383 | image: 37630 | ughs.300701:175 | mgc8685 | nm_178012 | tubulin, beta polypeptide paralog | SEQ ID NO: 1759
387 | image: 376755 | ughs.24341:175 | taz | nm_015472 | transcriptional co-activator with pdz-binding motif (taz) | SEQ ID NO: 1760
414 | image: 428103 | ughs.1311:175 | cd1c | nm_001765 | cd1c antigen, c polypeptide | SEQ ID NO: 1761
473 | image: 509588 | ughs.421646:175 | taf12 | nm_005644 | taf12 rna polymerase ii, tata box binding protein (tbp)-associated factor, 20 kda | SEQ ID NO: 1706
484 | image: 510977 | ughs.173724:175 | ckb | nm_001823 | creatine kinase, brain | SEQ ID NO: 1707
516 | image: 544885 | ughs.469653:175 | rpl5 | nm_000969 | ribosomal protein l5 | SEQ ID NO: 1708
536 | image: 549178 | ughs.448580:175; ughs.74405:175 | sec6l1; ywhaq | nm_007277; nm_006826 | sec6-like 1 (s. cerevisiae); tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, theta polypeptide | SEQ ID NO: 1762, 1763
561 | image: 611623 | ughs.124979:175; ughs.519765:175 | dj159a19.3; kiaa1181 | nm_020462 | hypothetical protein dj159a19.3; kiaa1181 protein | SEQ ID NO: 1764

In a further embodiment, the method of the present invention is directed to the analysis of differential gene expression associated with the location of primary colorectal carcinoma in colon cancer. In this embodiment, said analysis comprises the detection of the overexpression or the underexpression of a pool of polynucleotide sequences in colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

6; 19; 43; 49; 83; 89; 94; 100; 151; 168; 172; 177; 224; 252; 258; 265; 309; 315; 316; 320; 322; 328; 355; 365; 391; 443; 453; 455; 466; 483; 496; 499; 506; 512; 513; 515; 517; 531; 532; 554; 563; 575; 579; 606; 618; and 637.

The analysis can comprise at least one of the following steps:

- The detection of the overexpression of a pool of polynucleotide sequences in left-colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof selected from each of predefined polynucleotide sequence sets consisting of sets:

19; 43; 89; 94; 100; 168; 224; 309; 328; 355; 391; 466; 531; 532; 563; and 637.

- The detection of the overexpression of a pool of polynucleotide sequences in right-colon tissues, said pool corresponding to all or part of the polynucleotide sequences, subsequences or complements thereof, selected from each of predefined polynucleotide sequence sets consisting of sets:

6; 49; 83; 151; 172; 177; 252; 258; 265; 315; 316; 320; 322; 365; 443; 453; 455; 483; 496; 499; 506; 512; 513; 515; 517; 554; 575; 579; 606; and 618.

In a preferred embodiment, the sets for analyzing differential gene expression associated with the location of the primary colorectal carcinoma can, for example, consist of those mentioned in Table 8:

TABLE 8

Set | Clone identifier | Cluster | Gene symbol | Reference sequences | Title of cluster | SEQ ID Numbers
43 | image: 124345 | ughs.77204:175 | cenpf | nm_016343 | centromere protein f, 350/400 ka (mitosin) | SEQ ID NO: 1765
100 | image: 154335 | ughs.321234:175 | exosc10 | nm_001001998, nm_002685 | exosome component 10 | SEQ ID NO: 1766, 1767
151 | image: 204653 | ughs.174142:175 | csf1r | nm_005211 | colony stimulating factor 1 receptor, formerly mcdonough feline sarcoma viral (v-fms) oncogene homolog | SEQ ID NO: 1768
172 | image: 22295 | ughs.343220:175 | crk | nm_005206, nm_016823 | v-crk sarcoma virus ct10 oncogene homolog (avian) | SEQ ID NO: 1769, 1770
265 | image: 291448 | ughs.95972:175 | silv | nm_006928 | silver homolog (mouse) | SEQ ID NO: 1771
315 | image: 325641 | ughs.534030:175 | psg5 | nm_002781 | pregnancy specific beta-1-glycoprotein 5 | SEQ ID NO: 1772
443 | image: 47986 | ughs.149609:175 | itga5 | nm_002205 | integrin, alpha 5 (fibronectin receptor, alpha polypeptide) | SEQ ID NO: 1652
499 | image: 530037 | ughs.244230:175 | - | - | full-length cdna clone cs0di056yj24 of placenta cot 25-normalized of homo sapiens (human) | -
532 | image: 549065 | ughs.169744:175 | g22p1 | nm_001469 | thyroid autoantigen 70 kda (ku antigen) | SEQ ID NO: 1773
554 | image: 594120 | ughs.8364:175 | pdk4 | nm_002612 | pyruvate dehydrogenase kinase, isoenzyme 4 | SEQ ID NO: 1774

Tables 2 to 8 provide, for each set listed, certain features, some of which are redundant with Table 1 and some of which are additional. For instance, certain reference sequences (“NM_xxxxxx”) in the “Reference Sequences” column of Tables 2 to 8 are supplemental to the sequences mentioned in the “Ref.” column of Table 1. This “Reference Sequences” column provides one or more mRNA references for a specific corresponding gene. These mRNAs, that represent the various splice forms currently identified in the art, are encompassed by the nucleotide sequence sets listed in Tables 2 to 8. Each of these mRNAs can be considered as a marker in the meaning of the present invention. The use of the “NM_xxxxxx” references herein would be clearly understood by a person skilled in the art who is familiar with this type of referencing system. The sequences corresponding to each “NM_xxxxxx” reference (or corresponding splice forms) are available, e.g., in the OMIM and LocusLink databases (NCBI web site) and are incorporated herein by reference. An “NM_xxxxxx” reference is therefore a constant; i.e., it will always designate the same sequence over time and whatever the source (database, printed document, or the like).

Each set described herein comprises the sequence(s) mentioned in Table 1 and, in addition, can comprise the “NM_XXXXXX” sequence and splice form(s) thereof mentioned in Tables 2 to 8 for the same set. For example, the sequences that comprise Set 1 are SEQ ID No. 1 and 2 (of Table 1) and the nm_001747 sequence (of Table 2), including subsequences or complements thereof, as described previously. In case of redundancy between the “Ref.” column of Table 1 and the “Reference Sequences” column of Tables 2 to 8 (i.e., if an “NM_XXXXXX” reference sequence corresponds to a SEQ ID sequence already mentioned in the “Ref.” column of Table 1), only one of these sequences may be considered.

The present invention further relates to a polynucleotide library useful for the molecular characterization of a colon cancer, comprising or corresponding to a pool of polynucleotide sequences which are either overexpressed or underexpressed in one or more of the above-cited tissues (e.g., colon tissue), said pool corresponding to all or part of the polynucleotide sequences (or markers) selected as defined above.

The detection of over- or underexpression of polynucleotide sequences according to the method of the invention can be carried out by fluorescence in-situ hybridization (FISH) or immunohistochemical (IHC) methods. Such detection can be performed on nucleic acids from a tissue sample, e.g., from one or more of the above-cited tissues, e.g., a colorectal tissue sample, or from a tumor cell line.

The invention also relates particularly to a method performed on DNA or cDNA arrays; e.g., DNA or cDNA microarrays.

The detection of over- or underexpression of polynucleotide sequences according to the method of the invention can also be carried out at the protein level. Such detection is performed on proteins expressed from nucleic acid in one or more of the above-cited tissue samples.

Accordingly, a further method according to the present invention comprises:

a) obtaining a sample comprising proteins from a colorectal tissue sample from a subject; and

b) measuring in said sample obtained in step (a) the level of those proteins encoded by a polynucleotide library according to the invention.

The present invention is useful for detecting, diagnosing, staging, classifying, monitoring, predicting, and/or preventing colorectal cancer. It is particularly useful for predicting clinical outcome of colon cancer and/or predicting occurrence of metastatic relapse and/or determining the stage or aggressiveness of a colorectal disease in at least about 50%, e.g., at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or about 100% of the subjects. The invention is also useful for selecting a more appropriate dose and/or schedule of chemotherapeutics and/or biopharmaceuticals and/or radiation therapy to circumvent toxicities in a subject.

By “aggressiveness of a colorectal disease” is meant, e.g., cancer growth rate or potential to metastasize; a so-called “aggressive cancer” will grow or metastasize rapidly or significantly affect overall health status and quality of life.

By “predicting clinical outcome” is meant, e.g., the ability for a skilled artisan to classify subjects into at least two classes (good vs. poor prognosis) showing significantly different long-term Metastasis Free Survival (MFS).

In particular, the method of the invention is useful for classifying cell or tissue samples from subjects with histopathological features of colorectal disease, e.g., colon tumor or colon cancer, as samples from subjects having a “poor prognosis” (i.e., metastasis or disease occurred within 5 years since diagnosis) or a “good prognosis” (i.e., metastasis- or disease-free for at least 5 years of follow-up time since diagnosis).

The present invention further relates to a method of assigning a therapeutic regimen to a subject with histopathological features of colorectal disease, for example colon cancer, comprising:

a) classifying said subject as having a “poor prognosis” or a “good prognosis” on the basis of the method of analyzing according to the present invention;

b) assigning said subject a therapeutic regimen, said therapeutic regimen (i) comprising no adjuvant chemotherapy if the subject is lymph node negative and is classified as having a good prognosis, or (ii) comprising chemotherapy if said subject has any other combination of lymph node status and expression profile.
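By way of illustration only, the decision rule of steps (a) and (b) can be sketched as a small function. The function name, argument names and returned strings are hypothetical and are not part of the claimed method; the prognosis label is assumed to have been produced beforehand by the expression-based classification.

```python
def assign_regimen(lymph_node_positive: bool, prognosis: str) -> str:
    """Map lymph node status and prognosis class to a regimen class.

    prognosis is the label from step (a): "good" or "poor".
    """
    if prognosis not in ("good", "poor"):
        raise ValueError("prognosis must be 'good' or 'poor'")
    # (i) no adjuvant chemotherapy only for node-negative, good-prognosis subjects
    if not lymph_node_positive and prognosis == "good":
        return "no adjuvant chemotherapy"
    # (ii) any other combination of node status and expression profile
    return "chemotherapy"


for node_positive in (False, True):
    for label in ("good", "poor"):
        print(node_positive, label, "->", assign_regimen(node_positive, label))
```

Only one of the four combinations leads to omission of adjuvant chemotherapy, which mirrors the wording of step (b).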

For example, the assigning of a therapeutic regimen can comprise the use of an appropriate dose of irinotecan drug compound. For example, this dose is selected according to the presence or the absence of a polymorphism(s) in a uridine diphosphate glucuronosyltransferase I (UGT1A1) gene promoter of the subject. For example, a polymorphism may be the presence of an abnormal number of (TA) repeats in said UGT1A1 promoter.

More generally, the invention is also useful for selecting appropriate doses and/or schedules of chemotherapeutics and/or (bio)pharmaceuticals, and/or targeted agents, which can include irinotecan, 5-fluorouracil, fluorouracil, levamisole, mitomycin, lomustine, vincristine, oxaliplatin, methotrexate, and anti-thymidylate synthase. Further relevant anti-colorectal cancer agents are known in the art. These agents may be administered alone or in combination.

The method for analyzing differential gene expression associated with histopathologic features of colorectal disease according to the present invention, e.g., the method for classifying cell or tissue samples, allows one to achieve high specificity and/or sensitivity levels of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%.

By “specificity” is meant:

Number of true negative samples×100/(Number of true negative samples+Number of false positive samples)

By “sensitivity” is meant:

Number of true positive samples×100/(Number of true positive samples+Number of false negative samples)
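As a minimal sketch, the two definitions above translate directly into code. The function names and the sample counts are illustrative assumptions, not data from the present study.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity in percent: TN x 100 / (TN + FP)."""
    return true_negatives * 100 / (true_negatives + false_positives)


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity in percent: TP x 100 / (TP + FN)."""
    return true_positives * 100 / (true_positives + false_negatives)


# Hypothetical counts for illustration only.
print(specificity(true_negatives=18, false_positives=2))
print(sensitivity(true_positives=19, false_negatives=1))
```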

With reference to the figures:

FIG. 1 shows global gene expression profiles in colorectal cancer and non-cancerous samples. 1A: Hierarchical clustering of 50 samples and ~9,000 cDNA clones based on mRNA expression levels. Each row represents a clone and each column represents a sample. Expression level of each gene in a single sample is relative to its median abundance across all samples and depicted according to a color scale shown at the bottom. Red and green indicate expression levels above and below the median, respectively. The magnitude of deviation from the median is represented by the color saturation. Grey indicates missing data. Dendrograms of samples (above matrix) and genes (to the left of matrix) represent overall similarities in gene expression profiles. For samples, black branches represent normal tissues (n=23), red branches represent cancer tissues (n=22) and purple branches represent cancer cell lines (n=5). Colored bars to the right indicate the locations of 7 gene clusters of interest. These clusters, except the “proliferation cluster” (brown bar), are zoomed in B. 1B: Top panel: dendrogram of samples: tissue samples are designated with numbers followed by N when non-cancerous tissue and T when tumor tissue. Lower panel: expanded view of selected gene clusters named from top to bottom: “MHC class II”, “stromal”, “MHC class I”, “interferon-related”, “early response”, “smooth muscle” and “proliferation”. Genes are referenced by their HUGO abbreviation as used in “Locus Link”. 1C: Dendrogram of samples representing the results of the same hierarchical clustering applied only to the 22 cancer tissue samples. Two groups of samples (A and B) are defined. Sample names and branches highlighted in blue and in red represent patient samples without and with metastatic disease at diagnosis (labelled by *) or during follow-up, respectively. Status of each patient at last follow-up is marked by A (alive) or D (deceased) from CRC.

FIG. 2 shows hierarchical classification of tissue samples using genes which discriminate between normal and cancer samples. 2A: Hierarchical clustering of the 45 colon tissue samples using expression levels of the 245 cDNA clones that were significantly different between normal and cancer samples. The dendrogram of these samples is magnified in B. 2B: Dendrogram of samples: black branches represent normal tissues (n=23) and red branches represent cancer tissues (n=22).

FIG. 3 shows hierarchical classification of CRC tissue samples using genes that discriminate metastatic from non-metastatic samples, correlated with survival. 3A: Hierarchical clustering of the 22 CRC tissue samples based on expression levels of the 244 cDNA clones that were significantly different between metastatic and non-metastatic cancer samples. The dendrogram of samples is zoomed in B. 3B: Dendrogram of samples: blue represents samples without metastasis and red represents samples with metastasis at diagnosis (labelled by *) or during follow-up. A means alive at last follow-up and D means dead from CRC. The analysis delineates 2 groups of tumors, group 1 and group 2. 3C: Kaplan-Meier plots of metastasis-free survival and overall survival of the 2 groups of samples defined by hierarchical clustering for all patients (left, n=22) and AJCC 1-3 patients (right, n=16).

FIG. 4 shows hierarchical classification of CRC tissue samples using discriminator genes selected by supervised analyses based on lymph node status, MSI phenotype and location of tumors. 4A: Hierarchical clustering of the 21 CRC tissue samples based on expression levels of the 46 cDNA clones significantly different between lymph node-positive (LN+, n=5, red branches and names) and lymph node-negative (LN−, n=16, blue branches and names) cancer samples. Each gene is identified by IMAGE cDNA clone number, HUGO abbreviation, and chromosomal location. EST means expressed sequence tag for clones without significant identity to a known gene or protein. 4B: Hierarchical clustering of the 22 CRC tissue samples based on expression levels of the 58 cDNA clones significantly different between MSI+ (MSI, n=8, blue branches and names) and non-MSI (n=14, red branches and names) cancer samples. 4C: Hierarchical clustering of the 22 CRC tissue samples based on expression levels of the 46 cDNA clones significantly different between cancer samples from right colon (R, n=6, blue branches and names) and left colon (L, n=13, red branches and names).

FIG. 5 shows analysis of NM23 protein expression in colorectal tissue samples using tissue microarrays. Protein expression of NM23 was analyzed using tissue microarrays containing 190 pairs of cancer samples and corresponding normal mucosa. 5A: Hematoxylin & Eosin staining of a paraffin block section (25x30) from a tissue microarray containing 216 tumors (3×55) and control samples. 5B: Five-μm sections of 0.6 mm core biopsies of colorectal cancer samples stained with anti-NM23 antibody are shown. Sections e and f are from CRC patients without metastasis (strong staining) and sections g and h are from CRC patients with metastasis (low staining). 5C: Kaplan-Meier plots of overall survival in AJCC1-3 patients according to NM23 protein expression levels. Magnification is 50× in B-E.

EXAMPLE

The invention will now be illustrated with the following non-limiting examples.

1) Gene expression profiling of CRC and unsupervised classification

The mRNA expression profiles of 50 cancer and non-cancerous colon samples, including 45 clinical tissue samples and 5 cell lines, were determined using DNA microarrays containing ~9,000 spotted PCR products from known genes and ESTs. Both unsupervised and supervised analyses were performed on all samples following normalization of expression levels.

Unsupervised hierarchical clustering of all samples based on the total gene expression profile was first applied. Results were displayed in a color-coded matrix (FIG. 1A) where samples were ordered on the horizontal axis and genes on the vertical axis on the basis of similarity of their expression profiles. The 50 samples were sorted into two large clusters that extensively differed with respect to normal or cancer type (FIG. 1B, top): 87% were non-cancerous in the left cluster and 87% were cancerous in the right cluster. As expected, the CRC cell lines represented a branch of the “cancer” cluster. Hierarchical clustering also allowed identification of clusters of gene expression corresponding to defined functions or cell types, some of which are indicated by colored bars on the right of FIG. 1A, and which are zoomed in FIG. 1B. Three clusters were overexpressed in tissue samples overall as compared to epithelial cell lines, reflecting the cell heterogeneity of tissues: an “immune cluster” with different subclusters, including an MHC class I subcluster that correlated with an interferon-related subcluster, and an MHC class II subcluster; a “stromal cluster” enriched with genes expressed in stromal cells (COL1A1, COL1A2, COL3A1, MMP2, TIMP1, SPARC, CSPG2, PECAM, INHBA); and a “smooth muscle cluster” (CNN1, CALD1, DES, MYH11, SMTN, TAGL) that was globally overexpressed in normal tissue as compared to cancer tissues. An “early response cluster” included immediate-early genes (JUNB, FOS, EGR1, NR4A1, DUSP1) involved in the human cellular response to environmental stress. Conversely, a very large cluster, defined as a “proliferation cluster”, was generally overexpressed in cell lines as compared to tissues, probably reflecting the proliferation rate difference between cells in culture and tumor tissues. This cluster included PCNA, which codes for a proliferation marker used in clinical practice, as well as many genes involved in glycolysis, such as GAPD, LDHA and ENO1; cell cycle and mitosis, such as CDK4, BUB3, CDKN3 and GSPT2; metabolism, such as ALDH3A1, cytochrome C oxidase subunits and GSTP1; and protein synthesis, such as genes coding for ribosomal proteins.
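For illustration, the unsupervised step can be sketched as median-centering each gene across samples followed by agglomerative clustering of the samples. This is a minimal sketch under stated assumptions: the distance (1 minus Pearson correlation) and the average-linkage rule are assumed here because the text does not specify them, and the function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import median


def median_center(matrix):
    """Express each value relative to its gene's (row's) median across samples."""
    centered = []
    for row in matrix:
        m = median(row)
        centered.append([value - m for value in row])
    return centered


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed metric; profiles must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def cluster_samples(matrix, n_clusters=2):
    """Average-linkage agglomerative clustering of the columns (samples)."""
    columns = [list(col) for col in zip(*matrix)]
    clusters = [[i] for i in range(len(columns))]

    def average_linkage(a, b):
        total = sum(pearson_distance(columns[i], columns[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy matrix: 4 genes (rows) x 4 samples (columns), hypothetical values.
toy = [
    [2.0, 2.2, 0.1, 0.0],
    [1.8, 2.1, 0.2, 0.1],
    [0.1, 0.0, 1.9, 2.2],
    [0.2, 0.1, 2.0, 1.8],
]
print(cluster_samples(median_center(toy), n_clusters=2))
```

On the toy matrix the first two samples and the last two samples fall into separate clusters, analogous to the two large sample clusters described above.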

The same clustering algorithm applied only to the 22 CRC clinical samples sorted the tumors into two groups (A, 10 patients and B, 12 patients) that differed with respect to AJCC stage and clinical outcome (FIG. 1C). Group A included a high proportion of patients presenting with metastases at diagnosis (AJCC4 stage, 5 out of 10) as compared with group B (1 out of 12). Interestingly, 3 out of 5 “AJCC1-3” patients of group A experienced metastatic relapse after a median duration of 18 months (range, 4 to 88) from diagnosis and died from CRC, while none of the 11 “AJCC1-3” patients of group B relapsed or died after a median follow-up of 69 months (range, 10 to 98). This suggests that patients are at higher risk for metastasis in group A than in group B. To identify particular sets of genes that could better define subgroups of samples, supervised analyses were then conducted.

2) Differential gene expression between normal colon and colon tumors

To identify and rank genes with significant differential expression between cancer (22 samples) and non-cancerous colon tissues (23 samples), a discriminating score (DS) combined with iterative random permutation tests was applied. Two hundred forty-five cDNA clones, 130 of which were overexpressed and 115 underexpressed in cancer samples, were identified. These clones corresponded to 237 unique sequences that represented 191 different known genes and 46 ESTs. The functions of the known genes, as given in the OMIM and LocusLink databases (NCBI web site), are listed in Table 1 above. Samples were then reclustered on the basis of these genes (FIG. 2), with good resulting discrimination between normal and cancer samples: in the left branch 90% of samples were cancerous, while in the large right branch 92% were normal.
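The ranking procedure can be sketched as follows for a single gene. The text names a discriminating score combined with random permutation tests but does not give its formula here; the signal-to-noise form DS = (M1 − M2)/(S1 + S2), the function names, the number of permutations and the expression values below are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def discriminating_score(group1, group2):
    """Assumed signal-to-noise form: (M1 - M2) / (S1 + S2)."""
    return (mean(group1) - mean(group2)) / (stdev(group1) + stdev(group2))


def permutation_p_value(group1, group2, n_permutations=1000, seed=0):
    """Estimate how often random relabelling gives a score at least as extreme."""
    rng = random.Random(seed)
    observed = abs(discriminating_score(group1, group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of sample labels
        if abs(discriminating_score(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values of one clone in cancer and normal samples.
cancer = [2.1, 1.9, 2.4, 2.2, 2.0]
normal = [0.9, 1.1, 1.0, 0.8, 1.2]
print(discriminating_score(cancer, normal))
print(permutation_p_value(cancer, normal))
```

In a full analysis the score would be computed for every clone and the clones ranked by score, with the permutation test used to decide how many top-ranked clones to retain.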

3) Differential gene expression within CRC tissue samples

A supervised approach was applied to the 22 cancer tissue samples by comparing tumor subgroups defined by relevant histoclinical parameters.

3.a) Genes associated with visceral metastases

The occurrence of metastasis is the leading cause of death in patients with CRC. Accurate predictors of metastasis are needed to determine therapeutic strategies and improve survival. Two hundred forty-four cDNA clones, corresponding to 235 unique sequences representing 194 characterized genes and 41 ESTs, were identified that discriminated between primary tumor samples collected from patients with and without metastasis at time of diagnosis or during follow-up. Among these clones, 219 were underexpressed and 25 were overexpressed in metastatic samples as compared to non-metastatic samples. Hierarchical clustering of samples based on expression of these selected genes (FIGS. 3A-B) successfully classified patients according to outcome, with only two non-metastatic samples misplaced in group 2. The differences in survival between the two groups were statistically significant (FIG. 3C). The 5-year MFS (Metastasis-Free Survival) and OS (Overall Survival) were 100% for group 1 (n=11) and 18% and 30%, respectively, for group 2 (n=11) (p=0.0001 and p=0.001, respectively). MFS and OS were 100% for group 1 (n=11) and 40% for group 2 (n=5) when only patients without metastatic disease at time of diagnosis (AJCC1-3 stage) were considered (p=0.005 and p=0.006, respectively). Finally, MFS and OS were 100% for group 1 (n=10) and 50% for group 2 (n=4) when only AJCC1-2 patients (no metastatic disease and node-negative tumor at time of diagnosis) were considered (p=0.019 and p=0.022, respectively).
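Survival figures such as the 5-year MFS are read from Kaplan-Meier curves (FIG. 3C). The following is a minimal sketch of the product-limit estimator; the function names and the follow-up data are hypothetical and do not reproduce the patient data of this study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as [(time, survival probability)] at each event time.

    times: follow-up in months; events: True if the event (e.g., metastasis)
    occurred at that time, False if the patient was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


def survival_at(curve, t):
    """Survival probability at time t (step function, 1.0 before any event)."""
    probability = 1.0
    for event_time, value in curve:
        if event_time <= t:
            probability = value
    return probability


# Hypothetical follow-up (months) and event flags for six patients.
months = [4, 12, 18, 30, 70, 80]
relapsed = [True, True, False, True, False, False]
curve = kaplan_meier(months, relapsed)
print(survival_at(curve, 60))  # estimated 5-year metastasis-free survival
```

A group in which no event is observed, such as group 1 above, yields an empty curve and therefore an estimate of 1.0 (100%) at every time point.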

3.b) Genes associated with lymph node metastases

Pathological lymph node involvement at diagnosis is a strong prognostic parameter in CRC. Its determination relies on surgical dissection, which currently requires biopsy of individual lymph nodes. Surgical lymph-node biopsy has major disadvantages, such as patient discomfort and the fact that metastases, particularly micrometastases, are often missed. Lymph node involvement depends on the heterogeneous expression and complex interactions of multiple genes that promote metastatic invasion and influence clinical outcome. Large-scale expression analyses provide a means to identify these genes and the complexity of the interactions by which they drive tumorigenesis and metastatic invasion, as reported for breast or gastric cancers.

Forty-six cDNA clones (41 known genes and 5 ESTs) were identified as significantly differentially expressed between tumors with (n=5) and without (n=16) lymph node metastasis. Reclustering based on these 46 clones correctly separated node-positive from node-negative samples (FIG. 4A). The two samples (9075T and 7442T) that, among all node-negative cases, had expression patterns more closely related to node-positive samples displayed metastatic disease at time of diagnosis (7442T) and 23 months after surgery (9075T), corroborating the predictions based on molecular signature.

3.c) Genes associated with MSI phenotype and with location of cancer

To obtain additional insights into colorectal oncogenesis, differential gene expression between MSI+ (n=8) and non-MSI (n=14) tumors and between tumors from right colon (n=6) and left colon (n=13) was analyzed.

Fifty-eight cDNA clones (representing 51 known genes and 5 ESTs) with significant differential expression between MSI+ and non-MSI tumors were identified. The discriminator potential of these clones was confirmed by hierarchical classification of samples based on their expression levels, even though some MSI+ tumors displayed an intermediate expression profile (FIG. 4B). Similarly, classification of 19 samples (excluding transverse colon tumors), based on the expression of 46 cDNA clones (35 known genes and 11 ESTs) differentially expressed between right and left colon cancers, correctly sorted samples from the right or left colon (FIG. 4C). Such discrimination agreed with the existence of two distinct categories of CRC according to the location of the tumor.

3.d) Immunohistochemistry on tissue microarrays.

The protein expression levels of the most significant discriminatory genes identified by supervised analyses were measured on tissue microarrays (TMAs) containing 190 pairs of cancer samples and corresponding normal mucosa. Use of TMAs allowed the measurement of the expression levels simultaneously and in identical conditions. IHC results using an anti-NM23 antibody (which detects both NME1 and NME2 proteins) are shown in FIG. 5. Consistent with DNA microarray results, NM23 was significantly overexpressed in cancer samples as compared to non-cancerous samples (p=5.6×10⁻⁶, Fisher exact test), and was significantly down-regulated in tumors with metastasis (cut-off was the median value) compared to tumors without metastasis (p=0.04, Fisher exact test). The 5-year MFS was 68% for negative and 88% for positive samples when considering the 111 AJCC1-3 patients with available IHC data (p=0.02, log-rank test). Conversely, for prohibitin and decorin, the correlations identified using DNA microarrays were not found at the level of protein expression.
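The Fisher exact test used for the comparisons above operates on a 2×2 table of counts (for example, staining above or below the median versus metastatic status). A minimal two-sided sketch is given below; the function name and the example counts are hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention, assumed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = high / low staining, columns = no metastasis / metastasis.
print(fisher_exact_two_sided(3, 1, 1, 3))
```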

4) Discussion

DNA microarray-based gene expression profiling is a promising approach to investigate the molecular complexity of cancer. To date, CRC studies have not directly addressed the issue of prognosis or MSI phenotype. Fifty cancer and non-cancerous colon tissue samples were profiled, and expression profiles were correlated with histoclinical parameters of disease, including survival, using both unsupervised and supervised analyses.

4a) Unsupervised analysis

Global gene expression profiling revealed extensive transcriptional heterogeneity between samples, notably cancer samples. It was, to some extent, already able to distinguish clinically relevant subgroups of samples: normal versus cancer tissues, as previously reported, notably for CRC, and good- versus poor-prognosis tumors. Such global classification is usually imperfect because the excessive noise generated by large gene sets masks the significant discriminatory genes for features (such as clinical outcome) that are governed by a smaller set. Importantly, the described global approach allows identification of discrete expression patterns that define clinically useful classifications among patients with CRC. For example, several gene clusters corresponding to cell types (stroma, smooth muscle, MHC class I and II) or functions (interferon-related, immediate-early response and proliferation) that have been reported in previous studies were identified, supporting the validity of the present data and their consistency with putative biologic function.

4b) Supervised analyses

To identify smaller sets of discriminator genes that may improve classification of samples and facilitate translation in clinical practice, supervised statistical analyses were done, based on predefined groups of samples.

i) Comparison of normal vs cancer samples.

A total of 245 discriminator cDNA clones (3%) were significantly differentially expressed between normal and cancer samples. This ratio is in agreement with those reported in the literature. Comparison with lists of discriminator genes previously identified in CRC using DNA microarrays revealed many common genes, further underscoring the validity of the present data. For example, CA4, CHGA, CNN1, MYH11, FCGBP, KCNMB1, SST were down-regulated, whereas CA3, CCT4, EIF3S6 or EEF1A1, IFITM1, CSE1L, NME1 or RAN were up-regulated in cancer samples. Beyond these common genes, many additional genes that may improve the accuracy of previously described predictive signatures were identified.

Among the underexpressed genes in cancer samples were genes encoding cytokines (IL10RA, IL1RN, IL2RB), proteins involved in lipid metabolism (LPP, LIAS, LRP2, MGLL), signal transducers (PLCD1, PLCG2, mTOR/FRAP1), transcription factors such as RELA, and known or putative tumor suppressor genes (TSG). CTCF encodes a transcriptional repressor of MYC and is located in 16q22.1, a chromosomal region frequently deleted in breast and prostate tumors; IRF1, a transcriptional activator of genes induced by cytokines and growth factors, regulates apoptosis and cell proliferation and is frequently deficient in human cancers. The underexpression of GSN (gelsolin), combined with that of PRKCB1 (protein kinase C, beta 1), may lead to decreased activation of PKCs involved in phospholipid signalling pathways that inhibit cell proliferation and tumorigenicity.

The top-ranked gene overexpressed in cancer samples was GNB2L1 (also named RACK1), which encodes guanine nucleotide binding protein (G protein) beta polypeptide 2-like 1, involved in signal transduction and activation of PKC. It also interacts with IGF1R, shown to play a pivotal role in colorectal oncogenesis; this interaction may regulate IGF1-mediated AKT activation and protection from cell death as well as IGF1-dependent integrin signalling, and may promote cell extravasation and contact with extracellular matrix (ECM). Other genes have already been reported as up-regulated in other types of cancer: they encode SNRPs and SOX transcription factors (SNRPC, SNRPE, SOX4, SOX9), components of ECM, and molecules involved in vascular and extracellular remodelling (COL5A1, P4HA1, MMP13, LAMR1). BZRP, which codes for the peripheral benzodiazepine receptor, cell cycle genes (CCNB2, CDK2), and SAT, involved in polyamine metabolism, were also identified. Consistent with previous reports, we identified the overexpression in cancer samples of SERPINB5 and NME1, encoding two potential TSGs. Overexpression of NME1 and underexpression of CTCF combine to induce overexpression of the MYC oncogene, an important modulator of WNT/APC signalling shown to play an important role in the development of CRC. Other up-regulated genes, and potential therapeutic targets, include kinases (PTK2, STK6, NTRK2), the cell-surface protein CD9, and three genes encoding integrins (ITGA2, ITGAL and ITGB3). The integrin pathway was further affected with variations in the expression of genes encoding PTK2, TGFB1I1/HIC5 (a PTK2 interactor), and integrin-linked kinase ILK. Agrawal et al. previously identified osteopontin, an integrin-binding protein, as a marker of CRC progression. SPP1, which codes for osteopontin, as well as CXCL1, which codes for the GRO1 oncogene, and CDK4 were not in the present stringent list of discriminator genes, although they were overexpressed in cancer samples with a fold-change greater than or equal to 2.

Discriminator genes were associated with many cell structures, processes and functions, including general metabolism (the most abundant category), cell cycle, proliferation, apoptosis, adhesion, cytoskeletal remodelling, signal transduction, transcription, translation, RNA and protein processing, immune system and others. Up- and down-regulated genes were rather equally distributed with respect to these functions, except for those coding for kinases and for proteins involved in extracellular matrix remodelling, metabolism, RNA and protein processing (translation, ribosomal proteins and chaperonins), which were overexpressed in cancer samples as compared to normal samples. This phenomenon, already reported, is likely to be related to increased metabolism and cell proliferation in cancer cells.

Analysis of chromosomal location points to two interesting regions. Six genes up-regulated in cancer (STK6, UBE2C, PFDN4, RPS21, CSE1L, SLPI) were located in 20q13, a chromosomal region often amplified in cancer; their overexpression might be a consequence of gene amplification. This has already been observed by others, although not all genes of the region are affected transcriptionally. Conversely, six genes (TJP3, INSR, ELAVL1, MAP2K7, CNN1, NR2F6) down-regulated in cancer samples were located in 19p13.1-p13.3, already known to harbour several potential TSGs such as APC2, STK11 or MCC2.

ii) Expression profiles and clinical outcome

All subjects, some of them presenting with metastasis at diagnosis, had received standard treatment. Significantly, in the global hierarchical clustering, subjects with non-metastatic tumors that clustered with metastatic cases eventually developed metastasis and died during follow-up. Supervised analysis further improved the prognostic classification by identifying 194 known genes and 41 ESTs that discriminated well between samples without or with metastasis at diagnosis or during follow-up. This is the first report that suggests a potential prognostic role of gene expression profiling in CRC. The significance of the prognostic classification made by AJCC stage was compared with that made by expression levels of the present discriminator gene set. Classification based on AJCC stage (AJCC1-2 tumors, n=14, vs AJCC3-4 tumors, n=8) was significant (p=0.001; Kaplan-Meier survival analysis, log-rank test), but less than that made by expression profiles (Fisher's exact test, p=0.05 vs p=0.003). Significantly, the prognostic impact of our gene set was also confirmed when applied to patients without metastasis at diagnosis as well as to patients without metastasis and lymph node invasion.

In addition, the functional identities of the discriminator genes provided insight into the underlying molecular mechanisms that drive the metastatic process, and contributed to the identification of potential novel therapeutic targets. For example, known genes that were down-regulated in metastatic tumors included DSC2, encoding desmocollin 2, a desmosomal and hemi-desmosomal adhesion molecule of the cadherin family, and HPN, coding for hepsin, a transmembrane serine protease whose favorable prognostic role has recently been highlighted in prostate cancer by studies using DNA and/or tissue microarrays. Decorin is a small leucine-rich proteoglycan abundant in ECM that negatively controls growth of colon cancer cells and angiogenesis. Low levels of decorin mRNA have been associated with a worse prognosis in breast carcinomas. NME1 and NME2 were underexpressed in patients that developed metastasis, consistent with previous reports that these genes interact to suppress metastasis. Prohibitin is a mitochondrial protein thought to be a negative regulator of cell proliferation and may be a TSG. Transcription of genes encoding mitochondrial proteins has been shown to be decreased during progression of CRC. This was confirmed in the present study, since all discriminator genes involved in mitochondrial metabolism were down-regulated in metastatic tumors (ATP5C1, BCKDK, CABC1, CKMT2, COX5B, COX6B, COX7A2, COX7A2L, COX7C, HSPA9B, LRIG1, MDH1, NDUFA1, NDUFA4, NDUFA6, NDUFA9, NDUFV1, SCO1, UQCR). Surprisingly, although increased protein synthesis is classically associated with oncogenic transformation, many genes coding for ribosomal proteins (RPL5, RPL6, RPL15, RPL29, RPL31, RPL39) were found to be down-regulated in metastatic tumors. The SMAD1/MADH1 gene codes for a transmitter of TGFβ signalling, which exerts a number of regulatory effects on colon cells and is involved in the metastatic process.

The most significantly overexpressed gene in metastatic tumors was PCSK7, which codes for the proprotein convertase subtilisin/kexin type 7. Proprotein convertases (PCs) process latent precursor proteins into their biologically active products, including protein tyrosine phosphatases, growth factors and their receptors, and enzymes like matrix metalloproteases (MMPs), which may confer on them a functional role in tumor cell invasion and tumor progression. Other up-regulated genes encoded various signalling proteins including PRAME, an interactor of the cytoskeleton-regulator paxillin, IQGAP1, a negative regulator of the E-cadherin/catenin complex-based cell-cell adhesion, LTBP4, a structural component of connective tissue microfibrils and local regulator of TGFβ tissue deposition and signalling, IGF1R, a transmembrane tyrosine kinase receptor, and DSG1, another desmosomal cadherin-like protein. The incorrect balance between the various desmosomal cadherins has been shown to facilitate separation of epithelial cells from the ECM and metastasis. IGF1R has recently been shown to be involved in metastases of CRC by preventing apoptosis, enhancing cell proliferation, and inducing angiogenesis. Several genes located on the long arm of chromosome 15 were down-regulated in metastatic samples.

iii) Expression profiles and lymph node metastasis

Although nodal metastasis is currently the standard clinical method to predict patient prognosis, there is clear consensus that an improved diagnostic is required to accurately predict survival for patients with CRC. Indeed, approximately one-third of node-negative CRC recur, possibly due to understaging and inadequate pathological examination of lymph nodes. Statistical models suggest that the mean number of nodes currently identified in patients is much too low to correctly classify nodal status. Expression profiles defined in primary tumors could help predict the presence of lymph node metastasis, as recently reported. Forty-six genes and ESTs were identified as discriminators between node-positive and node-negative tumors. Since lymph node status and metastatic relapse are correlated events, this invention includes the identification of novel genes that discriminate between tumors with or without metastasis.

For example, OAS1 and NTRK2 were overexpressed in node-positive tumors. NTRK2 encodes a neurotrophic tyrosine kinase, and aberrant mutation of NTRK2 has recently been shown to play a role in the metastatic process. OAS1 encodes the 2′,5′-oligoadenylate synthetase 1; the 2-5A system has been implicated in the control of cell growth, differentiation, and apoptosis. High levels of activity have been reported in individuals with disseminated cancer, and a recent study found overexpression of OAS1 mRNA in node-positive breast cancers. Conversely, MGP, PRSS8 and NME2 were down-regulated in node-positive tumors. MGP encodes the matrix Gla protein, the loss of expression of which has been associated with lymph node metastasis in urogenital tumors. The prostasin serine protease, encoded by PRSS8, is a potential invasion suppressor, and down-regulation of PRSS8 expression may contribute to invasiveness and metastatic potential. The present list of 46 discriminator clones also included additional genes, reflecting the non-perfect correlation between lymph node metastasis and visceral metastasis and the involvement of different underlying biological processes.

Among genes underexpressed in node-positive tumors were BUB3, TPP2 and ITIH1. BUB3 codes for a mitotic-spindle checkpoint protein that interacts with the APC protein to regulate chromosome segregation during cell division. Defects in mitotic checkpoints, including mutations of BUB1, have been associated with CRC, and BUB genes (BUB1 and BUB1B) are underexpressed in highly metastatic colon cell lines. TPP2 encodes tripeptidyl peptidase II, a high molecular mass serine exopeptidase that may play a functional role by degrading peptides involved in invasive and metastatic potential, as recently reported for another peptidyl peptidase, DPP4. ITIH1 encodes a heavy chain of proteins of the ITI family that inhibits the metastatic spreading of H460M large cell lung carcinoma lines by increasing cell attachment.

iv) Expression profiles and MSI phenotype

Without wishing to be bound by any theory, it is believed that there are at least two distinct pathways of oncogenesis in sporadic CRC. Fifteen per cent of tumors present the MSI phenotype, which is related to the inactivation of MMR genes, principally MSH2 and MLH1. The genetically unstable tumor cells accumulate somatic clonal mutations in their genome, which may disturb mRNA expression or degradation of specific transcripts. Conversely, 85% of sporadic tumors are associated with a non-MSI (or MSS) phenotype; they are characterized by chromosome instability and loss of genomic material that may account for the loss of expression of specific alleles. MSI+ tumors are frequently diploid, located in the proximal colon, and may be associated with better prognosis and response to chemotherapy. Reliable distinction between MSI+ and non-MSI phenotypes, currently based on molecular approaches, remains problematic and difficult to assess/confirm in the clinical setting, largely due to the number and heterogeneity of genes involved, absence of easily identifiable mutational hot-spots, and epigenetic inactivation. Other methods are being tested, such as IHC assessment of MSH2 and MLH1.

Although the underlying molecular mechanisms of MSI+ and non-MSI colorectal oncogenesis remain unclear, it appears that these two phenotypes represent different molecular entities that could translate into distinct gene expression profiles useful in clinical practice as new diagnostic markers and/or tests. The present supervised analysis of MSI+ and non-MSI CRC clinical samples showed 58 differentially expressed clones. It is of note that arrayed MMR genes (MSH2, MSH3, MLH1, MLH3, PMS1 and PMS2) were not among these discriminator genes. As reported for cell lines, several of these deregulated genes are involved in cell cycle control, mitosis, transcription and/or chromatin structure (RAN, PTPN21, TP53, MORF4L1, ZFP36L2, PSEN1, IGF2, ASNS, RPS4X, CCNF, ZNF354A). The top down-regulated gene in MSI+ tumors was EIF3S2, which encodes eukaryotic translation initiation factor 3, subunit 2β, also known as TRIP1 (TGFβ receptor-interacting protein 1). TRIP1 specifically associates with TGFBRII, a serine/threonine kinase receptor frequently inactivated by mutation and down-regulated in MSI+ tumors.

v) Validation studies

Many different cell processes are aberrantly modulated during colorectal oncogenesis. Genes involved in adhesion processes are affected in metastasis. Genes known to be affected in oncogenesis, such as MMR genes, do not discriminate tumor subgroups. DNA microarray data could rapidly prove useful in clinical practice and in the design of new therapeutic options. The described DNA microarray approach may be ideally suited to elucidate the complex and heterogeneous processes that drive CRC progression in individual patients, significantly improve clinical treatment of CRC, and optimize the use of novel therapeutic options. Discriminator genes represent potential new diagnostic and prognostic markers and/or therapeutic targets, and deserve further investigation in larger series of subjects. Novel markers of potentially differentially expressed molecules were identified using IHC on TMA containing 190 pairs of cancer samples and corresponding normal mucosa. TMA confirmed the correlations between NM23 expression level and two clinical parameters: non-cancerous or cancer status and survival of patients. Expression was higher in cancer samples, and low expression was significantly associated with a shorter MFS. Such correlation has been described in a variety of malignant tumors, including breast, ovarian, lung or gastric cancers as well as melanoma. However, this correlation remains controversial in CRC, with positive and negative reports. The present invention allowed measurement of the expression levels simultaneously and under highly standardized conditions for all the 190 CRC samples, representing one of the largest series of CRC samples tested for NM23 IHC. As previously described, correlation between protein and mRNA levels would not be expected in all cases. This was the case for decorin and prohibitin.

vi) Conclusion.

The data presented in this nonlimiting Examples section show that mRNA expression profiling of CRC using DNA microarrays provides for identification of clinically relevant tumor subgroups, defined upon combined expression of genes. The genes delineated in this invention can contribute to the understanding of CRC development and progression, and may lead to improved and new diagnostic and/or prognostic markers, identify new molecular targets for novel anticancer drugs, and may also lead to significant improvements in CRC management.

V—Materials and Methods used in the above Examples

1) Colorectal cancer patients and samples

A total of 50 samples including 45 tissue samples and 5 cell lines were profiled using DNA microarrays. The 45 colon tissue samples were obtained from 26 unselected patients with sporadic colorectal adenocarcinoma who underwent surgery at the Institut Paoli-Calmettes (Marseille, France) between 1990 and 1998. Samples were macrodissected by pathologists, and frozen in liquid nitrogen within 30 min of removal for molecular analyses. All tumor samples contained more than 50% tumor cells. The 45 samples included 22 cancer samples and 23 normal samples divided into 19 tumor-normal pairs (based on availability of a sample of the corresponding normal colonic mucosa), 3 tumors and 4 normal specimens provided from different patients. All tumor sections and medical records were reviewed de novo prior to analysis. MSI phenotype of the 22 cancer samples was determined by PCR amplification using BAT-25 and BAT-26 oligonucleotide primers, and by IHC using anti-MSH2 and anti-MLH1 antibodies. BAT-25 and BAT-26 are mononucleotide repeat microsatellites: BAT-26 is a polyA²⁶ sequence located in the fifth intron of MSH2, and BAT-25 is located in an intron of the KIT gene. Tumors with alterations in both BAT markers were classified as MSI+. No attempt was made to further classify tumors into MSI-high and MSI-low phenotype. Main characteristics of patients and tumors are listed in Table 9. After colonic surgery, subjects were treated (delivery of chemotherapy or not) according to standard guidelines. After completion of therapy, subjects were evaluated at 3-month intervals for the first 2 years and at 6-month intervals thereafter. Search for metastatic relapse included clinical examination and blood tests completed by yearly chest X-ray and liver ultrasound and/or CT scan.

Five samples were represented by 2 different sporadic colon cancer cell lines with chromosomal instability phenotype, Caco2 and HT29. Three samples represented Caco2 in a differentiated state (named Caco2A, 2B and 2C)—i.e. at confluence (C), at C+10 days, at C+21 days—and one sample represented undifferentiated Caco2 (named Caco2D). Cell lines were obtained from the American Type Culture Collection and grown as recommended.

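TABLE 9. Characteristics of cancer samples profiled using DNA microarrays

| Patient | Sex | Age | Location | Grade | pT UICC | pN UICC | AJCC Stage | MSI status | Treatment | Outcome (months) |
|---|---|---|---|---|---|---|---|---|---|---|
| 7650 | M | 74 | descending colon | G | pT3 | pN1 | 4 (liver) | MSI | pS + pCT | AWC 4 |
| 8582 | F | 80 | ascending colon | P | pT3 | pN3 | 4 (liver) | MSI | pS | D 1 |
| 7442 | M | 64 | transverse colon | G | pT3 | pN1 | 4 (liver) | MSS | pS + pCT | D 32 |
| 8208 | M | 40 | transverse colon | M | pT3 | pN2 | 4 (liver) | MSS | cS + adj CT | D 41 |
| 7835 | F | 72 | transverse colon | G | pT3 | pN3 | 4 (liver) | MSS | pS + pCT | D 17 |
| 8656 | F | 57 | descending colon | G | pT3 | pN2 | 4 (liver) | MSS | cS + adj CT | AWC 66 |
| 8031 | F | 46 | descending colon | G | pT3 | pN2 | 3 | MSS | cS + adj CT | MR 4 - D 7 |
| 6927 | M | 71 | descending colon | G | pT3 | NA | NA | MSS | cS + adj CT | NED 10 |
| 9118 | F | 75 | ascending colon | G | pT3 | pN1 | 2 | MSI | cS + adj CT | NED 56 |
| 8904 | M | 80 | descending colon | G | pT3 | pN1 | 2 | MSI | cS | NED 18 |
| 6974 | M | 68 | ascending colon | P | pT3 | pN1 | 2 | MSI | cS + adj CT | NED 97 |
| 8646 | M | 74 | descending colon | G | pT3 | pN1 | 2 | MSS | cS | NED 63 |
| 8458 | M | 56 | descending colon | G | pT3 | pN1 | 2 | MSS | cS + adj CT | NED 69 |
| 6992 | F | 65 | ascending colon | G | pT3 | pN1 | 2 | MSS | cS + adj CT | NED 98 |
| 7094 | F | 87 | descending colon | G | pT3 | pN1 | 2 | MSS | cS | NED 64 |
| 8252 | F | 54 | rectum | G | pT4 | pN1 | 2 | MSS | cS + adj CT | NED 74 |
| 9075 | F | 45 | ascending colon | G | pT2 | pN1 | 1 | MSI | cS | MR 23 - D 38 |
| 7505 | M | 71 | ascending colon | G | pT1 | pN1 | 1 | MSI | cS | NED 88 |
| 7043 | M | 70 | descending colon | G | pT2 | pN1 | 1 | MSS | cS | NED 97 |
| 6952 | M | 58 | descending colon | G | pT2 | pN1 | 1 | MSS | cS | NED 65 |
| 7597 | F | 72 | rectum | G | pT2 | pN1 | 1 | MSS | cS | NED 87 |
| 7815 | M | 63 | rectum | G | pT2 | pN1 | 1 | MSI | cS | MR 10 - D 40 |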

For the IHC study on Tissue Micro Array (TMA), a consecutive series of 191 sporadic CRC patients (including the 26 cases studied by DNA microarrays) treated between 1990 and 1998 at the Institut Paoli-Calmettes was selected. The study included 98 men and 92 women. The median age of patients at diagnosis was 64 years (range, 29 to 97 years). In 58% of the cases, tumors were located in the distal part of the large bowel or sigmoid, in 29% in the proximal part, and in 13% in the rectum.

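TABLE 10. Characteristics of cancer samples profiled using tissue microarrays.

| Characteristics | All patients (n = 191) |
|---|---|
| Sex (M/F) | 99/92 |
| Median age, years (range) | 64 (29-97) |
| Location of tumor | |
| ascending colon | 47 |
| transverse colon | 9 |
| descending colon | 110 |
| rectum | 21 |
| na | 4 |
| Grade | |
| good | 127 |
| poor | 50 |
| na | 14 |
| pT UICC | |
| 1 | 16 |
| 2 | 21 |
| 3 | 127 |
| 4 | 27 |
| pN UICC | |
| 1 | 88 |
| 2 | 48 |
| 3 | 54 |
| na | 1 |
| Vascular invasion | |
| no | 115 |
| yes | 68 |
| na | 8 |
| AJCC stage* | |
| 1 | 29 |
| 2 | 51 |
| 3 | 43 |
| 4 | 68 |
| Surgery | 191 |
| curative/palliative | 131/59 |
| na | 1 |
| Chemotherapy | 109 |
| adjuvant/palliative | 60/49 |
| no chemotherapy | 80 |
| na | 2 |
| Median follow-up, months (range) | 74 (2, 133) |
| Metastatic evolution | 95 |
| metastatic relapse* | 27 |
| progression** | 68 |
| Death from CRC | 90 |
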
Legend:

M, male;

F, female;

na, not available;

pT, pathological staging of primary tumor;

UICC, International Union Against Cancer;

pN, pathological staging of regional lymph nodes;

AJCC, American Joint Committee on Cancer;

*AJCC1-3 patients;

**AJCC4 patients;

CRC, colorectal cancer.

2) RNA extraction

Total RNA was extracted from frozen tumor samples by using standard guanidinium isothiocyanate and cesium chloride gradient techniques. RNA integrity was controlled by denaturing formaldehyde agarose gel electrophoresis and 28S Northern blots before labelling.

3) DNA microarray preparation

Gene expression analyses were performed with home-made Nylon microarrays containing 8,074 spotted cDNA clones, representing 7,874 IMAGE human cDNA clones and 200 control clones. According to UniGene release 155, the IMAGE clones were divided into 6,664 genes and 1,210 ESTs. All clones were PCR-amplified in 96-well microtiter plates (200 μl). Amplification products were desiccated and resuspended in 50 μl of distilled water. They were then spotted as previously described onto Hybond-N+ 2×7 cm² membranes (Amersham) adhered to glass slides, using a 64-pin print head on a MicroGridII microarrayer (Apogent Discoveries, Cambridge, England). All membranes used in this study belonged to the same batch.

4) DNA microarray hybridizations

Microarrays were hybridized with ³³P-labeled probes: first with an oligonucleotide sequence common to all spotted PCR products (called “vector hybridization”), to precisely determine the amount of target DNA accessible to hybridization in each spot, and then, after stripping, with complex probes made from 2 μg of reverse-transcribed total RNA. Probe preparations, hybridizations and washes were done as previously described and as available from the website maintained by TAGC ERM206 (INSERM) under the heading “Materials and Methods,” the entire disclosure of which is herein incorporated by reference. After the washing steps, arrays were exposed to phosphor-imaging plates that were then scanned with a FUJI BAS 5000 machine (25 μm resolution). Hybridization signals were quantified using ArrayGauge software (Fuji Ltd, Tokyo, Japan).

5) Data analysis

Signal intensities were normalized for the amount of spotted DNA and the variability of experimental conditions (FB HMG99). Complex probe intensity of each spot (C) was first corrected (C/V) for the amount of target DNA accessible to hybridization as measured using vector hybridisation (V). When V intensity of a spot was too weak on a microarray, the corresponding cDNA clone was not considered for this experiment. Then, to minimize experimental differences between different complex probe hybridizations, C/V values from each hybridization were divided by the corresponding median value of C/V.
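By way of non-limiting illustration, the two-step normalization described above (correction by the vector signal, then division by the median of each hybridization) may be sketched as follows. The function name, the dictionary layout and the `min_vector` cut-off are hypothetical; the threshold below which a V intensity is considered too weak is not stated in the present description.

```python
from statistics import median


def normalize_hybridization(complex_signal, vector_signal, min_vector=1.0):
    """Return median-scaled C/V ratios for one complex-probe hybridization.

    complex_signal, vector_signal: dicts mapping clone id -> intensity.
    min_vector is a hypothetical cut-off for a "too weak" V intensity;
    the source text does not give the value actually used.
    """
    ratios = {}
    for clone, c in complex_signal.items():
        v = vector_signal.get(clone)
        if v is None or v < min_vector:
            continue  # clone not considered for this experiment
        ratios[clone] = c / v  # correct C for accessible target DNA
    if not ratios:
        return {}
    m = median(ratios.values())
    if m == 0:
        raise ValueError("median C/V is zero; cannot scale")
    # Divide by the median C/V of this hybridization to reduce
    # experiment-to-experiment differences.
    return {clone: r / m for clone, r in ratios.items()}
```

With this sketch, a clone whose vector signal falls below the cut-off is simply absent from the returned mapping for that hybridization, which mirrors the exclusion rule stated above.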

Unsupervised hierarchical clustering analysis then allowed the investigation of relationships between samples and between genes. This analysis was applied to data log-transformed and median-centred on genes using the Cluster and TreeView programs (average linkage clustering using Pearson correlation as similarity metric). Supervised analysis was also used to identify and rank genes that distinguished between two subgroups of samples defined by a histoclinical parameter of interest. A discriminating score (DS) was calculated for each gene as DS=(M1−M2)/(S1+S2), where M1 and S1 respectively represent mean and standard deviation of expression levels of the gene in subgroup 1, and M2 and S2 in subgroup 2. Confidence levels were estimated by bootstrap resampling.
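As a minimal, non-limiting sketch, the discriminating score defined above may be computed per gene as shown below. The use of the sample standard deviation, and the particular bootstrap scheme (resampling each subgroup with replacement and counting how often the sign of DS is preserved), are assumptions made for illustration only; the present description states only that confidence levels were estimated by bootstrap resampling. All function names are hypothetical.

```python
import random
from statistics import mean, stdev


def discriminating_score(group1, group2):
    """DS = (M1 - M2) / (S1 + S2) for one gene.

    Uses the sample standard deviation (an assumption); each group
    needs at least two values.
    """
    s = stdev(group1) + stdev(group2)
    if s == 0:
        raise ValueError("both subgroups have zero variance")
    return (mean(group1) - mean(group2)) / s


def bootstrap_sign_stability(group1, group2, n_resamples=1000, seed=0):
    """Fraction of bootstrap resamples whose DS keeps the observed sign.

    Illustrative scheme only; not necessarily the procedure used in
    the source.
    """
    rng = random.Random(seed)
    observed = discriminating_score(group1, group2)
    same_sign = 0
    valid = 0
    for _ in range(n_resamples):
        r1 = [rng.choice(group1) for _ in group1]
        r2 = [rng.choice(group2) for _ in group2]
        try:
            ds = discriminating_score(r1, r2)
        except ValueError:
            continue  # degenerate resample (no variance in either group)
        valid += 1
        if ds * observed > 0:
            same_sign += 1
    return same_sign / valid if valid else float("nan")
```

Genes may then be ranked by the absolute value of DS, a positive DS indicating higher expression in subgroup 1 and a negative DS higher expression in subgroup 2.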

Statistical analyses were done using the SPSS software (version 10.0.5). Metastasis-free survival (MFS) and overall survival (OS) were measured from diagnosis until, respectively, the date of the first distant metastasis and the date of death from CRC. Survivals were estimated with the Kaplan-Meier method and compared between groups with the Log-Rank test. Data concerning patients without metastatic relapse or death at last follow-up were censored, as well as deaths from other causes. A p-value <0.05 was considered significant.
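For illustration only, the Kaplan-Meier (product-limit) estimate referred to above may be sketched in a few lines. This sketch is not the SPSS implementation used in the study; the function name and the input layout are hypothetical, and the log-rank comparison between groups is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up in months for each subject.
    events: True if the event (metastasis or death from CRC) was
            observed, False if the subject was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: subjects censored at t still count
            # as at risk at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

Censored subjects, such as those without metastatic relapse at last follow-up, lower the number at risk at later times without producing a step in the curve.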

6) Tissue microarrays (TMA) construction

The technique of TMA allowed the analysis of tumors and their respective normal mucosa simultaneously and under identical experimental conditions for the 190 subjects. TMA were prepared as described above, with slight modifications. For each sample, three representative sample areas were carefully selected from a hematoxylin-eosin stained section of a donor block. Core cylinders with a diameter of 0.6 mm each were punched from each of these areas and deposited into three separate recipient paraffin blocks, using a specific arraying device (Beecher Instruments, Silver Spring, Md.). In addition to pairs of tumor and normal mucosa, the recipient block also received control tissue (small intestine, adenomas) and cell line pellets. Five-μm sections of the resulting TMA block were made and used for IHC analysis after transfer onto glass slides. Two colon tumor cell lines (CaCo-2, HT29) and one gastric tumor cell line (HGT1) were used as controls.

7) Immunohistochemical analysis

Anti-NM23 rabbit polyclonal antibody was purchased from Dako (Dako, Trappes, France) and used at 1:100 dilution. IHC was carried out on five-μm sections of tissue fixed in alcohol formalin for 24 h and embedded in paraffin. Sections were deparaffinized in histolemon (Carlo Erba Reagenti, Rodano, Italy), and were rehydrated in graded alcohol. Antigen enhancement was done by incubating the sections in target retrieval solution (Dako) as recommended by the manufacturer. The reactions were carried out using an automatic stainer (Dako Autostainer). Staining was done at room temperature as follows: after washes in phosphate buffer, followed by quenching of endogenous peroxidase activity by treatment with 3% H₂O₂, slides were first incubated with blocking serum (Dako) for 30 min and then with the affinity-purified antibody for one hour. After washes, slides were incubated with biotinylated antibody against rabbit IgG for 20 min, followed by streptavidin-conjugated peroxidase (Dako LSAB®2 kit). Diaminobenzidine or 3-amino-9-ethylcarbazole was used as the chromogen. Slides were counter-stained with hematoxylin, and coverslipped using Aquatex (Merck, Darmstadt, Germany) mounting solution. The slides were evaluated under a light microscope by two pathologists. The results were expressed in terms of percentage (P) and intensity (I) of positive cells as previously described: results were scored by the quick score (Q) (Q=P×I). For the TMA, the mean score of at least two core biopsies was calculated for each case. Correlations between status of sample (non-cancerous or cancer, and cancer with or without metastasis) or Kaplan-Meier MFS curves and IHC data were investigated by using Fisher exact test and Log-Rank test. Statistical tests were two-sided at the 5% level of significance.
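As a non-limiting sketch, the quick score and its averaging over the cores of a case may be expressed as follows. The function names are hypothetical, and the intensity scale is not stated in the present description, so the sketch only requires a non-negative intensity and a percentage between 0 and 100.

```python
from statistics import mean


def quick_score(percent_positive, intensity):
    """Q = P x I for a single core biopsy.

    percent_positive: percentage of positive cells (0-100).
    intensity: staining intensity; the scale is an assumption, as
               the source text does not specify it.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return percent_positive * intensity


def case_score(cores, min_cores=2):
    """Mean quick score over the evaluable cores of one case.

    cores: list of (percent_positive, intensity) tuples.
    Returns None when fewer than min_cores cores are evaluable,
    following the two-core minimum stated above.
    """
    if len(cores) < min_cores:
        return None
    return mean(quick_score(p, i) for p, i in cores)
```

A case with two evaluable cores scored (80, 2) and (60, 3) would, under these assumptions, receive the mean of 160 and 180.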

References

Agrawal D, Chen T, Irby R, Quackenbush J, Chambers A F, Szabo M, Cantor A, Coppola D and Yeatman T J. (2002). J Natl Cancer Inst, 94, 513-521.

Alizadeh A A, Eisen M B, Davis R E, Ma C, Lossos I S, Rosenwald A, Boldrick J C, Sabet H, Tran T, Yu X, Powell J I, Yang L, Marti G E, Moore T, Hudson J, Jr., Lu L, Lewis D B, Tibshirani R, Sherlock G, Chan W C, Greiner T C, Weisenburger D D, Armitage J O, Warnke R, Botstein D, Brown P O and Staudt L M. (2000). Nature, 403, 503-511.

Alon U, Barkai N, Notterman D A, Gish K, Ybarra S, Mack D and Levine A J. (1999). Proc Natl Acad Sci U S A, 96, 6745-6750.

Backert S, Gelos M, Kobalz U, Hanski M L, Bohm C, Mann B, Lovin N, Gratchev A, Mansmann U, Moyer M P, Riecken E O and Hanski C. (1999). Int J Cancer, 82, 868-874.

Beer D G, Kardia S L, Huang C C, Giordano T J, Levin A M, Misek D E, Lin L, Chen G, Gharib T G, Thomas D G, Lizyness M L, Kuick R, Hayasaka S, Taylor J M, Iannettoni M D, Orringer M B and Hanash S. (2002). Nat Med, 8, 816-824.

Bertucci F, Houlgatte R, Nguyen C, Viens P, Jordan B R and Birnbaum D. (2001). Lancet Oncol, 2, 674-682.

Bertucci F, Nasser V, Granjeaud S, Eisinger F, Adelaide J, Tagett R, Loriod B, Giaconia A, Benziane A, Devilard E, Jacquemier J, Viens P, Nguyen C, Birnbaum D and Houlgatte R. (2002). Hum Mol Genet, 11, 863-872.

Birkenkamp-Demtroder K, Christensen L L, Olesen S H, Frederiksen C M, Laiho P, Aaltonen L A, Laurberg S, Sorensen F B, Hagemann R and Orntoft T F. (2002). Cancer Res, 62, 4352-4363.

Devilard E, Bertucci F, Trempat P, Bouabdallah R, Loriod B, Giaconia A, Brousset P, Granjeaud S, Nguyen C, Birnbaum D, Birg F, Houlgatte R and Xerri L. (2002). Oncogene, 21, 3095-3102.

Fearon E R and Vogelstein B. (1990). Cell, 61, 759-767.

Frederiksen C M, Knudsen S, Laurberg S and Orntoft T F. (2003). J Cancer Res Clin Oncol, 15, 15.

Garber M E, Troyanskaya O G, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, van de Rijn M, Rosen G D, Perou C M, Whyte R I, Altman R B, Brown P O, Botstein D and Petersen I. (2001). Proc Natl Acad Sci U S A, 98, 13784-13789.

Kitahara O, Furukawa Y, Tanaka T, Kihara C, Ono K, Yanagawa R, Nita M E, Takagi T, Nakamura Y and Tsunoda T. (2001). Cancer Res, 61, 3544-3549.

Lin Y M, Furukawa Y, Tsunoda T, Yue C T, Yang K C and Nakamura Y. (2002). Oncogene, 21, 4120-4128.

Mohr S, Leikauf G D, Keith G and Rihn B H. (2002). J Clin Oncol, 20, 3165-3175.

Notterman D A, Alon U, Sierk A J and Levine A J. (2001). Cancer Res, 61, 3124-3130.

Singh D, Febbo P G, Ross K, Jackson D G, Manola J, Ladd C, Tamayo P, Renshaw A A, D'Amico A V, Richie J P, Lander E S, Loda M, Kantoff P W, Golub T R and Sellers W R. (2002). Cancer Cell, 1, 203-209.

Tureci O, Ding J, Hilton H, Bian H, Ohkawa H, Braxenthaler M, Seitz G, Raddrizzani L, Friess H, Buchler M, Sahin U and Hammer J. (2003). Faseb J, 17, 376-385.

Vogelstein B, Fearon E R, Hamilton S R, Kern S E, Preisinger A C, Leppert M, Nakamura Y, White R, Smits A M and Bos J L. (1988). N Engl J Med, 319, 525-532.

Williams N S, Gaynor R B, Scoggin S, Verma U, Gokaslan T, Simmang C, Fleming J, Tavana D, Frenkel E and Becerra C. (2003). Clin Cancer Res, 9, 931-946.

Zou T T, Selaru F M, Xu Y, Shustova V, Yin J, Mori Y, Shibata D, Sato F, Wang S, Olaru A, Deacu E, Liu T C, Abraham J M and Meltzer S J. (2002). Oncogene, 21, 4855-4862.
