Methods for predicting cancer outcome and gene signatures for use therein

Information

  • Patent Application
  • 20060195266
  • Publication Number
    20060195266
  • Date Filed
    February 25, 2005
  • Date Published
    August 31, 2006
Abstract
The present invention pertains to specific gene signatures for cancer that are used to predict survival and novel processes for identifying such gene signatures. In one embodiment, gene signatures for human colorectal cancer are identified and outcomes are linked to the specific gene signatures using significance analysis of microarrays (SAM) and support vector machines (SVM) to provide a prognosis/survival classifier.
Description
BACKGROUND OF THE INVENTION

In the last decade, scientists have labored to complete a high-quality, comprehensive sequence of the human genome. With its recent completion, a large number of genomic data sets have been made available in public databases. The available data, however, does not provide explanations regarding which aspects of human biology affect which genes. Researchers are just beginning to explore genomic function.


Several technological advances have made it possible to accurately measure cellular constituents and therefore derive profiles. For example, new techniques provide the ability to monitor the expression level of a large number of transcripts at any one time (see, for example, Schena et al., “Quantitative monitoring of gene expression patterns with a complementary DNA micro-array,” Science, 270:467-470 (1995); Lockhart et al., “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nature Biotechnology, 14:1675-1680 (1996); and Blanchard et al., “Sequence to array: Probing the genome's secrets,” Nature Biotechnology, 14:1649 (1996)). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms, such as humans, for which there is an increasing knowledge regarding the genome, it is possible to simultaneously monitor large numbers of the genes within the cell.


One aspect of human biology/genomic function that is of great interest to the medical research community is cancer. Currently, genetic samples have been taken from patients having various stages of various types of cancer. Such samples have provided an extensive genetic data collection. To provide a system of organization, such genetic data are collected in DNA microarrays, which are also referred to as biochips, DNA chips, gene arrays, gene chips, and genome chips.


DNA microarrays exploit a phenomenon known as base-pairing or hybridization. To form the array, genetic samples are arranged in an orderly manner (typically in a rectangular grid) on a substrate. Examples of commonly used substrates include microplates and blotting membranes. Many modern microarrays include an array of oligonucleotide or peptide nucleic acid (PNA) probes, and the array is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array on the chip is exposed to labeled sample DNA and hybridized, and the identity/abundance of complementary sequences is determined.


There are two major uses of DNA microarray technology. The first involves identification of the gene sequence. The second involves determination of expression level of genes, generally referred to as the abundance of the genes. In particular, expression or abundance of a gene is a measure of a relative level of activity of the gene in replication or translation in the presence of the probe. By analyzing the abundance of various genes in people of various conditions, a relationship between the genetic state of a person, in terms of relative levels of activity of various genes of that person, and that person's condition is assessed. To conduct such analysis, such arrays of expression levels include metadata describing characteristics of the people whose genetic material is sampled and additional metadata which identifies specific genes whose expression levels are represented in such arrays.


Microarrays are already being used for a number of beneficial purposes including, for example, identifying biomarkers of cancer (Welsh, J B et al., “Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum,” PNAS, 100(6):3410-3415 (March 2003)), creating gene expression-based classifications of cancers (Alizadeh, A A et al., “Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling,” Nature, 403:503-11 (2000); and Garber, M E et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001)), and aiding drug discovery (Marton, M J et al., “Drug target validation and identification of secondary drug target effects using Microarrays,” Nat Med, 4(11):1293-301 (1998); and Gray, N S et al., “Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors,” Science, 281:533-538 (1998)). One tool that has been applied to microarrays to decipher and compare genome expression patterns in biological systems is Significance Analysis of Microarrays, or SAM (Tusher, V. et al., “Significance analysis of microarrays applied to ionizing radiation response,” Proceedings of the National Academy of Sciences, 2001. First published Apr. 17, 2001, 10.1073/pnas.091062498). This statistical method was developed as a cluster tool for use in identifying genes with statistically significant changes in expression. SAM has been used for a variety of purposes, including identifying potential drugs that would be effective in treating various conditions associated with specific gene expressions (Bunney W E, et al., “Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders,” Am J Psychiatry, 160(4):657-66 (April 2003)).
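As an illustration of how SAM scores a single gene, the following minimal sketch computes the two-class "relative difference" statistic described by Tusher et al. It is a simplification offered under stated assumptions: the function name, the sample values, and the choice of the small constant s0 are hypothetical, and the permutation step that SAM uses to estimate the false discovery rate is omitted. The censored-survival variant of SAM uses a different score and is not shown.

```python
import math

def sam_relative_difference(group_a, group_b, s0=0.0):
    """Two-class SAM-style score: difference in means over (pooled scatter + s0)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled sum of squared deviations around each group's own mean.
    ss = (sum((x - mean_a) ** 2 for x in group_a)
          + sum((x - mean_b) ** 2 for x in group_b))
    scale = (1.0 / n_a + 1.0 / n_b) / (n_a + n_b - 2)
    s = math.sqrt(scale * ss)
    # s0 > 0 damps scores of genes with very low variance; with s0 == 0
    # and zero scatter in both groups this division would fail.
    return (mean_a - mean_b) / (s + s0)

# Hypothetical expression values for one gene in two patient groups.
poor_prognosis = [4.0, 5.0, 6.0]
good_prognosis = [1.0, 2.0, 3.0]
d_plain = sam_relative_difference(poor_prognosis, good_prognosis)
d_damped = sam_relative_difference(poor_prognosis, good_prognosis, s0=1.0)
print(d_plain, d_damped)
```

Genes would then be ranked by the magnitude of this score; a larger s0 lowers every score and penalizes low-variance genes most.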


The known Support Vector Machine (SVM) (as described in Brown, M. P. S. et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences, 97(1):262-67 (2000)) is a correlation tool shown to perform well in multiple areas of biological analysis, including evaluating microarray expression data (Brown et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proc Natl Acad Sci USA, 97:262-267 (2000)), detecting remote protein homologies (Jaakkola, T. et al., “Using the Fisher kernel method to detect remote protein homologies,” Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, Calif. (1999)), and recognizing translation initiation sites (Zien, A. et al., “Engineering support vector machine kernels that recognize translation initiation sites,” Bioinformatics, 16(9):799-807 (2000)). When used for classification, SVMs separate a given set of binary labeled training data with a hyper-plane that is maximally distant from the set of data (the “maximal margin hyper-plane”). Where no linear separation is possible, SVMs utilize the technique of “kernels” to automatically realize a non-linear mapping to a feature space (Furey, T. S. et al., “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, 16(10):906-914 (2000)).
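The separating hyper-plane idea can be illustrated with a small linear example. The sketch below is not the implementation used in the cited works; it assumes a linear (kernel-free) SVM trained by batch sub-gradient descent on the regularized hinge loss, which approximates a wide-margin separator. The function names, the hyperparameters (lam, lr, epochs), and the two-gene expression values are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch sub-gradient descent on lam/2*|w|^2 + mean hinge loss."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1.0:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyper-plane."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical centered expression values for two genes per sample.
good = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0)]
poor = [(-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
points = good + poor
labels = [1] * len(good) + [-1] * len(poor)
w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
```

A kernel-based SVM would replace the dot product with a kernel function and is usually trained with a dedicated solver rather than this simple loop.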


Ranked as the third most commonly diagnosed cancer and the second leading cause of cancer deaths in the United States (American Cancer Society, “Cancer facts and figures,” Washington, D.C.: American Cancer Society (2000)), colon cancer is a deadly disease afflicting nearly 130,000 new patients yearly in the United States. Colon cancer is the only cancer that occurs with approximately equal frequency in men and women. There are several potential risk factors for the development of colon and/or rectal cancer. Known factors for the disease include older age, excessive alcohol consumption, sedentary lifestyle (Reddy, B. S., “Dietary fat and its relationship to large bowel cancer,” Cancer Res., 41:3700-3705 (1981)), and genetic predisposition (Potter, J D “Colorectal cancer: molecules and populations,” J Natl Cancer Institute, 91:916-932 (1999)).


Several molecular pathways have been linked to the development of colon cancer (see, for example, Leeman M F, et al., “New insights into the roles of matrix metalloproteinases in colorectal cancer development and progression,” J Pathol., 201(4):528-34 (2003); Kanazawa, T et al., “Does early polypoid colorectal cancer with depression have a pathway other than adenoma-carcinoma sequence?,” Tumori., 89(4):408-11 (2003); and Notarnicola, M. et al., “Genetic and biochemical changes in colorectal carcinoma in relation to morphologic characteristics,” Oncol Rep., 10(6):1987-91 (2003)), and the expression of key genes in any of these pathways may be affected by inherited or acquired mutation or by hypermethylation. A great deal of research has been performed with regard to identifying genes for which changes in expression may provide an early indicator of colon cancer or a predisposition for the development of colon cancer. Unfortunately, no research has yet been conducted on identifying specific genes associated with colorectal cancer and specific outcomes to provide an accurate prediction of prognosis.


Survival of patients with colon and/or rectal cancer depends to a large extent on the stage of the disease at diagnosis. Devised nearly seventy years ago, the modified Dukes' staging system for colon cancer discriminates four stages (A, B, C, and D), primarily based on clinicopathologic features such as the presence or absence of lymph node or distant metastases. Specifically, colonic tumors are classified by four Dukes' stages: A, tumor within the intestinal mucosa; B, tumor into muscularis mucosa; C, metastasis to lymph nodes; and D, metastasis to other tissues. Of the systems available, the Dukes' staging system, based on the pathological spread of disease through the bowel wall, to lymph nodes, and to distant organ sites such as the liver, has remained the most popular. Despite providing only a relative estimate for cure for any individual patient, the Dukes' staging system remains the standard for predicting colon cancer prognosis, and is the primary means for directing adjuvant therapy.


The Dukes' staging system, however, has only been found useful in predicting the behaviour of a population of patients, rather than an individual. For this reason, any patient with a Dukes A, B, or C lesion would be predicted to be alive at 36 months while a patient staged as Dukes D would be predicted to be dead. Unfortunately, application of this staging system results in the potential over-treatment or under-treatment of a significant number of patients. Further, Dukes' staging can only be applied after complete surgical resection rather than after a pre-surgical biopsy.


Microarray technology, as described above, has permitted development of multi-organ cancer classifiers (Giordano, T. J. et al., “Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles,” Am J Pathol, 159:1231-8 (2001); Ramaswamy, S. et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” Proc Natl Acad Sci USA, 98:15149-54 (2001); and Su, A. I. et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Res, 61:7388-93 (2001)), identification of tumor subclasses (Dyrskjot, L. et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003); Bhattacharjee, A. et al., “Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses,” Proc Natl Acad Sci USA, 98:13790-5 (2001); Garber, M. E. et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001); and Sorlie, T. et al., “Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications,” Proc Natl Acad Sci USA, 98:10869-74 (2001)), discovery of progression markers (Sanchez-Carbayo, M. et al., “Gene Discovery in Bladder Cancer Progression using cDNA Microarrays,” Am J Pathol, 163:505-16 (2003); and Frederiksen, C M, et al., “Classification of Dukes' B and C colorectal cancers using expression arrays,” J Cancer Res Clin Oncol, 129:263-71 (2003)); and prediction of disease outcome (Henshall, S M et al., “Survival analysis of genome-wide gene expression profiles of prostate cancers identifies new prognostic targets of disease relapse,” Cancer Res, 63:4196-203 (2003); Shipp, M A et al., “Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning,” Nat Med, 8:68-74 (2002); Beer, D G et al., “Gene-expression profiles predict survival of patients with lung adenocarcinoma,” Nat Med, 8:816-24 (2002); Pomeroy, S L et al., “Prediction of central nervous system embryonal tumor outcome based on gene expression,” Nature, 415:436-42 (2002); van 't Veer, L J et al., “Gene expression profiling predicts clinical outcome of breast cancer,” Nature, 415:530-6 (2002); Vasselli, J R et al., “Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor,” Proc Natl Acad Sci USA, 100:6958-63 (2003); and Takahashi, M. et al., “Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification,” Proc Natl Acad Sci USA, 98:9754-9 (2001)) in many types of cancer.


Classification of patient prognosis by microarray analysis has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis. Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures, at the time of diagnosis, which can direct the biological behaviour of the tumor over time. To date, however, little success has been achieved in developing a classifier that will predict colon cancer outcome equivalent to or better than that which is possible using the standard clinicopathologic staging systems (i.e., Dukes' stage system). What is needed is a particularly effective mechanism for analyzing genomic array data to provide a classifier that accurately predicts cancer outcomes, in particular, colon cancer outcomes.


BRIEF SUMMARY OF THE INVENTION

The present invention provides systems and methods for predicting outcomes in patients diagnosed with cancer. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier that provides a means for accurately predicting colon cancer outcome.


In accordance with an aspect of the invention, genes are classified according to degree of correlation with a clinical outcome for a cancer of interest (such as colon cancer). These genes are used to establish a set of reference gene expression levels (also referred to herein as a “classifier”). Biological information regarding the patient is received and used to extrapolate intracellular gene expression. The intracellular gene expression levels are compared to those in the classifier to predict clinical outcome.


In one embodiment of the invention, a method is provided in which the specific gene signatures for colon cancer are identified. To do so, tumor specimens from patients with known outcomes are collected and frozen. The outcomes are linked to a specific core set of genes that are weighted in importance by (1) selecting genes of interest by applying microarray analysis; (2) producing a classifier using support vector machines (SVM); and (3) cross-validating the genes of interest and the classifier by comparing them against an independent set of test data. In a preferred embodiment, significance analysis of microarrays (SAM) is utilized to select genes of interest.


Genome wide microarray analyses can produce large datasets that can be pattern-matched to clinicopathologic parameters such as patient outcomes and prognosis. Accordingly, the subject invention identifies gene expression signatures that would predict colon cancer outcome more accurately than the well-accepted Dukes' staging system.


In one embodiment, a group of colon cancer patients was examined to develop a survival classifier, which was subsequently validated using an entirely independent test set of data derived on a different microarray platform at a different performance site. The classifier of the subject invention was ultimately based on a core set of genes selected for their correlation to survival. A number of the genes in the core set demonstrated intrinsic biological significance for colon cancer progression.


With the ability to predict cancer outcomes/prognosis using the subject invention, appropriate treatment protocols can be selected for patients. For example, patients assessed using the subject invention and identified to have poor outcomes may be treated more aggressively or with specific agents (e.g., anti-sense agents, RNA inhibition agents, small molecule inhibitors of the cancer activity, gene therapy, etc.). Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.




DESCRIPTION OF THE FIGURES

The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.



FIG. 1A is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when correlated with prognosis/patient survival.



FIG. 1B is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when grouped by Dukes' stage B and C.



FIG. 2A graphically illustrates a Kaplan-Meier survival curve based on gene expression profiling in accordance with the present invention.



FIG. 2B graphically illustrates a Kaplan-Meier survival curve based on Dukes' staging.



FIGS. 3A-3C illustrate survival curves for molecular classifiers in accordance with the subject invention.




DETAILED DISCLOSURE OF THE INVENTION

The present invention provides systems and methods for predicting cancer prognosis and outcomes. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier for predicting cancer outcomes/prognosis. Both microarray analysis and binary classification are used to create the classifier of the invention.


The subject invention provides methods for predicting patient outcomes comprising: identifying genes that correlate with a clinical outcome for a cancer of interest (such as colon cancer); establishing a set of reference gene expression levels (also referred to herein as a “classifier”) for said identified genes; receiving biological information regarding the patient; using the biological information to extrapolate intracellular gene expression; and comparing intracellular gene expression levels to those in the classifier to predict clinical outcome.


Biological information of the invention includes, but is not limited to, clinical samples of bodily fluids or tissues; DNA profile information; and RNA profile information. Methods for preparing clinical samples for gene expression analysis are well known in the art, and can be carried out using commercially available kits.


In one embodiment, the subject invention provides methods for predicting colon cancer patient outcomes using a SAM selected set of genes derived from a genome wide analysis of gene expression. Those patients with good and bad prognoses are first clustered into groups that suggest outcome-rich information that is likely present in the gene expression dataset. Subsequently, a supervised SVM analysis identifies a core set of genes that appears in a majority (i.e., 50% or greater, including for example, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%) of the cross validation folds and accurately predicts colon cancer survival. Preferably, a core set of genes that appears in 75% of the cross validation folds is identified by an SVM to be used in predicting colon cancer survival.
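The fold-frequency criterion described above can be sketched as follows. This is a minimal illustration only: the function name, the gene labels, and the per-fold selections are hypothetical, and the sketch assumes that each cross validation fold has already produced its own list of selected genes.

```python
from collections import Counter

def core_gene_set(genes_per_fold, min_fraction=0.75):
    """Return the genes selected in at least `min_fraction` of the folds."""
    n_folds = len(genes_per_fold)
    if n_folds == 0:
        return set()
    # Count each gene at most once per fold, even if a fold lists it twice.
    counts = Counter(g for fold in genes_per_fold for g in set(fold))
    return {g for g, c in counts.items() if c / n_folds >= min_fraction}

# Hypothetical gene labels and fold results, for illustration only.
folds = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_D"],
]
core = core_gene_set(folds, min_fraction=0.75)
print(sorted(core))  # ['GENE_A', 'GENE_B']
```

Lowering `min_fraction` to 0.5 corresponds to the simple-majority variant and would also admit GENE_C in this example.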


In one embodiment, a gene core set is derived from a cDNA microarray that includes both named and unnamed genes. The resultant gene set is highly accurate in predicting cancer survival when compared with Dukes staging data from the same patients. To validate a cDNA-based classifier of the subject invention, a normalized and scaled oligonucleotide-based cancer database is evaluated against a completely independent set of test data derived from a different microarray platform.


Accordingly, the subject invention provides a system for predicting clinical outcome in a patient diagnosed with cancer, wherein the system is useful in offering support/advice in making treatment decisions. The system comprises (1) a data storage device for collecting data (i.e., gene data); and (2) a computing means for receiving and analyzing data to accurately determine genes associated with poor or good patient prognosis. A graphical user interface can be included with the systems of the invention to display clinical data as well as enable user-interaction.


In one embodiment, the system of the invention further includes an intelligence system that can use the analyzed clinical data to classify gene samples and offer support/advice for making clinical decisions (i.e., to interpret predicted clinical outcome and provide appropriate treatment). An intelligence system of the subject invention can include, but is not limited to, artificial neural networks, fuzzy logic, evolutionary computation, knowledge-based systems, and artificial intelligence.


In accordance with the subject invention, the computing means is preferably a digital signal processor, which can automatically and accurately analyze gene data and determine those genes that strongly correlate to clinical outcome.


In one embodiment, the system of the subject invention is stationary. For example, the system of the invention can be used within a healthcare setting (i.e., hospital, physician's office).


Definitions


As used herein, the term “patient” refers to humans as well as non-human animals including, and not limited to, mammals, birds, reptiles, amphibians, and fish. Preferred non-human animals include mammals (i.e., mouse, rat, rabbit, monkey, dog, cat, primate, pig). A patient may also include transgenic animals. In certain embodiments, a patient may be a laboratory animal raised by humans in a controlled environment other than its natural habitat.


The term “cancer,” as used herein, refers to a malignant tumor (i.e., colon or prostate cancer) or growth of cells (i.e., leukaemia). Cancers tend to be less differentiated than benign tumors, grow more rapidly, show infiltration, invasion, and destruction, and may metastasize. Cancers include, and are not limited to, colon and rectal cancers, fibrosarcoma, myxosarcoma, angiosarcoma, leukaemia, squamous cell carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, and hepatocellular carcinoma.


A “marker gene,” as used herein, refers to any gene or gene product (i.e., protein, peptide, mRNA) that indicates a particular clinicopathological state (i.e., carcinoma, normal dysplasia and outcomes) or indicates a particular cell type, tissue type, or origin. The expression or lack of expression of a marker gene may indicate a particular physiological and/or diseased state of a patient, organ, tissue, or cell. Preferably, the expression or lack of expression may be determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene chip analysis, etc. In certain particular embodiments, the level of expression of a marker gene is quantifiable.


The term “polynucleotide” or “oligonucleotide,” as used herein, refers to a polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (i.e., 2-aminoadenosine, 2-thio-thymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (i.e., methylated bases), intercalated bases, modified sugars (i.e., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose), or modified phosphate groups (i.e., phosphorothioates and 5′-N-phosphoramidite linkages).


As used herein, the term “tumor” refers to an abnormal growth of cells. The growth of the cells of a tumor typically exceeds the growth of normal tissue and tends to be uncoordinated. The tumor may be benign (e.g., lipoma, fibroma, myxoma, lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or malignant (e.g., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, colon cancer, lung cancer, etc.).


The term “bodily fluid,” as used herein, refers to a mixture of molecules obtained from a patient. Bodily fluids include, but are not limited to, exhaled breath, whole blood, blood plasma, urine, semen, saliva, lymph fluid, meningeal fluid, amniotic fluid, glandular fluid, sputum, feces, sweat, mucus, and cerebrospinal fluid. Bodily fluid also includes experimentally separated fractions of all of the preceding solutions or mixtures containing homogenized solid material, such as feces, tissues, and biopsy samples.


Computing Means


Correlating genes to clinical outcomes in accordance with the subject invention can be performed using software on a computing means. The computing means can also be responsible for maintenance of acquired data as well as the maintenance of the classifier system itself. The computing means can also detect and act upon user input via user interface means known to the skilled artisan (i.e., keyboard, interactive graphical monitors) for entering data to the computing system.


In one embodiment, the computing means further comprises means for storing and means for outputting processed data. The computing means includes any digital instrumentation capable of processing data input from the user. Such digital instrumentation, as understood by the skilled artisan, can process communicated data by applying algorithm and filter operations of the subject invention. Preferably, the digital instrumentation is a microprocessor, a personal desktop computer, a laptop, and/or a portable palm device. The computing means can be general purpose or application specific.


The subject invention can be practiced in a variety of situations. The computing means can directly or remotely connect to a central office or health care center. In one embodiment, the subject invention is practiced directly in an office or hospital. In another embodiment, the subject invention is practiced in a remote setting, for example, personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, wherein the patient is located some distance from the physician.


In a related embodiment, the computing means is a custom, portable design and can be carried or attached to the health care provider in a manner similar to other portable electronic devices such as a portable radio or computer.


The computing means used in accordance with the subject invention can contain at least one user-interface device including, but not limited to, a keyboard, stylus, microphone, mouse, speaker, monitor, and printer. Additional user-interface devices contemplated herein include touch screens, strip recorders, joysticks, and rollerballs.


Preferably, the computing means comprises a central processing unit (CPU) having sufficient processing power to perform algorithm operations in accordance with the subject invention. The algorithm operations, including the microarray analysis operations (such as SAM or binary classification), can be embodied in the form of computer processor usable media, such as floppy diskettes, CD-ROMS, zip drives, non-volatile memory, or any other computer-readable storage medium, wherein the computer program code is loaded into and executed by the computing means. Optionally, the operational algorithms of the subject invention can be programmed directly onto the CPU using any appropriate programming language, preferably using the C programming language.


In certain embodiments, the computing means comprises a memory capacity sufficiently large to perform algorithm operations in accordance with the subject invention. The memory capacity of the invention can support loading a computer program code via a computer-readable storage media, wherein the program contains the source code to perform the operational algorithms of the subject invention. Optionally, the memory capacity can support directly programming the CPU to perform the operational algorithms of the subject invention. A standard bus configuration can transmit data between the CPU, memory, ports and any communication devices.


In addition, as understood by the skilled artisan, the memory capacity of the computing means can be expanded with additional hardware and by saving data directly onto external media including, for example, without limitation, floppy diskettes, zip drives, non-volatile memory and CD-ROMs.


Further, the computing means can also include the necessary software and hardware to receive, route and transfer data to a remote location.


In one embodiment, the patient is hospitalized, and clinical data generated by a computing means is transmitted to a central location, for example, a monitoring station or to a specialized physician located in a different locale.


In another embodiment, the patient is in remote communication with the health care provider. For example, patients can be located at personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, and by using the classifier system of the invention, still provide clinical data to the health care provider. Advantageously, mobile stations, such as ambulances, and mobile clinics, can monitor patient health by using a portable computing means of the subject invention when transporting and/or treating a patient.


To ensure patient privacy, security measures, such as encryption software and firewalls, can be employed. Optionally, clinical data can be transmitted as unprocessed or “raw” signal(s) and/or as processed signal(s). Advantageously, transmitting raw signals allows any software upgrades to occur at the remote location where a computing means is located. In addition, both historical clinical data and real-time clinical data can be transmitted.


Communication devices such as wireless interfaces, cable modems, satellite links, microwave relays, and traditional telephonic modems can transfer clinical data from a computing means to a healthcare provider via a network. Networks available for transmission of clinical data include, but are not limited to, local area networks, intranets and the open internet. A browser interface, for example, NETSCAPE NAVIGATOR or INTERNET EXPLORER, can be incorporated into communications software to view the transmitted data.


Advantageously, a browser or network interface is incorporated into the processing device to allow the user to view the processed data in a graphical user interface device, for example, a monitor. The results of algorithm operations of the subject invention can be displayed in the form of interactive graphics.


Dukes' Staging as a Classifier


Since Dukes' staging describes the survival of a population of patients, rather than an individual, any individual patient can be classified as alive or dead by using the survivorship of the population to predict that of the individual. In other words, if the survival of a Dukes' C population is 55% at 36 months of follow-up, an individual Dukes' C patient would be classified as alive at 36 months, but with only a 55% accuracy rate. Under these assumptions, the accuracy of staging by a microarray classifier of the subject invention can be compared to that of a clinical staging system.
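
The reasoning above amounts to a majority-class rule. A minimal sketch follows, assuming a hypothetical helper name (classify_by_stage) and an arbitrary tie rule at exactly 50%; neither comes from the specification.

```python
def classify_by_stage(survival_rate):
    """Predict an individual's status from the population survival rate.

    survival_rate: fraction of the stage population alive at the chosen
    follow-up time (for example 0.55 for 55% at 36 months).
    Returns the predicted label and the accuracy expected from always
    making that prediction for every patient of the stage.
    """
    # Assumption: a rate of exactly 0.5 is resolved in favor of "alive".
    label = "alive" if survival_rate >= 0.5 else "dead"
    expected_accuracy = max(survival_rate, 1.0 - survival_rate)
    return label, expected_accuracy


label, accuracy = classify_by_stage(0.55)
print(label, accuracy)
```

With a 55% survival rate every patient of the stage is predicted alive, so the rule can never identify the 45% of patients who do not survive.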


Identification of Prognosis-Related Genes


As a first step in the survival analysis of microarray data, genes that best separate cancer patients with poor and good prognosis were identified. Censored-survival analysis using significance analysis of microarrays (SAM) or any other microarray analysis (e.g., clustering methods such as those disclosed by Eisen et al., “Cluster analysis and display of genome-wide expression patterns,” Proc. Natl. Acad. Sci. USA, 95:14863-14868 (1998); Alon et al., “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Natl. Acad. Sci. USA, 96:6745-6750 (1999); and Ben-Dor et al., “Tissue classification with gene expression profiles,” J. Comput. Biol., 7:559-583 (2000); classification trees such as those disclosed by Dubitzky et al., “A database system for comparative genomic hybridization analysis,” IEEE Eng Med Biol Mag, 20(4):75-83 (2001); genetic algorithms such as those disclosed by Li et al., “Computational analysis of leukemia microarray expression data using the GA/KNN,” in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); neural networks such as those disclosed by Hwang et al., “Applying machine learning techniques to analysis of gene expression data: cancer diagnosis,” in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); and the “Neighborhood Analysis” (a weighted correlation method) as disclosed by Golub et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, 286:531-537 (1999)) can be used to select genes correlated with prognosis in accordance with the subject invention.


Using SAM or any other microarray analysis, genes can be selected that most closely correlate with selected survival times. Permutation analysis can then be used to estimate the false discovery rate (FDR). The resultant mean-centered gene expression vectors can then be clustered and visualized using known computer software (e.g., Cluster 3.0 and Java TreeView 1.03, both of which are provided by de Hoon MJL, et al., “Open Source Clustering Software,” Bioinformatics 2003, in press).
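
The permutation step can be illustrated as follows. This is a simplified sketch and not the SAM algorithm itself: it scores each gene by its plain Pearson correlation with survival time, ignores censoring, and estimates the FDR as the median number of genes called in permuted data divided by the number called in the real data. The function names, the threshold and the toy data are assumptions made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_fdr(expr, survival, threshold, n_perm=500, seed=0):
    """expr maps gene ID -> expression values, one per sample.

    Returns the genes whose |correlation| with survival reaches the
    threshold, and a permutation estimate of the false discovery rate.
    """
    rng = random.Random(seed)
    called = [g for g, row in expr.items()
              if abs(pearson(row, survival)) >= threshold]
    false_counts = []
    shuffled = list(survival)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between genes and outcome
        false_counts.append(sum(abs(pearson(row, shuffled)) >= threshold
                                for row in expr.values()))
    if not called:
        return called, 0.0
    fdr = min(1.0, statistics.median(false_counts) / len(called))
    return called, fdr


# Toy data (hypothetical): g1 tracks survival, g2 does not.
expr = {"g1": [1, 2, 3, 4, 5, 6, 7, 8], "g2": [3, 1, 4, 1, 5, 9, 2, 6]}
survival = [2, 4, 6, 8, 10, 12, 14, 16]
print(permutation_fdr(expr, survival, threshold=0.9, n_perm=200))
```

Lowering the threshold calls more genes at the cost of a higher estimated FDR, which is the trade-off behind choosing a gene list at a stated FDR.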


Classifier Construction and Evaluation


According to the present invention, a gene classifier can be constructed to predict a set time of outcome among a set number of patients using microarray data produced on a cDNA platform. In one embodiment, the classifier of the subject invention is produced on a computing means that uses SAM two-class gene selection and support vector machine classification. In one embodiment, the SAM procedure is empirically set to select enough genes to satisfy a set FDR. Such selected genes can then be used in a linear support vector machine to classify the samples as having poor or good prognosis.


Leave-one-out cross-validation (LOOCV) operation can also be utilized to construct a classifier (i.e., neural network-based classifier) as well as to estimate the prediction accuracy of the classifier of the subject invention. In one embodiment, the classification process includes both gene selection and SVM classification creation; therefore, both steps can be performed on each training set after the test example is removed. According to the subject invention, samples can be classified as having “good” or “poor” prognosis based on survival for a certain set amount of time. In a preferred embodiment, “good” or “poor” prognosis is based on more or less than 36 months, respectively.


By using the leave-one-out cross validation approach, the subject invention provides a means for ranking the genes selected. The number of times a particular gene is chosen can be an indicator of the usefulness of that gene for general classification and may imply biological significance.
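
A minimal sketch of this procedure is given below, with gene selection repeated inside every fold and a counter recording how often each gene is chosen. To keep the example self-contained it substitutes a t-like score with a small constant in the denominator for SAM, and a nearest-centroid rule for the SVM or neural network; both substitutions, and all names and data, are assumptions made for illustration.

```python
from collections import Counter
import statistics


def t_score(a, b, s0=1e-9):
    """Welch-style t statistic; s0 guards against a zero denominator."""
    va = statistics.variance(a) if len(a) > 1 else 0.0
    vb = statistics.variance(b) if len(b) > 1 else 0.0
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (sum(a) / len(a) - sum(b) / len(b)) / (se + s0)


def select_genes(samples, labels, k):
    """Return the indices of the k genes with the largest |t| score."""
    scores = []
    for g in range(len(samples[0])):
        good = [s[g] for s, y in zip(samples, labels) if y == "good"]
        poor = [s[g] for s, y in zip(samples, labels) if y == "poor"]
        scores.append((abs(t_score(good, poor)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(train, labels, genes, x):
    """Stand-in classifier: assign x to the class with the closer centroid."""
    best = None
    for cls in ("good", "poor"):
        rows = [s for s, y in zip(train, labels) if y == cls]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if best is None or dist < best[0]:
            best = (dist, cls)
    return best[1]


def loocv(samples, labels, k=1):
    """Leave-one-out accuracy and per-gene selection frequency."""
    counts = Counter()
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Selection sees only the training fold, never the left-out sample.
        genes = select_genes(train, train_labels, k)
        counts.update(genes)
        correct += nearest_centroid(train, train_labels, genes, samples[i]) == labels[i]
    return correct / len(samples), counts


# Toy data (hypothetical): gene 0 separates the classes, genes 1 and 2 do not.
samples = [[10, 1, 5], [11, 2, 4], [12, 1, 6], [1, 2, 5], [2, 1, 6], [3, 2, 4]]
labels = ["good", "good", "good", "poor", "poor", "poor"]
accuracy, frequency = loocv(samples, labels, k=1)
print(accuracy, frequency.most_common())
```

A gene selected in every fold, such as gene 0 in the toy data, would be ranked first. Selecting genes on the full dataset before cross-validation would leak information from the left-out sample and inflate the accuracy estimate.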


In a preferred embodiment, the classifier of the subject invention is prepared by (1) SAM gene selection using a t-test and (2) classification using a neural network. The classifier is prepared after a test sample is left out (from the LOOCV) to avoid bias from the gene selection step. Since the classification problem is a binary decision, a t-test was used for gene selection.


Preferably, once a gene set is selected, a feed-forward back-propagation neural network system (see Rumelhart, D. E. and J. L. McClelland, “Parallel Distributed Processing: Explorations in the Microstructure of Cognition,” Cambridge, Mass.: MIT Press (1986); and Fahlman, S. E., “Faster-Learning Variations on Back-Propagation: An Empirical Study,” Proceedings of the 1988 Connectionist Models Summer School, Los Altos, Calif.: Morgan-Kaufmann (1988)) is used. In one embodiment, a feed-forward back-propagation neural network with a single layer of 10 units is used. Neural network systems are extremely robust to both the number of genes selected and the level of noise in these genes.


Statistical Significance


Differences between Kaplan-Meier curves can be evaluated using the log-rank test, which is well known to the skilled statistician. This can be performed both for the initial survival analysis and for the classifier results. In accordance with the present invention, the classifier can split the samples into various groups (i.e., two groups: those predicted as good or poor prognosis). Classifier accuracy can be reported to the user both as overall accuracy and as specificity/sensitivity. In one embodiment, a McNemar's Chi-Squared test is used to compare the molecular classifier with the use of a Dukes' staging classifier. In a related embodiment, several permutations of the dataset (i.e., 1,000 permutations) are used to measure the significance of the classifier results as compared to chance.
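
A McNemar comparison of two classifiers evaluated on the same patients depends only on the discordant cases. The sketch below uses the chi-squared form without continuity correction and obtains a one-sided P value by halving the two-sided value; the choice of variant, the function name and the example counts are assumptions and are not taken from the specification.

```python
from math import erfc, sqrt


def mcnemar_chi2(b, c):
    """McNemar's chi-squared test on discordant pairs.

    b: cases classified correctly only by the first classifier
    c: cases classified correctly only by the second classifier
    Returns (chi2, one-sided P value for "the first classifier is better").
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0  # no discordant cases, no evidence either way
    chi2 = (b - c) ** 2 / n
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_two_sided = erfc(sqrt(chi2 / 2.0))
    if b > c:
        p_one_sided = p_two_sided / 2.0
    else:
        p_one_sided = 1.0 - p_two_sided / 2.0
    return chi2, p_one_sided


# Hypothetical discordant counts, for illustration only.
print(mcnemar_chi2(10, 2))
```

Cases on which both classifiers are right, or both wrong, do not enter the statistic.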


EXAMPLE 1
Human Colon Cancer Survival Classifier

Training Set Tumor Samples


In one embodiment of the subject invention, a colon cancer survival classifier was developed using 78 tumor samples, including 3 adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank (Tampa, Fla.) based on evidence for good (survival >36 mo) or poor prognosis (survival <36 mo) from the Tumor Registry. Dukes' stages can include B, C, and D. In this particular embodiment, survival was measured as last contact minus collection date for living patients, or date of death minus collection date for patients who have died.
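
The survival definition and the 36-month prognosis label can be sketched as below. The conversion of days to months with an average month length, and the function names, are assumptions; the specification does not state how months were computed.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumption: average Gregorian month length


def survival_months(collection, last_contact=None, death=None):
    """Death minus collection date if the patient died,
    otherwise last contact minus collection date."""
    end = death if death is not None else last_contact
    return (end - collection).days / DAYS_PER_MONTH


def prognosis(months, cutoff=36.0):
    """Label a case as good (> cutoff months) or poor prognosis."""
    return "good" if months > cutoff else "poor"


# Hypothetical dates, for illustration only.
m = survival_months(date(2000, 1, 1), last_contact=date(2004, 1, 1))
print(m, prognosis(m))
```

A living patient with fewer than 36 months of follow-up is censored and cannot be labeled by this rule, which is consistent with selecting samples that have adequate follow-up.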


In this embodiment, the number of samples per Dukes' stage was as follows: 23 patients with stage B, 22 patients with stage C and 30 patients with stage D disease. Just as adenomas can be included to help train the classifier to recognize good prognosis patients, Dukes D patients with synchronous metastatic disease can be used to train the classifier to recognize poor prognosis patients.


In a related embodiment, all samples were selected to have at least 36 months of follow-up. The follow-up results in this embodiment showed that thirty-two of the patients survived more than 36 months, while 46 patients died within 36 months. With this particular embodiment, the median follow-up time for all 78 patients was 27.9 months. The median follow-up for the poor prognosis cases (<36 months survival) was 11.7 months and for the good prognosis cases (>36 months survival) it was 64.2 months.


Since the NIH consensus conference in 1990, chemotherapeutic application in the United States has been relatively homogeneous, with nearly all Dukes stage B avoiding chemotherapy, and nearly all Dukes stage C receiving 6 months of adjuvant 5-fluorouracil (5-FU) and leucovorin.


Test Set Tumor Samples (Denmark)


In another embodiment, eighty-eight patients with Dukes' stage B and C colorectal cancer and a minimum follow-up time of 60 months were selected for array hybridization. Ten micrograms of total RNA were used as starting material for the cDNA preparation and hybridized to Affymetrix U133A GeneChips (Santa Clara, Calif.) by standard protocols supplied by the manufacturer. The U133A gene chip is disclosed in U.S. Pat. Nos. 5,445,934; 5,700,637; 5,744,305; 5,945,334; 6,054,270; 6,140,044; 6,261,776; 6,291,183; 6,346,413; 6,399,365; 6,420,169; 6,551,817; 6,610,482; and 6,733,977; and in European Patent Nos. 619,321 and 373,203, all of which are hereby incorporated in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.


With this particular embodiment, there were 28 patients with stage B and 60 patients with stage C colorectal cancers. All Dukes' stage B patients were treated by surgical resection alone, whereas all stage C patients received 5-FU/leucovorin adjuvant chemotherapy in addition to surgery. Colorectal tumor samples were obtained fresh from surgery and were immediately snap-frozen in liquid nitrogen but were not microdissected, with the potential for inclusion of samples with <80% purity. Total RNA was isolated from 50-150 mg tumor sample using RNAzol (WAK-Chemie Medical) or using spin column technology (Sigma) according to the manufacturer's instructions. Of these patients, fifty-seven survived more than 36 months, while 31 died within 36 months.


32K cDNA Array Hybridization and Scanning


According to the subject invention, samples can be microdissected (>80% tumor cells) by frozen section guidance and RNA extraction performed using Trizol followed by secondary purification on RNeasy columns. The samples can then be profiled on cDNA arrays (e.g., TIGR's 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts (23,936 unique TIGR TCs and 6,913 ESTs), 10 exogenous controls printed 36 times, and 4 negative controls printed 36-72 times).


In one embodiment, tumor samples are co-hybridized with a common reference pool in the Cy5 channel for normalization purposes. cDNA synthesis, aminoallyl labeling and hybridizations can be performed according to previously published protocols (see Hegde, P. et al., “A concise guide to cDNA microarray analysis,” Biotechniques; 29:552-562 (2000) and Yang, I. V. et al., “Within the fold: assessing differential expression measures and reproducibility in microarray assays,” Genome Biol; 3:research0062 (2002)). For example, labeled first-strand cDNA is prepared from each sample and co-hybridized with labeled cDNA prepared from a universal reference RNA consisting of equimolar quantities of total RNA derived from three cell lines, CaCO2 (colon), KM12L4A (colon), and U118MG (brain). Detailed protocols and description of the array are available at <http://cancer.tigr.org>. Array probes are identified and local background can be subtracted in Spotfinder (Saeed, A. I. et al., “TM4: a free, open-source system for microarray data management and analysis,” Biotechniques; 34:374-8 (2003)). Individual arrays can be normalized in MIDAS (see Saeed, A. I. ibid.) using LOWESS (an algorithm known to the skilled artisan for use in normalizing data) with the smoothing parameter set to 0.33.
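
For orientation, LOWESS normalization of a two-channel array fits a smooth curve to the log ratio (M) as a function of mean log intensity (A) and subtracts the fitted value from each spot. The sketch below is a simplified, single-pass version (tricube-weighted local linear fits, no robustness iterations) and is not the MIDAS implementation; the function names and the use of M and A as inputs are assumptions.

```python
import math


def lowess_fit(a, m, span=0.33):
    """Locally weighted linear fit of m on a, evaluated at every a value."""
    n = len(a)
    k = max(3, math.ceil(span * n))  # neighbours used in each local window
    fitted = []
    for a0 in a:
        nearest = sorted((abs(ai - a0), i) for i, ai in enumerate(a))[:k]
        h = nearest[-1][0] or 1.0  # window half-width (guard against zero)
        sw = swx = swy = swxx = swxy = 0.0
        for d, i in nearest:
            w = (1.0 - (d / h) ** 3) ** 3  # tricube weight
            sw += w
            swx += w * a[i]
            swy += w * m[i]
            swxx += w * a[i] * a[i]
            swxy += w * a[i] * m[i]
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * a0)
    return fitted


def lowess_normalize(a, m, span=0.33):
    """Subtract the intensity-dependent trend from the log ratios."""
    return [mi - fi for mi, fi in zip(m, lowess_fit(a, m, span))]


# Toy data (hypothetical): a purely intensity-dependent dye bias.
a_vals = [6 + 0.3 * i for i in range(30)]
m_vals = [0.5 * ai - 4 for ai in a_vals]
print(max(abs(r) for r in lowess_normalize(a_vals, m_vals)))
```

In the toy data the bias is exactly linear in A, so the normalized log ratios are zero up to rounding error. The span of 0.33 means each local fit uses roughly one third of the spots.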


Microarray Hybridization and Scanning of Denmark Samples


The first and second strand cDNA synthesis can be performed using the SuperScript II System (Invitrogen) according to the manufacturer's instructions except using an oligodT primer containing a T7 RNA polymerase promoter site. Labeled cRNA is prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). Biotin labeled CTP and UTP (Enzo) are used in the reaction together with unlabeled NTP's. Following the IVT reaction, the unincorporated nucleotides are removed using RNeasy columns (Qiagen). Fifteen micrograms of cRNA are fragmented at 94° C. for 35 min in a fragmentation buffer containing 40 mM Tris-acetate pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, the fragmented cRNA in a 6×SSPE-T hybridization buffer (1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton) is heated to 95° C. for 5 min and subsequently to 45° C. for 5 min before loading onto the Affymetrix HG_U133A probe array cartridge. The probe array is then incubated for 16 h at 45° C. at constant rotation (60 rpm). The washing and staining procedure can be performed in an Affymetrix Fluidics Station.


The probe array can be exposed to several washes (e.g., 10 washes in 6×SSPE-T at 25° C. followed by 4 washes in 0.5×SSPE-T at 50° C.). The biotinylated cRNA can then be stained with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6×SSPE-T for 30 min at 25° C. followed by 10 washes in 6×SSPE-T at 25° C. An antibody amplification step can then follow, using normal goat IgG as blocking reagent, final concentration 0.1 mg/ml (Sigma) and biotinylated anti-streptavidin antibody (goat), final concentration 3 mg/ml (Vector Laboratories). This can be followed by a staining step with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6×SSPE-T for 30 min at 25° C. and 10 washes in 6×SSPE-T at 25° C. The probe arrays are scanned (e.g., at 560 nm using a confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner G2500A)). The readings from the quantitative scanning can then be analyzed by the Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized to a common mean expression value of 150.
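
Normalizing to a common mean amounts to multiplying every signal on an array by a single scaling factor. The sketch below uses a plain arithmetic mean; the MAS 5.0 software may compute its scaling factor differently, so this is an illustration of the idea only, with a hypothetical function name and values.

```python
def scale_to_target(signals, target=150.0):
    """Rescale one array so that its mean signal equals the target value."""
    factor = target * len(signals) / sum(signals)
    return [s * factor for s in signals]


# Hypothetical signals with a mean of 200; the scaling factor is 0.75.
scaled = scale_to_target([50.0, 100.0, 450.0])
print(scaled)
```

Applying the same target to every array makes overall signal levels comparable between arrays before genes are compared across patients.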


Survival Analysis


The first analysis of the colon cancer survival data can be performed using censored survival time (in months) and 500 permutations. Significance analysis of microarrays (SAM) can then be used to select genes most closely correlated to survival. The subset of genes that correspond to an empirically derived, estimated false discovery rate (FDR) is then chosen. This subset of genes can then be used in subsequent analyses. In one embodiment, Cluster 3.0 and Java TreeView 1.03 are used to cluster and visualize the SAM-selected genes.


A hierarchical clustering algorithm can be chosen, with complete linkage and the correlation coefficient (e.g., Pearson correlation coefficient) as the similarity metric. In another embodiment, the Dukes' staging clusters are manually created in the appropriate format. Clustering software produces heatmaps (see FIGS. 1A and 1B) and dendrograms. The highest level partition of the SAM-selected genes can then be chosen as a survival grouping. Given two clusters of survival times, Kaplan-Meier curves can be plotted (see FIGS. 2A and 2B).
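
Taking the highest level partition of a dendrogram is equivalent to stopping agglomerative clustering when two clusters remain. The sketch below does this for sample profiles using complete linkage and a distance of one minus the Pearson correlation. It is a naive illustration rather than the Cluster 3.0 implementation, and the function names and toy profiles are assumptions.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_level_partition(profiles):
    """Complete-linkage agglomeration stopped at two clusters.

    profiles: one expression vector per sample (selected genes only).
    Returns two lists of sample indices.
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy profiles (hypothetical): samples 0-1 and samples 2-3 have opposite patterns.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 1.5]]
print(top_level_partition(profiles))
```

The two index lists returned can then be treated as the two survival groups whose Kaplan-Meier curves are compared.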


Identification of Prognosis-Related Genes


According to the subject invention, SAM survival analysis can be used to identify a set of genes most correlated with censored survival time using the training set tumor samples. In one embodiment, a set of 53 genes was found, corresponding to a median expected false discovery rate (FDR) of 28%. These genes are listed in the following Table 1, wherein genes denoted with (+) have a positive correlation with survival time and genes without the (+) notation have a negative correlation with survival time (over-expression in poor prognosis cases). Included in this list of genes in Table 1 are several genes believed to be biologically significant, such as osteopontin and neuregulin.

TABLE 1
Censored survival analysis using SAM, resultant 53 genes selected with median 28% FDR
(+) | GenBank ID | UniGene ID | Description
 | N36176 | Hs.108636 | membrane protein CH1
 | AA149253 | Hs.107987 | N/A
 | AA425320 | Hs.250461 | hypothetical protein; MDG1; similar to putative microvascular endothelial differentiation gene 1; similar to X98993 (PID: g1771560)
 | AA775616 | Hs.313 | OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1)
 | N72847 | Hs.125221 | Alu subfamily SP sequence contamination warning entry. [Human] {Homo sapiens}
 | AA706226 | Hs.113264 | neuregulin 2 isoform 4
 | AA976642 | Hs.42116 | axin 2 (conductin, axil)
 | AA133215 | Hs.32989 | Receptor activity-modifying protein 1 precursor (CRLR activity-modifying protein 1)
 | AA457267 | Hs.70669 | P19 protein; HMP19 protein
 | N50073 | Hs.84926 | hypothetical protein
 | R38360 | Hs.145567 | Unknown {Homo sapiens}
 | AA450205 | Hs.8146 | translocation protein-1; Sec62; Dtrp1 protein; membrane protein SEC62, S. cerevisiae, homolog of [Homo sapiens];
 | AA148578 | Hs.110956 | KOX 13 protein (56 AA)
 | R38640 | Hs.89584 | insulinoma-associated 1; bA470C13.2 (insulinoma-associated protein 1)
 | AA487274 | Hs.48950 | heptacellular carcinoma novel gene-3 protein; DAPPER1
 | N53172 | Hs.23016 | orphan receptor; orphan G protein-coupled receptor RDC1
 | AA045308 | Hs.7089 | insulin induced protein 2; INSIG-2 membrane protein
 | AA045075 | Hs.62751 | syntaxin 7
 | N63366 | Hs.161488 | N/A
 | R22340 | null | chr2 synaptotagmin; KIAA1228 protein
 | AA437223 | Hs.46640 | Adult retina protein
 | AA481250 | Hs.154138 | chitinase precursor; chitinase 3-like 2; chondrocyte protein 39
 | AA045793 | Hs.6790 | hypothetical protein; MDG1; similar to putative microvascular endothelial differentiation gene 1; similar to X98993 (PID: g1771560); microvascular endothelial differentiation gene 1 product; microvascular endothelial differentiation gene 1; DKFZP564F1862 p
 | H87795 | Hs.233502 | N/A
 | AA121806 | Hs.84564 | Rab3c; hypothetical protein BC013033
 | AA284172 | Hs.89385 | NPAT; predicted amino acids have three regions which share similarity to annotated domains of transcriptional factor oct-1, nucleolus-cytoplasm shuttle phosphoprotein and protein kinases; NPAT; nuclear protein, ataxia-telangiectasia locus; Similar to nuc
 | R68106 | Hs.233450 | Fc-gamma-RIIb2; precursor polypeptide (AA −42 to 249); IgG Fc receptor; IgG Fc receptor; IgG Fc receptor beta-Fc-gamma-RII; IgG Fc fragment receptor precursor; Fc gamma RIIB [Homo sapiens]; Fc gamma RIIB [Ho
 | AA479270 | Hs.250802 | Diff33 protein homolog; KIAA1253 protein [Homo sapiens]; KIAA1253 protein [Homo sapiens]
 | AA432030 | Hs.179972 | Interferon-induced protein 6-16 precursor (Ifi-6-16). [Human] {Homo sapiens}
 | R10545 | Hs.148877 | dJ425C14.2 (Placental protein
 | AA453508 | Hs.168075 | transportin; karyopherin (importin) beta 2 [Homo sapiens]; karyopherin beta 2; importin beta 2; transportin; M9 region interaction protein [Homo sapiens]
 | AI149393 | Hs.9302 | phosducin-like protein; phosducin-like protein; phosducin-like protein; phosducin-like protein; hypothetical protein; phosducin-like; Unknown (protein for MGC: 14088) [Homo sapiens]
 | AA883496 | Hs.125778 | Null
 | AA167823 | Hs.112058 | CD27BP {Homo sapiens}
 | AI203139 | Hs.180370 | hypothetical protein FLJ30934 [Homo sapiens]
+ | H19822 | Hs.2450 | KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo sapiens]; leucyl-tRNA synthetase, mitochondrial [Homo sapiens]; leucine-tRNA ligase precursor; leucine translase [Homo sapiens]
+ | W73732 | Hs.83634 | Null
+ | AA777892 | Hs.121939 | Null
+ | AA885478 | Hs.125741 | unnamed protein product [Homo sapiens]; hypothetical protein FLJ12505 [Homo sapiens]; Unknown (protein for MGC: 39884) [Homo sapiens]
+ | AA932696 | Hs.8022 | TU3A protein; TU3A protein [Homo sapiens]
+ | AA481507 | Hs.159492 | unnamed protein product [Homo sapiens]
+ | H18953 | Hs.15232 | Null
+ | AA709158 | Hs.42853 | put. DNA binding protein; put. DNA binding protein; cAMP responsive element binding protein-like 1; Creb-related protein [Homo sapiens]
+ | AA488652 | Hs.4209 | HSPC235; ribosomal protein L2; Similar to ribosomal protein, mitochondrial, L2 [Homo sapiens]; mitochondrial ribosomal protein L37; ribosomal protein, mitochondrial, L2 [Homo sapiens]
+ | N39584 | Hs.17404 | Null
+ | H62801 | Hs.125059 | Unknown (protein for IMAGE: 4309224) [Homo sapiens]; hypothetical protein [Homo sapiens]
+ | H17638 | Hs.17930 | dJ1033B10.2.2 (chromosome 6 open reading frame 11 (BING4), isoform 2) [Homo sapiens]
+ | R43684 | Hs.165575 | dJ402G11.5 (novel protein similar to yeast and bacterial predicted proteins) {Homo sapiens}
+ | N21630 | Hs.143039 | hypothetical protein PRO1942
+ | T81317 | Hs.189846 | Alu subfamily J sequence contamination warning entry. [Human] {Homo sapiens}
+ | R45595 | Hs.23892 | Null
+ | T90789 | Hs.121586 | ray; small GTP binding protein RAB35 [Homo sapiens]; RAB35, member RAS oncogene family; ras-related protein rab-1c (GTP-binding protein ray) [Homo sapiens]
+ | AA283062 | Hs.73986 | Similar to CDC-like kinase 2 {Homo sapiens}

Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 1 are hereby incorporated by reference.



FIG. 1A presents a graphical representation of the 53 SAM-selected genes (as described above) as a clustered heat map. The red color represents over-expressed genes relative to green, under-expressed genes. FIG. 1A shows only the Dukes' stage B and C cases, whose outcome Dukes' staging predicts poorly. Since only genes correlated with survival are used in clustering, the distinctly illustrated clusters in the heatmap correspond to very different prognosis groups.


The 53 SAM-selected genes were also arranged by annotated Dukes' stage in FIG. 1B. Unlike FIG. 1A, where two gene groups were apparent, there was no discernible gene expression grouping when arranged by Dukes' stage.



FIG. 2A shows the Kaplan-Meier plot for two dominant clusters of genes correlated with stage B and C test set tumor samples. Clearly, these genes separated the cases into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1) (P<0.001 using a log rank test). FIG. 2B presents a Kaplan-Meier plot of the survival times of Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference.


As illustrated in FIGS. 1A, 1B, 2A, and 2B, gene expression profiles separate good and poor prognosis cases better than Dukes' staging. This suggests that a gene-expression based classifier, as provided by the present invention, is more accurate at predicting patient prognosis than the traditional Dukes' staging.


Dukes' Staging as a Prognosis Classifier


As noted above, Dukes' staging provides only a probability of survival for each member of a population of patients, based on historical statistics. Accordingly, the prognosis of an individual patient can be predicted based on historical outcome probabilities of the associated Dukes' stage. For example, if a Dukes' C survival rate was 55% at 36 months of follow-up, any individual Dukes' C patient would be classified as having a good prognosis since more than 50% of patients would be predicted to be alive.


Performance of a Colorectal Cancer Survival Classifier of the Present Invention as Compared to Dukes' Staging


In order to determine the value of the human colon cancer prognosis/survival classifier of the subject invention, a classifier of the invention was compared to the Dukes' clinical staging approach currently in widespread use. In an initial set of 78 tumors (from the training set tumor samples described above), a classifier (Classifier A) of the present invention predicted 100%, 69%, 55% and 20% for Adenomas, and Dukes' stages B, C and D cancers, respectively. The overall accuracy was 77% (63% sensitivity/97% specificity).


Using LOOCV, Classifier A was evaluated in predicting prognosis for each patient at 36 months of follow-up as compared to Dukes' staging predictions. The results of LOOCV demonstrated that Classifier A of the subject invention was 90% accurate (93% sensitivity/84% specificity) in predicting the correct prognosis for each patient at 36 months of follow-up. A log-rank test of the two predicted groups (good and poor prognosis) was significant (P<0.001), demonstrating the ability of Classifier A to distinguish the two outcomes (FIG. 2A). Permutation analysis demonstrates that the result is better than expected by chance (P<0.001; 1,000 permutations).
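
For reference, a two-group log-rank test compares the observed number of deaths in one group with the number expected if both groups shared the same hazard, summed over the distinct death times. The sketch below is a minimal implementation with a chi-squared P value on one degree of freedom; the function name and the example survival times are assumptions and are not data from the specification.

```python
from math import erfc, sqrt


def logrank(times1, events1, times2, events2):
    """Two-group log-rank test.

    times*: follow-up times; events*: 1 if death observed, 0 if censored.
    Returns (chi-squared statistic, two-sided P value).
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    observed1 = expected1 = variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n1 = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group 1
        n2 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group 2
        d1 = sum(1 for u, e, g in data if u == t and e and g == 0)
        d2 = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed1 - expected1) ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical survival times in months, all deaths observed.
chi2, p = logrank([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(chi2, p)
```

Censored patients contribute to the at-risk counts up to their last contact but never count as deaths, which is how the test accommodates incomplete follow-up.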


This result is also significantly higher than that observed using Dukes' staging as a classifier (77%) for the same group of patients (P=0.03878). The results for both Dukes' staging and molecular staging are summarized in Tables 2A-2C below. Table 2A shows the relative accuracies of Dukes' staging and the cDNA classifier (molecular staging) for all tumors, and Table 2B compares the two methods by Dukes' stage. As shown in Table 2B, Dukes' staging was particularly poor at predicting outcome for patients with poor prognosis (70% and 55% accuracy for Dukes' stages B and C, respectively). In contrast, molecular staging, as provided by the present invention, identified not only the good prognosis cases (the “default” classification using Dukes' staging) but also the poor prognosis cases with a high degree of accuracy (Table 2C). Table 2C shows the detailed confusion matrix for all samples in the dataset, with similar misclassification rates for the good and poor prognosis groups by the classifier of the subject invention.

TABLE 2A
LOOCV Accuracy of Dukes' vs. Molecular Staging for all tumors.
Classification Method | Total Accuracy | Sensitivity | Specificity
Dukes' Staging | 77% | 63% | 97%
Molecular Staging* | 90% | 93% | 84%

TABLE 2B
Comparison of Molecular Staging and Dukes' Staging Accuracy.
Dukes' Stage | Molecular Staging | Dukes' Staging
Adenoma | 100% | 100%
B | 87% | 70%
C | 91% | 55%
D | 90% | 97%

TABLE 2C
Confusion Matrix of cDNA Classifier Results.
Observed/Predicted | Poor | Good | Totals
Poor | 43 | 3 | 46
Good | 5 | 27 | 32
Total | 48 | 30 | 78

*Dukes' staging vs. cDNA Classifier, P = 0.03878, one-sided McNemar's test.
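
The summary figures for molecular staging can be recomputed from the confusion matrix counts of Table 2C. The sketch below treats poor prognosis as the positive class, which is the convention under which the counts reproduce the reported 93% sensitivity and 84% specificity; the function name is hypothetical.

```python
def summarize(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion matrix counts.

    tp: observed poor, predicted poor    fn: observed poor, predicted good
    fp: observed good, predicted poor    tn: observed good, predicted good
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts taken from Table 2C.
result = summarize(tp=43, fn=3, fp=5, tn=27)
print({name: round(100 * value) for name, value in result.items()})
```

The counts give 70 of 78 cases classified correctly, 43 of 46 poor prognosis cases identified, and 27 of 32 good prognosis cases identified.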








Classifier Construction


The leave-one-out cross-validation technique can be utilized for evaluating the performance of a classifier construction method of the subject invention. This approach tends towards high variance in accuracy estimates, but with low bias.


Within each step of the leave-one-out cross-validation (or fold), a classifier of the subject invention can be created on all available training data, then tested for accuracy by classifying the left-out example. In one embodiment, a classifier was constructed in two steps: first a gene selection procedure was performed with SAM and then a support vector machine was constructed.


In a related embodiment, the gene selection approach used was a univariate selection. SAM (significance analysis of microarrays) was the method chosen for selecting genes. Since gene selection was to be based on two classes (good vs. poor prognosis), the two-class SAM method can be used for selecting genes with the best d values. SAM calculates false discovery rates empirically through the use of permutation analysis. SAM provides an estimate of the false discovery rate (FDR) along with a list of genes considered significant relative to censored survival. This feature of SAM was used with this particular embodiment to select the number of genes that resulted in the smallest FDR possible. In one embodiment, this FDR was zero.


The set of 53 genes (significant genes, as described above) at an FDR of 28% was used in this particular embodiment. Using this subset of 53 genes, the samples were clustered as a way of visualizing the SAM results (see FIGS. 1A and 1B). Once the genes were selected using the SAM method, a linear support vector machine (SVM) was constructed. The software used for this approach can be implemented in the Weka machine learning toolkit. A linear SVM was chosen to reduce the potential for overfitting the data, given the small sample sizes and large dimensionality. One further advantage of this approach is the transparency of the constructed model, which is of particular interest when comparing the classifier of the subject invention on two different platforms (see below).
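
The transparency of a linear model comes from its decision rule being a weighted sum of gene expression values plus a constant. The sketch below shows only that decision rule. The weights and bias are invented for illustration and were not obtained by training; only the two GenBank IDs and the direction of their correlation with survival are taken from Table 1.

```python
def decision_value(weights, bias, expression):
    """Weighted sum of expression values plus bias (linear decision function)."""
    return sum(w * expression[g] for g, w in weights.items()) + bias


def classify(weights, bias, expression):
    """Assumption: positive decision values are labeled poor prognosis."""
    return "poor" if decision_value(weights, bias, expression) > 0 else "good"


def rank_genes(weights):
    """Genes ordered by the magnitude of their contribution to the model."""
    return sorted(weights, key=lambda g: abs(weights[g]), reverse=True)


# Hypothetical weights: positive for a gene over-expressed in poor prognosis
# cases, negative for a gene positively correlated with survival.
weights = {"AA775616": 0.8, "H19822": -0.5}
sample = {"AA775616": 2.0, "H19822": 0.5}
print(classify(weights, 0.0, sample), rank_genes(weights))
```

Because each gene enters through a single weight, the same genes can be inspected and compared when the model is rebuilt on a different array platform.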


In another embodiment, using LOOCV via statistical analytic tools for comparing groups (i.e., parametric tests such as t-test/ANOVA; see also Dyrskjot L et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat. Genet., 33:90-6 (2003)), a list of 43 genes (from the 53 SAM selected genes as described above) was selected for use in constructing a second human colorectal cancer survival classifier, in accordance with the present invention. The list of 43 genes is provided in the following Table 3.

TABLE 3Genes used in the cDNA classifier (selected by t-test) and ranked by selectionfrequency using LOOCV.NumberTimesGeneBankUniGeneOccurredIDIDDescriptionM*78AA045075Hs.62751syntaxin 7M*78AA425320Hs.250461hypothetical protein; MDG1; similar to putativemicrovascular endothelial differentiation gene 1; similar toX98993 (PID: g1771560);microvascular endothelial differentiation gene 1 product;microvascularendothelial differentiation gene 1;DKFZP564F1862 pM78AA437223Hs.46640adult retina proteinM*78AA479270Hs.250802Diff33 protein homolog; KIAA1253 proteinM*78AA486233Hs.2707G1 to S phase transition 1M*78AA487274Hs.48950heptacellular carcinoma novel gene-3 protein; DAPPER1M78AA488652Hs.4209HSPC235; ribosomal protein L2; Similar to ribosomalprotein, mitochondrial, L2 [Homo sapiens]; mitochondrialribosomal protein L37; ribosomal protein, mitochondrial, L2[Homo sapiens]M78AA694500Hs.116328hypothetical protein MGC33414; Similar to PR domaincontaining 1, with ZNF domainM78AA704270Hs.189002NullM*78AA706226Hs.113264neuregulin 2 isoform 4M*78AA709158Hs.42853put. DNA binding protein; put. 
DNA binding protein; cAMPresponsive element binding protein-like 1; Creb-relatedproteinM*78AA775616Hs.313OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin,bone sialoprotein I, early T-lymphocyte activation 1)M78AA777892Hs.121939NullM*78AA873159Hs.182778apolipoprotein CI; apolipoprotein C-I variant II;apolipoprotein C-I variant IM*78AA969508Hs.10225HEYL protein; hairy-related transcription factor 3;hairy/enhancer-ofsplit related with YRPW motif-likeM78AI203139Hs.180370hypothetical protein FLJ30934M*78AI299969Hs.255798unnamed protein product; HN1 like; Unknown (protein forMGC: 22947)M*78H17364Hs.80285CRE-BP1 family member; cyclic AMP response elementDNA-binding protein isoform 1 family; cAMP responseelement binding protein (AA1-505); cyclic AMP responseelement-binding protein (HB16); Similar to activatingtranscription factor 2 [Homo sapiens]; actM78H17627Hs.83869unnamed proteinM*78H19822Hs.2450KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homosapiens]; leucyl-tRNA synthetase, mitochondrial [Homosapiens]; leucine-tRNA ligase precursor; leucine translase[Homo sapiens]M*78H23551Hs.30974NADH dehydrogenase subunit 4 {Deirochelys reticularia}M78H62801Hs.125059Unknown (protein for IMAGE: 4309224) [Homo sapiens];hypothetical protein [Homo sapiens]M78H85015Hs.138614nullM78N21630Hs.143039hypothetical protein PRO1942M*78N36176Hs.108636membrane protein CH1; membrane protein CH1 [Homosapiens]; membrane protein CH1 [Homo sapiens]; membraneprotein CH1 [Homo sapiens]M*78N72847Hs.125221Alu subfamily SP sequence contamination warning entry.[Human] {Homo sapiens}M78N92519Hs.1189Unknown (protein for MGC: 10231) [Homo sapiens]M*78R27767Hs.79946thyroid hormone receptor-associated protein, 150 kDasubunit; Similar to thyroid hormone receptor-associatedprotein, 150 kDa subunit [Homo sapiens];;M*78R34578Hs.111314nullM78R38360Hs.145567unknown {Homo sapiens}M78R43597Hs.137149trehalase homolog T19F6.30 - Arabidopsis thalianaM78R43684Hs.165575dJ402G11.5 (novel protein similar to yeast 
and bacterialpredicted proteins)M*78W73732Hs.83634NullM*77AA450205Hs.8146translocation protein-1; Sec62; translocation protein 1; Dtrp1protein; membrane protein SEC62, S. cerevisiae, homolog of[Homo sapiens];M77AI081269Hs.184108Alu subfamily SX sequence contamination warning entry.M*77R59314Hs.170056nullM*72AA702174Hs.75263pRb-interacting protein RbBP-36M*70AI002566Hs.81234immunoglobin superfamily, member 3M*63AA676797Hs.1973cyclin FM*62AA453508Hs.168075transportin; karyopherin (importin) beta 2; M9 regioninteraction proteinM62W93980Hs.59511nullM*58AA045308Hs.7089insulin induced protein 2; INSIG-2 membrane proteinM58AA953396Hs.127557nullM52AA962236Hs.124005hypothetical protein MGC19780M*50AA418726Hs.4764nullM50R43713Hs.22945nullM*41AA664240Hs.8454artifact-warning sequence (translated ALU class C) - humanM*38AA477404Hs.125262hypothetical protein; unnamed protein product; GL003;AAAS protein; adracalin; aladinM*37AA826237Hs.3426Era GTPase A protein; conserved ERA-like GTPase [Homosapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;GTPase, human homolog of E. coli essential cell cycleprotein Era; era (E. coli Gprotein homolog)-like 1 [Homosapiens]M*30AA007421Hs.113992candidate tumor suppressor protein {Homo sapiens}M*30AA478952Hs.91753unnamed protein product; hypothetical protein [Homosapiens]; unnamed protein product [Homo sapiens];hypothetical protein [Homo sapiens]M62W93980Hs.59511NullM*58AA045308Hs.7089insulin induced protein 2; INSIG-2 membrane proteinM58AA953396Hs.127557null52AA962236Hs.124005hypothetical protein MGC19780*50AA418726Hs.4764null50R43713Hs.22945null*41AA664240Hs.8454artifact-warning sequence (translated ALU class C) - human*38AA477404Hs.125262hypothetical protein; unnamed protein product; GL003;AAAS protein; adracalin; aladin*37AA826237Hs.3426Era GTPase A protein; conserved ERA-like GTPase [Homosapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;GTPase, human homolog of E. coli essential cell cycleprotein Era; era (E. 
coli Gprotein homolog)-like 1 [Homosapiens]*30AA007421Hs.113992candidate tumor suppressor protein {Homo sapiens}*30AA478952Hs.91753unnamed protein product; hypothetical protein [Homosapiens]; unnamed protein product [Homo sapiens];hypothetical protein [Homo sapiens]30AA885096Hs.43948Alu subfamily SQ sequence contamination warning entry.28H29032Hs.7094null*24R10545Hs.148877dJ425C14.2 (Placental protein*22AA448641Hs.108371transcription factor; E2F transcription factor 4; p107/p130-binding protein20R38266Hs.12431Unknown (protein for MGC: 30132)19H17543Hs.92580Alu subfamily J sequence contamination warning entry.11T81317Hs.189846Alu subfamily J sequence contamination warning entry.*9AA453790Hs.255585null9R22340nullunnamed protein product; chr2 synaptotagmin KIAA1228protein7AA987675Hs.176759null7N51543Hs.47292null*7N74527Hs.5420unnamed protein product*6AA121778Hs.95685null*6AA258031Hs.125104unnamed protein product; MUS81 endonuclease*6AA702422Hs.66521josephin MJD1; super cysteine rich protein; SCRP6T64924Hs.220619null*5R42984Hs.4863null*5R59360Hs.12533null*5R63816Hs.28445unnamed protein product5T49061Hs.8934HA-70 {Clostridium botulinum}4AA016210Hs.24920null4AA682585Hs.193822null4AA705040Hs.119646Alu subfamily J sequence contamination warning entry.[Human] {Homo sapiens}4AA909959Hs.130719NESH; hypothetical protein; NESH protein [Homo sapiens];NESH protein; new molecule including SH3 [Homo sapiens]4AI240881Hs.89688complement receptor type 1-like protein {Homo sapiens}*3AA133215Hs.32989Receptor activity-modifying protein 1 precursor (CRLRactivity-modifying-protein 1)3AA699408Hs.168103prp28, U5 snRNP 100 kd protein; prp28, U5 snRNP 100 kdprotein [Homo sapiens]3AA910771Hs.130421null*3AI362799Hs.110757hypothetical protein; NNP3 [Homo sapiens]*3H51549Hs.21899UDP-galactose translocator; UDP-galactose transporter 1[Homo sapiens]3R06568Hs.187556null2AA001604Hs.204840null*2AA132065Hs.109144unknown; SMAP-5; Similar to hypothetical 
proteinAF140225*2AA490493Hs.24340null2AA633845Hs.192156null*2AI261561Hs.182577Alu subfamily SQ sequence contamination warning entry.*2H81024Hs.180655Aik2; aurora-related kinase 2; serine/threonine kinase 12;Unknown (protein for MGC: 11031) [Homo sapiens];Unknown (protein for MGC: 4243) [Homo sapiens]2N75004Hs.49265hypothetical protein {Plasmodium falciparum 3D7}2W96216Hs.110196NICE-1 protein1AA045793Hs.6790hypothetical protein; MDG1; similar to putative microvascularendothelial differentiation gene 1; similar to X98993(PID: g1771560); microvascular endothelial differentiation gene 1product; microvascular endothelial differentiation gene 1;DKFZP564F1862 p*1AA284172Hs.89385NPAT; predicted amino acids have three regions which sharesimilarity to annotated domains of transcriptional factor oct-1, nucleoluscytoplasm shuttle phosphoprotein and proteinkinases; NPAT; nuclear protein, ataxia-telangiectasia locus;Similar to nuc*1AA411324Hs.67878interleukin-13 receptor; interleukin-13 receptor; interleukin13 receptor, alpha 1 [Homo sapiens]; Similar to interleukin 13receptor, alpha 1[Homo sapiens]; bB128O4.2.1 (interleukin13 receptor, alpha 1) [Homosapiens]; interleukin 13 receptor, alpha 1*1AA448261Hs.139800high mobility group AT-hook 1 isoform b; nonhistonechromosomal high-mobility group protein HMG-I/HMG-Y[Homo sapiens]*1AA479952Hs.154145Alu subfamily SX sequence contamination warning entry.[Human] {Homo sapiens}*1AA485752Hs.9573ATP-binding cassette, sub-family F, member 1; ATP-bindingcassette 50; ATP-binding cassette, sub-family F (GCN20),member 1 [Homo sapiens];;*1AA504266Hs.8217nuclear protein SA-2; bA517O1.1 (similar to SA2 nuclearprotein); hypothetical protein [Homo sapiens]; stromalantigen 2 [Homo sapiens]*1AA630376Hs.8121null*1AA634261Hs.25035null1AA701167Hs.191919Alu subfamily SB sequence contamination warning entry.[Human] {Homo sapiens}*1AA703019Hs.114159small GTP-binding protein; RAB-8b protein; Unknown(protein for MGC: 22321) [Homo 
sapiens]*1AA706041Hs.170253unnamed protein product [Homo sapiens]; hypotheticalprotein FLJ23282 [Homo sapiens];;1AA773139Hs.66103null1AA776813Hs.191987hypothetical protein {Macaca fascicularis}*1AA862465Hs.71zinc-alpha2-glycoprotein precursor; Zn-alpha2-glycoprotein;Znalpha2-glycoprotein; alpha-2-glycoprotein 1, zinc; alpha-2-glycoprotein 1, zinc [Homo sapiens];;*1AA977711Hs.128859null1AI288845Hs.105938putative chemokine receptor; putative chemokine receptor;chemokine receptor X; C—C chemokine receptor 6. (CCR6)(Evidence is not experimental); chemokine (C—C motif)receptor-like 2 [Homo sapiens]*1H15267Hs.210863null1H18956Hs.21035unnamed protein product [Homo sapiens]1H73608Hs.94903null*1H99544Hs.153445unknown; endothelial and smooth muscle cell-derivedneuropilin-like protein [Homo sapiens]; endothelial andsmooth muscle cell-derived neuropilin-like protein;coagulation factor V/VIII-homology domains protein 1[Homo sapiens]*1N45282Hs.201591calcitonin receptor-like*1N48270Hs.45114Similar to golgi autoantigen, golgin subfamily a, member 6[Homo sapiens]1N59451Hs.48389null*1N95226Hs.22039KIAA0758 protein;1R37028Hs.20956cytochrome bd-type quinol oxidase subunit I related protein{Thermoplasma acidophilum}1R66605Hs.182485Unknown (protein for IMAGE: 4843317) {Homo sapiens}*1T51004Hs.167847null1T51316nullnull1T72535Hs.189825null*1W72103Hs.236443beta-spectrin 2 isoform 2
M denotes genes that were used to classify 75% of all tumors; genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *.


Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 3 are hereby incorporated by reference.


In yet another embodiment, a third human colorectal cancer survival classifier, in accordance with the present invention, was prepared using U133A-limited genes selected by LOOCV via statistical analytic tools (i.e., t-test). The U133A-limited genes selected using LOOCV via t-test are listed in Table 4 below. The named genes common to both the original classifier (a set of 43 genes) and the U133A-limited classifier are marked with an asterisk. Table 5 lists seven genes selected by SAM survival analysis; osteopontin and neuregulin are present in that list and are common to the gene lists for all classifiers. In Table 5, genes denoted with (+) are positively correlated with survival time, and genes without the (+) notation are negatively correlated with survival time (overexpressed in poor-prognosis cases).

TABLE 4Genes used in U133A-limited cDNA classifier (selected by t-test) and rankedby selection frequency using LOOCV.NumberTimesGeneBankUniGeneOccurredIDIDDescriptionM*78AA007421Hs.113992candidate tumor suppressor proteinM*78AA045075Hs.62751syntaxin 7M*78AA045308Hs.7089insulin induced protein 2, INSIG-2 membrane proteinM*78AA418726Hs.4764nullM*78AA425320Hs.250461hypothetical protein; MDG1; similar to putativemicrovascular endothelial differentiation gene 1; similar toX98993 (PID: g1771560); microvascular endothelialdifferentiation gene 1 product; microvascular endothelialdifferentiation gene 1; DKFZP564F1862 pM*78AA450205Hs.8146translocation protein-1; Sec62; translocation protein 1; Dtrp1protein; membrane protein SEC62, S. cerevisiae, homolog of[Homo sapiens];M*78AA453508Hs.168075transportin; karyopherin (importin) beta 2; M9 regioninteraction proteinM*78AA453790Hs.255585nullM*78AA477404Hs.125262hypothetical protein; unnamed protein product; GL003;AAAS protein; adracalin; aladin; adracalinM*78AA478952Hs.91753unnamed protein productM*78AA479270Hs.250802Diff33 protein homolog; KIAA1253 proteinM*78AA486233Hs.2707G1 to S phase transition 1 [Homo sapiens]M*78AA487274Hs.48950heptacellular carcinoma novel gene-3 protein; DAPPER1[Homo sapiens]; unnamed protein product [Homo sapiens]M*78AA664240Hs.8454artifact-warning sequence (translated ALU class C) - humanM*78AA676797Hs.1973cyclin FM*78AA702174Hs.75263pRb-interacting protein RbBP-36M*78AA706226Hs.113264neuregulin 2 isoform 4M*78AA709158Hs.42853put. DNA binding protein; put. 
DNA binding protein; cAMPresponsive element binding protein-like 1; Creb-relatedprotein [Homo sapiens]M*78AA775616Hs.313OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin,bone sialoprotein I, early T-lymphocyte activation 1);secreted phosphoprotein 1 (osteopontin, bone sialoprotein I,early T-lymphocyte activation 1) [Homo sapiens]; secretedphosphoprotein 1 (ostM*78AA826237Hs.3426Era GTPase A protein; conserved ERA-like GTPase [Homosapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;GTPase, human homolog of E. coli essential cell cycleprotein Era; era (E. coli G-protein homolog)-like 1 [Homosapiens]M*78AA873159Hs.182778apolipoprotein CI; apolipoprotein CI; apolipoprotein C-I;apolipoprotein C-I precursor; apolipoprotein C-I variant II;apolipoprotein C-I variant I; Similar to apolipoprotein C-I[Homo sapiens]M*78AA969508Hs.10225HEYL protein; hairy-related transcription factor 3;hairy/enhancer-of-split related with YRPW motif-like [Homosapiens]M*78AI002566Hs.81234immunoglobin superfamily, member 3M*78AI299969Hs.255798unnamed protein product [Homo sapiens]; HN1 like [Homosapiens]; Unknown (protein for MGC: 22947) [Homosapiens]; HN1 like [Homo sapiens]M*78H17364Hs.80285CRE-BP1 family member; cyclic AMP response elementDNA-binding protein isoform 1 family; cAMP responseelement binding protein (AA 1-505); cyclic AMP responseelement-binding protein (HB16); Similar to activatingtranscription factor 2 [Homo sapiens]; actM*78H19822Hs.2450KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homosapiens]; leucyl-tRNA synthetase, mitochondrial [Homosapiens]; leucine-tRNA ligase precursor; leucine translase[Homo sapiens]M*78H23551Hs.30974NADH dehydrogenase subunit 4 {Deirochelys reticularia}M*78N36176Hs.108636membrane protein CH1; membrane protein CH1 [Homosapiens]; membrane protein CH1 [Homo sapiens]; membraneprotein CH1 [Homo sapiens]M*78N72847Hs.125221Alu subfamily SP sequence contamination warning entry.[Human] {Homo sapiens}M*78R10545Hs.148877dJ425C14.2 (Placental 
proteinM*78R27767Hs.79946thyroid hormone receptor-associated protein, 150 kDasubunit; Similar to thyroid hormone receptor-associatedprotein, 150 kDa subunit [Homo sapiens];;M*78R34578Hs.111314nullM*78R59314Hs.170056nullM*78W73732Hs.83634nullM*74AA448641Hs.108371transcription factor; E2F transcription factor 4; p107/p130-binding protein [Homo sapiens]; E2F transcription factor 4,p107/p130-binding [Homo sapiens]; E2F transcription factor4, p107/p130-binding [Homo sapiens];M*68R59360Hs.12533nullM*63AA121778Hs.95685nullM*59H51549Hs.21899UDP-galactose translocator; UDP-galactose transporter 1[Homo sapiens]*57H81024Hs.180655Aik2; aurora-related kinase 2; serine/threonine kinase 12;serine/threonine kinase 12 [Homo sapiens]; Unknown(protein for MGC: 11031) [Homo sapiens]; Unknown (proteinfor MGC: 4243) [Homo sapiens]*56AA490493Hs.243400*56R42984Hs.4863null*53AA258031Hs.125104unnamed protein product [Homo sapiens]; MUS81endonuclease [Homo sapiens]; MUS81 endonuclease [Homosapiens]*52AA133215Hs.32989Receptor activity-modifying protein 1 precursor (CRLRactivity-modifying-protein 1)*52R63816Hs.28445unnamed protein product [Homo sapiens]*51N95226Hs.22039KIAA0758 protein*45N74527Hs.5420unnamed protein product {Homo sapiens}*36AA702422Hs.66521josephin MJD1; super cysteine rich protein; SCRP*29AI261561Hs.182577Alu subfamily SQ sequence contamination warning entry.[Human] {Homo sapiens}*28AA132065Hs.109144unknown; SMAP-5; Similar to hypothetical proteinAF140225 [Homo sapiens]; Similar to hypothetical proteinAF140225 [Homo sapiens]; unnamed protein product [Homosapiens]; unknown [Homo sapiens]; hypothetical proteinAF140225 [Homo sapiens]*28AI362799Hs.110757hypothetical protein; NNP3 [Homo sapiens]*27AA045793Hs.6790hypothetical protein; MDG1; similar to putativemicrovascular endothelial differentiation gene 1; similar toX98993 (PID: g1771560); microvascular endothelialdifferentiation gene 1 product; microvascular endothelialdifferentiation gene 1; DKFZP564F1862 
p*27AA284172Hs.89385NPAT; predicted amino acids have three regions which sharesimilarity to annotated domains of transcriptional factor oct-1, nucleolus-cytoplasm shuttle phosphoprotein and proteinkinases; NPAT; nuclear protein, ataxia-telangiectasia locus;Similar to nuc24N51632Hs.75353The KIAA0123 gene product is related to rat generalmitochondrial matrix processing protease (MPP).; Unknown(protein for IMAGE: 3632957) [Homo sapiens]; Unknown(protein for IMAGE: 3857242) [Homo sapiens]; inositolpolyphosphate-5-phosphatase, 72 kDa; KIAA023AA482110Hs.4900Unknown gene product; PRO0915; CUA001; hypotheticalprotein [Homo sapiens]; hypothetical protein [Homo sapiens]22AA485450Hs.132821flavin containing monooxygenase 2; flavin containingmonooxygenase 2 [Homo sapiens]*19AA699408Hs.168103prp28, U5 snRNP 100 kd protein; prp28, U5 snRNP 100 kdprotein [Homo sapiens]18N70777Hs.49927BA103J18.1.2 (novel protein, isoform 2) [Homo sapiens]16AA993736Hs.169838hypothetical protein; vesicle-associated membrane protein 4[Homo sapiens]; Similar to vesicle-associated membraneprotein 4 [Homo sapiens]15AI139498Hs.151899delta sarcoglycan; delta-sarcoglycan isoform 2; Sarcoglyan,delta (35 kD dystrophin-associated glycoprotein); dystrophinassociated glycoprotein, delta sarcoglycan; 35 kD dystrophin-associated glycoprotein [Homo sapiens]15N59721Hs.21858glia-derived nexin precursor; serine (or cysteine) proteinaseinhibitor, clade E (nexin, plasminogen activator inhibitor type1), member 2; protease inhibitor 7 (protease nexin I); glia-derived nexin [Homo sapiens]; similar to serine (or cysteine)protein14AA431885Hs.5591MAP kinase-interacting serine/threonine kinase 1; MAPkinaseinteracting kinase 1 [Homo sapiens]14AA911661Hs.2733Hox2H protein (AA 1-356); K8 homeo protein; HOX2.8 geneproduct; HOXB2 protein; HOX-2.8 protein (77 AA); homeobox B2; homeo box 2H; homeobox protein Hox-B2; K8home protein [Homo sapiens];13AA775865Hs.7579KIAA1192 protein; HSPC273; unnamed protein product;hypothetical protein 
FLJ10402 [Homo sapiens]; unnamedprotein product [Homo sapiens]; hypothetical proteinFLJ10402 [Homo sapiens]; hypothetical protein [Homosapiens]; unnamed protein product [Homo sapiens]13R30941Hs.24064signal transducer and activator of transcription Stat5B;transcription factorStat5b; STAT5B_CDS [Homo sapiens];signal transducer and activator of transcription 5B; signaltransducer and activator of transcription 5; transcriptionfactor STAT5B [Homo sapiens]*11AA703019Hs.114159small GTP-binding protein; RAB-8b protein; Unknown(protein for MGC: 22321) [Homo sapiens]11AA777192Hs.47062RNA Polymerase II subunit 14.5 kD; DNA directed RNApolymerase II polypeptide I; DNA directed RNA polymeraseII 14.5 kda polypeptide [Homo sapiens]; polymerase (RNA)II (DNA directed) polypeptide I (14.5 kD) [Homo sapiens]*10W72103Hs.236443beta-spectrin 2 isoform 2 [Homo sapiens]*9H15267Hs.210863null8H17638Hs.17930dJ1033B10.2.2 (chromosome 6 open reading frame 11BING4), isoform 2) [Homo sapiens]8R60193Hs.11637null7R92717Hs.170129choroideremia-like Rab escort protein 2; dJ317G22.3(choroideremia-like (Rab escort protein 2))*6AA706041Hs.170253unnamed protein product [Homo sapiens]; hypotheticalprotein FLJ23282 [Homo sapiens];;*5AA411324Hs.67878interleukin-13 receptor; interleukin-13 receptor; interleukin13 receptor, alpha 1 [Homo sapiens]; Similar to interleukin13 receptor, alpha 1 [Homo sapiens]; bB128O4.2.1(interleukin 13 receptor, alpha 1) [Homo sapiens]; interleukin13 receptor, alpha 1*5AA504266Hs.8217nuclear protein SA-2; bA517O1.1 (similar to SA2 nuclearprotein); hypothetical protein [Homo sapiens]; stromalantigen 2 [Homo sapiens]5AA932696Hs.8022TU3A protein; TU3A protein [Homo sapiens]5AA973494Hs.153003serine/threonine kinase; myristilated and palmitylated serine-threonine kinase MPSK; protein kinase expressed in day 12fetal liver; F5-2; serine/threonine kinase KRCT;erine/threonine kinase 16 [Homo sapiens];5N45100Hs.34871HRIHFB2411; KIAA0569 gene product; Smad interactingprotein 1 [Homo 
sapiens]; smad-interacting protein-1 [Homosapiens]4AA418410Hs.9880cyclophilin; U-snRNP-associated cyclophilin; peptidyl prolylisomerase H (cyclophilin H) [Homo sapiens]4AA725641Hs.154397WD-repeat protein4AA954482Hs.222677SSX1; synovial sarcoma, X breakpoint 1 [Homo sapiens];synovial sarcoma, X breakpoint 8 [Homo sapiens]; synovialsarcoma, X breakpoint 1; sarcoma, synovial, X-chromosome-related 1; SSX1 protein [Homo sapiens]4H45391Hs.31793null4T86932Hs.131924T-cell death-associated gene 8; similar to G protein-coupledreceptor [Homo sapiens]3AA279188Hs.86947disintegrin and metalloprotease domain 8 precursor*3AA485752Hs.9573ATP-binding cassette, sub-family F, member 1; ATP-bindingcassette 50; ATP-binding cassette, sub-family F (GCN20),member 1 [Homo sapiens];;3AA680132Hs.55235sphingomyelin phosphodiesterase 2, neutral membrane(neutralsphingomyelinase); Unknown (protein for MGC: 1617)[Homo sapiens]*3AA977711Hs.128859null3W93370Hs.174219NKG2E; type II integral membrane protein; killer cell lectin-like receptor subfamily C, member 3; killer cell lectin-likereceptor subfamily C, member 3 isoform NKG2-H; NKG2E[Homo sapiens]; NKG2E [Homosapiens]; NKG2E [Homo sapiens]2AA036727Hs.180236null2AA071075Hs.25523Alu subfamily SP sequence contamination warning entry.[Human] {Homo sapiens}2AA464612Hs.190161PTD017; HSPC183; PTD017 protein [Homo sapiens];mitochondrial ribosomal protein S18B; mitochondrialribosomal protein S18-2; mitochondrial 28S ribosomalprotein S18-2 [Homo sapiens]2AA481250Hs.154138chitinase precursor; chitinase 3-like 2; chondrocyte protein39; chitinase 3-like 2 [Homo sapiens]2AA598659Hs.168516NuMA protein {Homo sapiens}2AA682905Hs.8004huntingtin-associated protein interacting protein2R17811Hs.77897splicing factor SF3a60; pre-mRNA splicing factor SF3a(60 kD), similar to S. 
cerevisiae PRP9 (spliceosome-associated protein 61); splicing factor 3a, subunit 3, 60 kD[Homo sapiens]; Similar to splicing factor 3a, subunit 3,60 kD [Homo sapiens]2W93592Hs.47343hWNT5A; wingless-type MMTV integration site family,member 5A precursor; proto-oncogene Wnt-5A precursor;WNT-5A protein precursor [Homo sapiens]1AA017301Hs.60796artifact-warning sequence (translated ALU class C) - human1AA046406Hs.100134unnamed protein product [Homo sapiens]; hypotheticalprotein FLJ12787 [Homo sapiens]1AA256304Hs.172648Unknown (protein for MGC: 9448) [Homo sapiens]; distal-less homeo box 7 [Homo sapiens]; distal-less homeobox 4,isoform a; beta protein 1 [Homo sapiens]1AA416759Hs.239760Unknown (protein for MGC: 2503) [Homo sapiens]; unnamedprotein product [Homo sapiens]*1AA448261Hs.139800high mobility group AT-hook 1 isoform b; nonhistonechromosomal highmobility group protein HMG-I/HMG-Y[Homo sapiens]1AA452130Hs.28219Alu subfamily SX sequence contamination warning entry.[Human] {Homo sapiens}1AA457528Hs.22979unnamed protein product [Homo sapiens]; hypotheticalprotein FLJ13993 [Homo sapiens]; FLJ00167 protein [Homosapiens]1AA460542Hs.121849microtubule-associated proteins 1A/1B light chain 3;microtubuleassociated proteins 1A/1B light chain 3;microtubule-associated proteins 1A/1B light chain 3 [Homosapiens]; microtubule-associated proteins 1A/1B light chain 3[Homo sapiens]*1AA479952Hs.154145Alu subfamily SX sequence contamination warning entry.[Human] {Homo sapiens}1AA481507Hs.159492unnamed protein product [Homo sapiens]1AA504342Hs.7763null1AA598970Hs.7918unnamed protein product; hypothetical protein; dJ453C12.6.2(uncharacterized hypothalamus protein (isoform 2));hypothetical protein [Homo sapiens]; uncharacterizedhypothalamus protein HSMNP1 [Homo sapiens]*1AA630376Hs.8121null*1AA634261Hs.25035null1AA677254Hs.52002CT-2; CD5 antigen-like (scavenger receptor cysteine richfamily); bA120D12.1 (CD5 antigen-like (scavenger receptorcysteine rich family)) [Homo sapiens]; CD5 
antigen-like(scavenger receptor cysteine rich family) [Homo sapiens]1AA757564Hs.13214Probable G protein-coupled receptor GPR27 (Superconserved receptor expressed in brain 1). [Human]1AA775888Hs.163151null1AA844864Hs.4158regenerating protein I beta; regenerating islet-derived 1 betaprecursor; lithostathine 1 beta; regenerating protein I beta;secretory pancreatic stone protein 2 [Homo sapiens]*1AA862465Hs.71zinc-alpha2-glycoprotein precursor; Zn-alpha2-glycoprotein;Zn-alpha2-glycoprotein; alpha-2-glycoprotein 1, zinc; alpha-2-glycoprotein 1, zinc [Homo sapiens];;1AA989139Hs.16608candidate tumor suppressor protein; candidate tumorsuppressor protein [Homo sapiens]1AI253017Hs.183438U4/U6 snRNP-associated 61 kDa protein {Homo sapiens}1AI394426Hs.57732acid phosphatase {Homo sapiens}*1H99544Hs.153445unknown; endothelial and smooth muscle cell-derivedneuropilin-like protein [Homo sapiens]; endothelial andsmooth muscle cell-derived neuropilin-like protein;coagulation factor V/VIII-homology domains protein 1[Homo sapiens]1N41021Hs.114408Toll/interleukin-1 receptor-like protein 3; Toll-like receptor5; Toll-like receptor 5 [Homo sapiens]; toll-like receptor 5;Toll/interleukin-1 receptor-like protein 3 [Homo sapiens]*1N45282Hs.201591calcitonin receptor-like1N46845Hs.144287hairy/enhancer-of-split related with YRPW motif 2; basichelix-loop-helix factor 1; HES-related repressor protein 1HERP1; GRIDLOCK; basichelix-loop-helix protein; hairy-related transcription factor 2; hairy/enhancer-of-split relatedwith YRPW motif 2 [H*1N48270Hs.45114Similar to golgi autoantigen, golgin subfamily a, member 6[Homo sapiens]1N59846Hs.177812Unknown (protein for MGC: 41314) {Mus musculus}1R16760Hs.20509HBV pX associated protein-81R44546Hs.82563dJ526I14.2 (KIAA0153 (similar1R92994Hs.1695metalloelastase; metalloelastase; matrix metalloproteinase 12(macrophage elastase)*1T51004Hs.167847null1T56281Hs.8765metallothionein I-F; RNA helicase-related protein [Homosapiens];metallothionein 1F [Homo 
sapiens]1T70321Hs.247129G3a protein; Apo M; apolipoprotein M; Unknown (proteinforMGC: 22400) [Homo sapiens]; apolipoprotein M; NG20-likeprotein [Homo sapiens]1W45025Hs.170268Alu subfamily SX sequence contamination warning entry.[Human] {Homo sapiens}
M denotes genes used to classify 75% of all tumors; genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *.


Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 4 are hereby incorporated by reference.

TABLE 5. Censored survival analysis using SAM; seven genes selected with median estimated FDR of 13.5%.

| GenBank ID | UniGene ID | Description |
|---|---|---|
| N36176 | Hs.108636 | membrane protein CH1 |
| AA149253 | Hs.107987 | N/A |
| AA425320 | Hs.250461 | hypothetical protein; MDG1; similar to putative microvascular endothelial differentiation gene 1; similar to X98993 (PID: g1771560) |
| AA775616 | Hs.313 | OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1) |
| N72847 | Hs.125221 | N/A |
| AA706226 | Hs.113264 | neuregulin 2 isoform 4 |
| +AA883496 | Hs.125778 | N/A |

Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 5 are hereby incorporated by reference.


Cross Platform Validation


Systems and methods of the subject invention can be tested by applying a classifier to an immediately available, well-annotated, independent test set of colon cancer tumor samples (Denmark, as described above) run on the Affymetrix platform. Using database software such as the Resourcerer software from TIGR (see also Tsai J et al., “RESOURCERER: a database for annotating and linking microarray resources within and across species,” Genome Biol, 2:software0002.1-0002.4 (2001)), each gene on the cDNA chip can be mapped to a corresponding gene on the Affymetrix platform.


The linkage is made through common UniGene IDs.
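By way of illustration only, linkage through common UniGene IDs can be sketched as a simple join between two annotation tables. The sketch below is not the Resourcerer implementation; the function name `map_by_unigene`, the placeholder probe `EST_0001`, and the Affymetrix probe set names are hypothetical, while the two GenBank accessions and their UniGene IDs are taken from Table 5.

```python
from collections import defaultdict


def map_by_unigene(cdna_probes, affy_probesets):
    """Link cDNA probes to Affymetrix probe sets that share a UniGene ID.

    Both arguments map a probe identifier to a UniGene ID, or to None
    when the probe is an unannotated EST. Returns a dict mapping each
    linked cDNA probe to a sorted list of matching Affymetrix probe sets.
    """
    by_unigene = defaultdict(list)
    for probeset, unigene in affy_probesets.items():
        if unigene:
            by_unigene[unigene].append(probeset)

    mapping = {}
    for probe, unigene in cdna_probes.items():
        # Probes without a UniGene ID, or without a partner, are dropped.
        if unigene and unigene in by_unigene:
            mapping[probe] = sorted(by_unigene[unigene])
    return mapping


# Accessions and UniGene IDs from Table 5; "EST_0001" and the probe set
# names are hypothetical placeholders.
cdna = {"AA775616": "Hs.313", "AA706226": "Hs.113264", "EST_0001": None}
affy = {"probeset_A": "Hs.313", "probeset_B": "Hs.313", "probeset_C": "Hs.999999"}

linked = map_by_unigene(cdna, affy)
print(linked)
```

In this toy input only one cDNA probe finds a partner, which mirrors how unannotated ESTs and unmatched genes shrink the usable gene set.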


In one embodiment, 12,951 genes (out of 32,000) were mapped to an Affymetrix U133A GeneChip. In certain instances, probes on the cDNA chip are unknown expressed sequence tags (ESTs), which can reduce the number of usable genes identified. A classifier of the subject invention can address this lack of correspondence between platforms. Accordingly, in a related embodiment, a U133A-limited cDNA classifier was constructed in accordance with the subject invention by applying the identical approach to this reduced set of overlapping genes.


With the U133A-limited cDNA classifier, only those cDNA probes that (according to Resourcerer) mapped to an Affymetrix probe set are chosen. This approach enables cross-platform comparison. For example, the training set samples were used together with the test set tumor samples in a flip-dye design. The final expression value from a cDNA probe is then the log2 of the training set to test set sample ratio. This same reference RNA was used on two U133A Affymetrix chips.


Once the U133A-limited cDNA classifier was constructed, a linear scaling factor was applied equally to all Affymetrix samples (training set as well as test set samples from Denmark). The scaling factor was based on the expression of a common training set (H. Lee Moffitt Cancer Center & Research Institute, Tampa, Fla.) sample applied to both the cDNA microarrays and the U133A GeneChips. Under this assumption, the U133A chip value corresponding to a cDNA probe is the ratio of training set to test set sample (on U133A chips). Each of the Affymetrix U133A arrays (both the test set and the reference samples) was scaled to a constant average intensity (150) prior to taking the ratio, and the test sample chip values were averaged.
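As a minimal sketch of the scaling step described above, each array can be scaled linearly so that its mean intensity equals 150, after which a per-probe log2 ratio is taken. The intensities below are hypothetical, the names `scale_array` and `log2_ratios` are illustrative, and which array serves as numerator or denominator is left to the reader; averaging of replicate chips is omitted.

```python
import math

TARGET_MEAN = 150.0  # constant average intensity stated in the text


def scale_array(intensities, target=TARGET_MEAN):
    """Linearly scale one array so that its mean intensity equals target."""
    mean = sum(intensities) / len(intensities)
    factor = target / mean
    return [value * factor for value in intensities]


def log2_ratios(numerator, denominator):
    """Per-probe log2(numerator / denominator) for two equally long arrays."""
    return [math.log2(n / d) for n, d in zip(numerator, denominator)]


# Hypothetical raw intensities for four probe sets on two arrays.
numerator_raw = [100.0, 200.0, 300.0, 1000.0]
denominator_raw = [50.0, 100.0, 150.0, 300.0]

# Scale both arrays to the same average before taking the ratio.
numerator = scale_array(numerator_raw)
denominator = scale_array(denominator_raw)
ratios = log2_ratios(numerator, denominator)
print([round(r, 3) for r in ratios])
```

Scaling before taking the ratio removes the global brightness difference between the two arrays, so the remaining ratios reflect probe-level differences only.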


The results of a full LOOCV for the U133A-limited classifier on the training set samples (Moffitt Cancer Center cDNA microarray data set; original 78 samples) are shown in Tables 6A-6C. The accuracy of the U133A-limited classifier was 72% (80% sensitivity/59% specificity), in contrast to the original cDNA classifier result (90%, P=0.001154). Many ESTs were selected both in the SAM survival analysis and in the original cDNA-based classifier, indicating that unknown genes (ESTs) may be very important to colorectal cancer outcome. The accuracy of the U133A-limited classifier was not, however, significantly different from that of Dukes' staging (77%; P=0.4862 using a two-sided McNemar's test), and the classifier still significantly discriminated the two groups, as can be seen in FIG. 3B (P<0.001).
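For readers unfamiliar with McNemar's test, the following sketch shows one common form: the exact two-sided version computed from the discordant pairs, i.e., the cases that one staging method classified correctly and the other did not. The text does not state whether the exact or the chi-square form was used, and it does not report the discordant counts, so the counts passed below are hypothetical and the result is not expected to reproduce the reported P values.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases method 1 classified correctly and method 2 did not
    c: cases method 2 classified correctly and method 1 did not
    Under the null hypothesis the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(10, 14)
print(round(p_value, 4))
```

Concordant cases (both methods right, or both wrong) do not enter the statistic, which is why two classifiers with similar overall accuracy can yield a large P value.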



FIGS. 3A through 3C illustrate survival curves for molecular classifiers in accordance with the subject invention. Specifically, FIG. 3A illustrates the survival curve for a cDNA classifier of the subject invention on the 78 training set samples (LOOCV); FIG. 3B illustrates the survival curve for the U133A-limited cDNA classifier (LOOCV); and FIG. 3C illustrates the survival curve for an independent test set classification (Denmark test set sample). A large difference in sensitivity can be seen between the Dukes' method and the classifier (Tables 6A-6C). The confusion matrix and accuracy rates by Dukes' stage are also presented in Tables 6A-6C.

TABLE 6A. LOOCV Accuracy of Dukes' vs. Molecular Staging for all tumors.

| Classification Method | Total Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Dukes' Staging | 76.9% | 63% | 97% |
| Molecular Staging | 71.8% | 80% | 59% |

TABLE 6B. Comparison of Molecular Staging and Dukes' Staging Accuracy

| Dukes' Stage | Molecular Staging | Dukes' Staging |
|---|---|---|
| Adenoma | 67% | 100% |
| B | 70% | 70% |
| C | 64% | 55% |
| D | 80% | 97% |

TABLE 6C. Confusion Matrix of cDNA Classifier Results

| Observed/Predicted | Poor | Good | Totals |
|---|---|---|---|
| Poor | 38 | 8 | 46 |
| Good | 14 | 18 | 32 |
| Total | 52 | 26 | 78 |

To compare the predictive power of a classifier of the subject invention with that of Dukes' staging, the U133A-limited classifier was tested on the test set of colorectal cancer samples from Denmark that were profiled on the Affymetrix U133A platform. The normalized and scaled test-set data were evaluated with the U133A-limited cDNA classifier. Because the Denmark cases included only Dukes' stages B and C, classification of outcome by Dukes' staging would predict all samples to be of good prognosis. The accuracy of the cDNA classifier was reduced from 72% in LOOCV of the training set (Tables 6A-6C) to 68% in the Denmark cross-platform test set (Tables 7A-7C). A reduction in accuracy (4%) was expected due to the limitations imposed by cross-platform analyses; however, this reduction was very small compared with that caused by limiting the classifier gene set to U133A content. This result is not significantly different from that achieved by classification using Dukes' staging (64%, P=0.7194 using a two-sided McNemar's test) and is better than other reported results (47%) (see Sorlie T et al., “Repeated observation of breast tumor subtypes in independent gene expression data sets,” Proc Natl Acad Sci USA, 100:8418-23 (2003)) for cross-platform analyses where scaling was required. Moreover, the classifier of the subject invention was able to predict the outcome for poor prognosis patients (sensitivity) with an accuracy of 55%, whereas 0% would be predicted correctly by Dukes' staging.

TABLE 7A. Accuracy of U133A limited Molecular Staging on Cross-Platform Denmark Independent Test Set.

| Classification Method | Total Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Dukes' Staging | 64% | 0% | 100% |
| Molecular Staging | 68.5% | 55% | 75% |

TABLE 7B. Comparison of Dukes' Staging and U133A limited Molecular Staging Accuracy on Cross-Platform Denmark Independent Test Set.

| Dukes' Stage | Molecular Staging | Dukes' Staging |
|---|---|---|
| B | 64% | 79% |
| C | 70% | 58% |

TABLE 7C. Confusion Matrix of U133A limited Molecular Staging Results on Cross-Platform Denmark Independent Test Set

| Observed/Predicted | Poor | Good | Totals |
|---|---|---|---|
| Poor | 17 | 14 | 31 |
| Good | 14 | 43 | 57 |
| Total | 31 | 57 | 88 |
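The summary figures in Table 7A can be related to the counts in Table 7C with a short calculation. The sketch below assumes that the rows of Table 7C are observed outcomes, that the columns are predictions, and that poor prognosis is the positive class; the function name `staging_metrics` is illustrative. Under those assumptions the computed values come out close to, though not exactly equal to, the rounded percentages reported in Table 7A.

```python
def staging_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity with poor prognosis as positive."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts read from Table 7C (assumed: rows observed, columns predicted).
molecular = staging_metrics(tp=17, fn=14, fp=14, tn=43)

# Dukes' staging on a cohort limited to stages B and C calls every case
# "good", so no poor-prognosis case is ever predicted.
dukes = staging_metrics(tp=0, fn=31, fp=0, tn=57)

for name, metrics in (("Molecular", molecular), ("Dukes'", dukes)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

The comparison makes the trade-off explicit: a rule that calls every case good scores perfectly on specificity and has zero sensitivity, so overall accuracy alone hides the difference between the two methods.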










The present invention provides a colon cancer clinical classifier with significant accuracy in LOOCV that exceeds that of Dukes' staging. The utility of the classifier of the subject invention can be validated, for example, in an independent colon cancer population using a completely different microarray platform. The gene classifier of the subject invention can be based on a core set of genes that have biological significance for any type of cancer, including human colon cancer progression.


Application of Prognosis Classifier with Therapy


The benefit of adjuvant chemotherapy for colorectal cancer appears limited to patients with Dukes' stage C disease, where the cancer has metastasized to lymph nodes at the time of diagnosis. For this reason, the clinicopathological Dukes' staging system is critical for determining how adjuvant therapy is administered. Unfortunately, as noted above, Dukes' staging is not very accurate in predicting overall survival, and thus its application likely results in the treatment of a large number of patients to benefit an unknown few. Conversely, there are a number of patients who would benefit from therapy but do not receive it based on the Dukes' staging system. Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.


The molecular staging/classifier of the subject invention provides more accurate predictions of patient outcome than is possible with current clinical staging systems, which may, in fact, misclassify patients. In accordance with the present invention, a set of genes is derived from a genome-wide analysis of gene expression using known microarray analysis techniques (i.e., SAM). By clustering groups of patients with good and bad prognoses, it is illustrated that the prognosis/classifier of the subject invention presents outcome-rich information. In a further aspect of the present invention, a supervised learning analysis can be used to identify a core set of informative genes. In a preferred embodiment, a core set of 43 genes was identified that appeared in 75% of the cross-validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients.
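The notion of a core set defined by selection frequency across cross-validation iterations can be sketched as follows. This is an illustrative simplification and not the procedure of the invention: it ranks genes with a Welch t statistic (the text uses SAM for the original classifier and a t-test for the U133A-limited classifier), it omits the SVM classification step, and all gene names, expression values, and function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(rows, labels, genes, n_top):
    """Rank genes by |t| between the outcome groups and keep the top n_top."""
    scores = {}
    for j, gene in enumerate(genes):
        poor = [row[j] for row, y in zip(rows, labels) if y == 1]
        good = [row[j] for row, y in zip(rows, labels) if y == 0]
        scores[gene] = abs(welch_t(poor, good))
    return sorted(genes, key=lambda g: scores[g], reverse=True)[:n_top]


def core_gene_set(rows, labels, genes, n_top, min_fraction=0.75):
    """Genes selected in at least min_fraction of leave-one-out iterations."""
    counts = Counter()
    for held_out in range(len(rows)):
        # Re-run gene selection with one sample left out each time.
        train_rows = rows[:held_out] + rows[held_out + 1:]
        train_labels = labels[:held_out] + labels[held_out + 1:]
        counts.update(top_genes(train_rows, train_labels, genes, n_top))
    core = [g for g in genes if counts[g] / len(rows) >= min_fraction]
    return core, counts


# Hypothetical toy data: 8 samples x 3 genes; label 1 = poor prognosis.
genes = ["gene_a", "gene_b", "gene_c"]
rows = [
    [5.1, 1.0, 2.0], [4.9, 1.2, 2.1], [5.3, 0.9, 1.9], [5.0, 1.1, 2.2],
    [1.0, 1.1, 2.0], [1.2, 0.9, 2.1], [0.9, 1.0, 1.8], [1.1, 1.2, 2.2],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

core, counts = core_gene_set(rows, labels, genes, n_top=1)
print(core, dict(counts))
```

Counting how often each gene is selected, rather than selecting once on all samples, is what produces the "Number Times Occurred" column in Tables 3 and 4 and the 75% threshold for the core set.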


A means for validating a prognosis/survival classifier is provided by the present invention. In one embodiment, to validate a cDNA-based classifier for human colorectal cancer, a normalized and scaled oligonucleotide-based colorectal cancer database from Denmark, generated on the Affymetrix U133A GeneChip™, was evaluated. In a related embodiment, a colorectal cancer classifier (U133A-based cDNA classifier) was produced on the training data set using the limited set of 78 genes common to both the U133A and the cDNA microarray. The U133A-based cDNA classifier was then applied directly to the normalized and scaled Denmark test population.
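
The cross-platform validation described above (restrict to genes common to both platforms, train on one data set, apply directly to the other after normalization and scaling) can be sketched as follows. This is an illustrative sketch under stated assumptions: all function names are hypothetical, per-gene z-scoring within each data set is assumed as the scaling step, and a nearest-centroid rule stands in for the support vector machine, which is not implemented here.

```python
from statistics import mean, pstdev


def common_genes(train_genes, test_genes):
    """Gene identifiers present on both platforms, in training order."""
    test_set = set(test_genes)
    return [g for g in train_genes if g in test_set]


def standardize(rows, genes, shared):
    """Keep only shared genes and z-score each gene within one data set."""
    idx = [genes.index(g) for g in shared]
    cols = [[r[i] for r in rows] for i in idx]
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard zero spread
    return [[(r[i] - m) / s for i, (m, s) in zip(idx, stats)] for r in rows]


def train_centroids(rows, labels):
    """Per-class mean profile; an assumed stand-in for the SVM."""
    cents = {}
    for y in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == y]
        cents[y] = [mean(col) for col in zip(*members)]
    return cents


def predict(cents, row):
    """Assign the class whose centroid is nearest in squared distance."""
    return min(cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(row, cents[y])))
```

Scaling each data set separately before classification is one simple way to make intensities from two different array platforms comparable; other normalization schemes could be substituted without changing the overall flow.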


In addition to identifying those patients for whom therapy is most beneficial, the classifier of the subject invention can identify those genes that are most biologically significant based on their frequency of appearance in the classification set. In one embodiment, those genes that are most biologically significant to colorectal cancer were identified using the classifier provided in Example 1. Specifically, osteopontin and neuregulin were found to have biological significance in the context of colorectal cancer.


Osteopontin, a secreted glycoprotein and ligand for CD44 and αvβ3, appears to have a number of biological functions associated with cellular adhesion, invasion, angiogenesis and apoptosis (see Fedarko NS et al., “Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer,” Clin Cancer Res, 7:4060-6 (2001); Yeatman T J and Chambers A F, “Osteopontin and colon cancer progression,” Clin Exp Metastasis, 20:85-90 (2003)). Using an oligonucleotide microarray platform, osteopontin was identified as a gene whose expression was strongly associated with colorectal cancer stage progression (Agrawal D et al., “Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling,” J Natl Cancer Inst, 94:513-21 (2002)). INSIG-2, one of the 43 core classifier genes provided in Example 1, was recently identified as an osteopontin signature gene, suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival.


Similarly, neuregulin appeared to have biological significance in the context of colorectal cancer based on its frequency of appearance in the classification set of the present invention. Neuregulin is a ligand for tyrosine kinase receptors (ERBB receptors), and current data suggest a strong relationship between colon cancer growth and the ERBB family of receptors (Carraway K L, 3rd, et al., “Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases,” Nature, 387:512-6 (1997)). Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence (Dyrskjot L, et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003)).


Accordingly, the identification of such genes may be significant in terms of gene therapy. For example, a therapeutic gene may be identified, which when reintroduced into tumor cells, may arrest or even prevent growth in cancer cells. Additionally, using the classifier of the present invention, a therapeutic gene may be identified that enables increased responsiveness to interventions such as radiation or chemotherapy.

Sequences


ACCESSION No. AA149253ORIGIN1aatatggaca gggagtctca ttgtgtttat catatcaatt aatattacag tacatccttg61gtaatacaaa attgtacacc ttcatcaaat aaattaggat aaattaaacc aataaattat121gcaaagtctt cagaacaata gacaacaaca aaaattcaca attgaaattg cctctagcta181aaaaaaacaa acaaaaatca aaaattgact ttatcagttc agttattgta ctatattcaa241atcaaagggt ctttattaca aaaaagagct taataatgct atttacaaca tattgctaaa301taatataaag gcagtgtttt gtcacggttt atactatata catatgagaa atggctggga361caatattgag ggaagcccat gaccttttgg attcttccag gtagcgctga gaccnatccc421aatacatttt ttttccttag ttccaaattt gganggcgta atatngcagt tttnagaaat481tttccncccc ccntttttag gggggattgg atattttana aaaattccgg atggaatacg541gtttccccna aggagggtag cntggttACCESSION No. AA775616ORIGIN1tttttacatt caagataaaa gatttattca caccacaaaa agataatcac aacaaaatat61acactaactt aaaaaacaaa agattatagt gacataaaat gttatattct ctttttaagt121gggtaaaagt attttgtttg cgtctacata aatttctatt catgagagaa taacaaatat181taaaatacag tgatagtttg catttcttct atagaatgaa catagacata accctgaagc241ttttagttta cagggagttt ccatgaagcc acaaactaaa ctaattatca aacacatcag301ttatttccag actcaaatag atacacattc aaccaataaa ctgagaaaga agcatttcat361gttctctttc attttgctat aaagcatttt ttcttttgac taaatgcaaa gtgagagatt421gtattttttc tccttttaat tgacctcaga agatgcacta tctaattcat gagaaatacg481aaatttcagg tgtttatctt cttccttact tttggggtct acaccagcat atcttcatgg541ctgACCESSION No. AA045075ORIGIN1ttttttnttt tttttttttt tttttttttt tccaggaaag acagatgtta tttaccacca61atgaattttt atcatattta aatgaacttg aaaatgtcat tcaactcaaa tccctcaatc121aacttacttc agcccattct gaaacttcat attgcagcaa accagccatg tgaaagaaat181aaattcaatACCESSION No. 
AA425320ORIGIN1ttttcaggtt gtaaatattt atatttctct cacatacaat gttgtatgag acacttgttt61taatatgtat ccataggatt aatactcata tggagtataa tgtggaaaag tgcagaacta121aagaaataag tctatccgaa aacaaaagca cacatttctc aggatttaaa aatattgcac181atagtaaggt tgcacagaaa ttactggctg gttttacaaa cagaatgagg tatcagtcaa241tctctagata aagatgagag agaggataaa ctacacacac acaaacacat aaatccatac301taagacctaa gagtgccaac aactaagaaa gaaatatgaa aaagctatgt taggtagcca361ggatttcaac actacaaaat catttttagg ctggaaccaa acacataaca atctcttggc421aatatttcgt taagttttca acttttttcc agcctaaatg actatgggca ataaaaccat481ttcctttacc ccagttctac tgtagaaagg cacagcgctg tggtaaatat caaaccattc541ctttctcaacACCESSION No. AA437223ORIGIN1tttggtgaat aaactaacag ctttattaat gaaggcaaac atcagatcat tgtatgaata61ttatatatat atataaaaag aaatccaaac taacagcatt gtatttcaaa agtactgtac121ttctgtttct tttaaagaga cttgtcatct gtttttataa aacaaaatgg gtactcttct181cctaaaaaat cctggaaaaa tgaaatagtc aatttcaagc tgatgaattg aacacacctt241tctttaaatg cagactattg ctaggaagca aataaagtca agcatcagaa agaagatgta301tgagaaatgc atgaaagtca gagaaaaggg atgtagtgaa attactgcta atctttcccc361cctatattca aagaccatcc aaaactggtc tttcatacaa atataaaata actataaaga421gagggaattt gaaaccatac ccatctgaaa tcACCESSION No. AA479270ORIGIN1ctctgaattc atttatttag aggtaaaaca cagccattca aaattgtgga atacaatgtc61tacacacaga ataaggttgg ggaattaagc tgaattgtta tattccattc acattaataa121atatttttaa agaagaaatt gtagatttta aaagcttcat tagacactag tgacacatac181aaataactaa actctcatac tgcttgattt tcaggttgaa aggttacaat aatctatata241tttcaattac atggcagtaa atacaaaagc attttaaaca tcttttgaac tgtgtagtat301actataagca ggagtttACCESSION No. 
AA486233ORIGIN1caaattgaat attttattaa catggtagtt gcctttgtaa catgtgcaca cacactcgca61cactcagaat gatctgcctg ggggaaaaat actaaatatg cctaagggga aaatgaaaaa121taaaaaaatt cctgtaggtt ttcattattg taggcaatta tgtccacatc acttacaaag181ctattgccaa atctgtccaa ggaagcagag tttgaagtga gggctaggga caggaatctt241gggaaaaatt caacagtggc atagcagagc tctcaatatg agaaagctga cataatgtgg301acttttgctg tgaattacct ctttgcaaaa tatggggaga ggtttatcaa tgggcagaaa361ataagagaag gcggtgtgaa gtaggcttct gcagtcaatt ttcctcacag tattgtgcag421ggtcatcaag aaaatgctta gtctttctct ggaaccagtt tcagaacttt tccaattgca481atggtcttac cctcatctct taagggtgaa cgacccacct aagggaagtc tttaaagACCESSION No. AA487274ORIGIN1tattactgca tatgttatat taaatttaca caatgatata taaaaacaca tactgtttat61attatatagt aatttaacat caacaggagt atcaacacaa gtactactca tgcacaaaac121atgcatatat tggtatacaa aaagcaattt tacacaatac tgtttaccaa aaattttttc181ttaaaaaaca gcccttccac ataggatcaa aggtccaatc tggactggat tgcactaata241tgttcaggtc aacgcttcgg tggcatagcg ctcagtgagc aattctggga ttggagtcat301gcccaagggc tacttcatta atagtgaACCESSION No. AA488652ORIGIN1tttttttttt tttgcaacgc aagggctctt tattgtcagc gagacgagca ggccaaacgg61gcactgaggc tccacggggc ccaggcctct ttccgtggaa gagaggcaag aggggtttca121ggattcagag gggtcctccg ctcacgcagc accatgcaaa tatagagcta aaaactttct181gaatgtctct ggcttgaaac caactgggcc aacaggttcc acaaccactc tctttttgat241cactgggaga caccaaaaat gctgatagag gagctggtct gagtccaccc aggccaaatt301cttgacaccc tcgttagagt ccaggtctgt ggtattcagt tgaaacacta ggaaatggaa361gacacgtcca tccgtgccca ggctctgcac caccacgggc tgctccaaga ccttggcatc421attcccatag aggagccggg cctgagcagg gcactgcaaa agcaaacagg atcatcttgg481cccgcagctg atctggttga aggcggtgtg gtcgtaaatt ggctttgtcc agtaagtaca541gggtatgggg ataggggtaa ggatagACCESSION No. 
AA694500ORIGIN1tttgacagaa gaaacatttt taattgttct tgtcctgccc catcaccagg ggagtcccgg61cattgctcag gctcactgcg cttgctttcc cctgggatgt cgaggacact ttgacctcat121ctatgtcata gcccatgtgt ttctcagatg ccaccgccat aagatctagt gccccctggt181gccattggga taggcaggcc agagaggcat gggagctggg tgtgcaccag gccacagggc241tgtggggcat gcagccgatg gtgcagcttc aggtggatgt gctgggtgaa gcgactccgg301cagacactgc actggaaggg ccgggtccgg aggtgcaACCESSION No. AA704270ORIGIN1ctaaatcaag tagtgctact gaaatccagt gcctaatgga gcagatggtg gaggtcttag61actctggaac atttatagtg atgcttctga atgcaaaaca ccaagagtgg atttcacagg121ctgtgaatct gatttgattt tgatgggagt aaagcttcca ttttcactgt acttgaacca181caaaagaaaa aaagcatgtg tgactgacac aagctagtta agaaaaagga acatgttaaa241tattagtccc ataaagggaa gcagtttaaa caagtgatta tttgtttgta tcatttaaca301tgattatgtt tgtatacaat accaccgtttAA706226ACCESSION No. AA709158ORIGIN1tttttttcct tcaactccct ccaagttgtt tatttaataa taataaaaaa gaaatgcaca61cacataaacc tgaactcccc cccaccccac cctcccttac tcccagtaac tagctccaaa121atgaaaaaac ttcccttgtc ccacctgggg actaaattcc cacctccact gccataacac181tagagaaaca aaataaaaaa tatgcagcag ctcaccaccc accccacaac tgaacctcac241acaatcccct caaacaaaga agccaggact gggggttcac aggaatgaga ggagccctat301attctgaaaa gggatgagaa gagaggtgaa cacccccacc tcaaataagt gcttaacccc361cacacctgct ctttccttta ccaattgccc caagcctggg gaatcaggga aatttgaaac421agtACCESSION No. AA775616ORIGIN1tttttacatt caagataaaa gatttattca caccacaaaa agataatcac aacaaaatat61acactaactt aaaaaacaaa agattatagt gacataaaat gttatattct ctttttaagt121gggtaaaagt attttgtttg cgtctacata aatttctatt catgagagaa taacaaatat181taaaatacag tgatagtttg catttcttct atagaatgaa catagacata accctgaagc241ttttagttta cagggagttt ccatgaagcc acaaactaaa ctaattatca aacacatcag301ttatttccag actcaaatag atacacattc aaccaataaa ctgagaaaga agcatttcat361gttctctttc attttgctat aaagcatttt ttcttttgac taaatgcaaa gtgagagatt421gtattttttc tccttttaat tgacctcaga agatgcacta tctaattcat gagaaatacg481aaatttcagg tgtttatctt cttccttact tttggggtct acaccagcat atcttcatgg541ctgACCESSION No. 
AA777892ORIGIN1cagcttgcat cataagtttt attcccgatg cgggacagat ctttccatcc ctcaaatgta61ttacatgtcg ccacggaagg gcttaggatg ctgctcccat ctccaggaaa gatgagaaaa121aggtacagac tgggagccag tccaggacca ttctgcagtt cctggctctc ttaccctccc181ttctcagcag aggaattatc tctcatccat tcagttaaaa agaaaaaaaa aaaaatcatt241aacaaaacaa aacacacctt aagtattggg caggggtgtt cttgtcctca gtaggacgtc301aagttctggg tcaccaatgg tgattttttt tgtttttgtt ttttgtcatt tttgtttgtt361attttttttt tttnnatttg ttagttatgg ntagcagttg tgtgtccacc tcatctgcag421gcagctgcac atagcggacg actgagcccc tgatgaagca gttcttgact gataacatgt481gagggtattt ctcagggtct gtgacactga tgtcggttag tttgatattg aggtactggt541ccacagagtg gagggttcca cagatgctca ggtcattctt gagttccacg actacatacc601ttgccacaag agacttgaaa aaggagtaga agagcatACCESSION No. AA873159ORIGIN1tttctgtagg atttttattg gtggcacctg gggccacatg gagggagtcc tcagcacagg61cgctggggtg tgggaaattt cagaggcccc tcctgggatg tcacccttca ggtcctcatg121agtcaatctt gagtttctcc ttcactttct gaaatggctc tggaaaacca ctcccgcatc181ttggcagaaa gttcactctg tttgatgcgg ctgatgagtt cccgagcctt gtcctccagt241gtgtttccaa actccttcag cttatccaag gcactggaga cgtctggggt cccctgggct301ggggctgggc cttccaagac gatcgacaga accaccacca ggaccgggag cgacaggaagACCESSION No. AA969508ORIGIN1tttttttttt ttttttcact tcttcaacaa gtatttattg aacgccaact atggaccagg61ccctgtgctc aatgctgggt acagagtgga gactgaacca ggcatggcac ctggcctcat121gagcttacac tcgagtggga ggcacagtca accaacaagt aaattacaca aatggatatg181cagtggcaaa ttctccatga agggaaagaa cagaggcctt gtgatagagg aactccacaa241gtaaagtagt cgaggaaggc ctcttggacg aggcaacgtt gaagccaagg cctgagggtc301tgcagaactc agccatgcac agggtagggg aagagcattc ttggcaaagg gaacagcata361tgcaaagtgACCESSION No. AI203139ORIGIN1ttttttgagt ttggcatgtt aatttttatc agcgacttct ggggcctagc accattcccg61gaagaaggga gttgtcgggc agggtcctta atgggggttg caattcttgt cttggttggg121aaagagccta gctgggaaca ggggtcgttt gtgtagtaac tgtattaagcACCESSION No. 
AI299969ORIGIN1gcggccgcgc cggctccagg gccatttagc ccccaggagg agaatcgagc aatctttttg61gaagtccaga agaagctact ccttccagca ggcctaatag gatggcatct aatatttttg121gaccaacaga agaacctcag aacataccca agaggacaaa tcccccaggg ggtaaaggaa181gtggtatctt tgacgaatca acccccgtgc agactcgaca gcacctgaac ccacctggag241ggaagaccag cgacattttt gggtctccgg tcactgccac ttcacgcttg gcacacccaa301acaaacccaa ggatcatgtt ttcttatgtg aaggagaaga accaaaatcg gatcttaaag361ctgcaaggag catcccggct ggagcagagc caggtgagaa aggcagcgcc agaaaagcag421gccccgccaa ggagcagACCESSION No. H17364ORIGIN1tttttacttg aaattaaatt tggnctctaa agttggtgta gcagcagttg atcagnactg61aaaaacggtt tttagtctcg gaaaaagact gattttgctt ttttataaat attattagat121ttattaattt ttcgtgctca atgtgtaaat tgtattataa ttcattgtga tttatttcac181ttttaatttg ctggtgtttt aataaatggg ggtgttactg aatctttctt cccacttcca241tttcttttga ccacccctta accctcaact gtgacggtag tagtattatc atttatacca301aagttttgca tagtccctgt tgactttgta atgttaacgg agtcataaaa gcactaggca361agagaaagat agaaatttgc ttttaatctt tttgcctttt attttgcaca ttatgcaaaa421gggaaaacat taaaggacac tttttttaag ngagtgaaac atgggnaagg catccagtgc481tttatgcaca ttgtnagcta atcaggccat tatACCESSION No. H17627ORIGIN1tttttttttg ggcagatgag aaacagaatt atcatcagag tcttgctaca aacagggaaa61aacacaaacc aagatgacac acggacatgg tagattaaac attcctcccc accttcagga121tacatttaca ttgnaataaa tactgcaatc tcagcagcgg caaacaagga ggaatntagg181aaatgcccac ctcctcccct ctgtcttatc tgtgtgctct cttccttggg tagcaccgat241ctccccaggg tgctgggtga gaaacaggac aggggngaag aggtccgtgc atgctcactt301gcccttttgcACCESSION No. 
H19822ORIGIN1gaagtcatan tatgataaac attttattac actaaaaaag tcatctgtta actgactgaa61ctgcaggggg accacatgtg aggttacttc agaaaaatgg catcagataa catatataga121tttctggcat tataaaatgg ctagattctc ccctaccttc cctcattaaa tattaatcag181tggcttaggt cagttctagt gggaacactt aattgctgac ttcacataaa accaggntta241gcctaatgtg ccaatggtat gagtccattc ctgggccatn ttcccaacag ccagaccgct301gtggcttgga caccggaggc aacatctggg gggcctcagt tccactcctc tgtggtnagc361ttgctttccc aataactggc tntggagtca catcaacaat ggtggc attn catctggggn421ccacatgagc cctttggggg tgctgcatcc ctactngACCESSION No. H23551ORIGIN1ttttttttta tgcacactaa ggnatatttt attgtggcat taattagatg aaagttagta61atatgncatt gaccaaaaca tttgattgac aagnaccata aaggttaact gagagttttc121tttaatataa ttgttgtaca gacaaggatt cctgctgtat agagtatata gaaggatgac181atactctagg aattaggaac aatatatatt caatacaata acaaaactat atagtacttt241aagaactctt tcacatatat gaacactctt acttaggaac ttcagctgtt taaagtaagc301aatatgcaaa cctataaagt acacaccaaa aaaatctaac ctacaaaaca cccaaagcaa361atgttagcat atctctatta tcaagaatat cttctcacca tcgtttcttt caaaaatatg421tgaaaaagtt ctttctttcc ttatgagtgg caatttttaa aggcccctct tctgaaatta481gntatgttcc aatccactat cactcttaag ggaaaatgga acdnctctgg gACCESSION No. H62801ORIGIN1aatgatatca gaacctttta aatgatctag tatctgtgat gttagcgccc ttgggattca61gaaagtggtg tgcatagtaa aagctttcat tgtaactcac cctgcctaga tatgcagaaa121gcaaattcag tgataagatc tttcctggga gaccaatcag cagcctcagg ctctgttggg181gtctatcaca atgatgttat ctaaatttag ggcaaggaac cctttcccca tcttttagag241ggcagtgagt gttctaatca cttcaagata ggtatctgat aaaagtcttg gggccaactt301tttcatactt aggnagggca caactaaaat ggatatactt aaaatggtat caaaggaggg361ttaggtgtac actctactag gtgtaaggtn tatttcatta caaaatggct ttggACCESSION No. H85015ORIGIN1cacccaggct acagtgcagt agagcaatca caactcactg cagcctcaac ctccctgggn61ncatgcaatc ctcccacctc agcctcgcaa gtagctcgga ccatggccac acgccaccac121acccggccaa ctttcgtact tcttgcagag agagggattt gccatgttgc ccaggccggt181cttgaatttc cgggctcgag tgatccactc acctcagcct cccaaagtac tgtgattaca241ggcatgagnc actntgccca gccaataaan tctttACCESSION No. 
N21630ORIGIN1gaacagacta aatttgtttt aacaatccca tttacaattc aaattccttt aaacaactta61atagcattta tacatttaaa aaaatgattc ttttaagcag cattgcaaat gcttgacccc121attagcataa accttcccaa gtgcttaact ctcataaaca taataaatta aacatatggt181gactttccaa gttctctgaa acatttcagt acttttgcag acttagtaac attttaaaat241acctttcaac tgaaactcat aagtctaaaa gtctgttaag cattttaaat tagaatctta301aggccagtgt cacatattgt aatatgccaa ttatgtttaa atacttcaaa cagcaaatac361tacagtttat ctcaatgaat ataataacca ttcctgctgg gcgcagtggc tcatgccttt421aatcccagtc attaaggagg ctgaggtggg aagattgctt gaaaccagga gattgcctca481ggcctgggca acatggtgag acctcctatc tcaaaaatcn aaataaaaat tagctgggca541ggtggctcat cctgtagccc agcntctcag gaggctgagg tgggaggata gcctcgccta601ggagacggag ctgcagtgag cACCESSION No. N36176ORIGIN1aataaagaca agtgttcaga tttatttgga aattcacagt ttctaatggc actacagctc61cgtagttaca tattgaaaat tctcttccca caacacacag atcacataat ttctcactgt121atctctgctc tcatctggac ctcttttcaa ggggcttcta taaaatcagg ncctcttgnt181cngganagnn nantngngcn gacaggaaag aaatttaaat cttctaaaac acgctgttaa241cctaaagcag caacttaaac aaacaaaaaa ggcgttaaat aagtcacatt acaaacaata301cccaagaaag gtattaggca agtttaaaaa cagttatcac tactaaaagt gctcaataag361ttataactta aacatcacaa caataaatgg tcaattctct ccctttcaaa aagaaacatg421ttccactttc attcactact gtacaatcat actaACCESSION No. N72847ORIGIN1attgttactc tagttttaat ggtttcacaa atacaaaagt tgctagataa gcagtaccaa61catatctaaa tctccaatga tgttcaatta aaattttatt tatagactca tacactcagc121aaaaccactc atttaataag tccaactgaa ataaattctt attaataaaa tacctatatt181gaaagtaata tattgtaaga actctacctt aaattgacca tggggatgaa ctacaatgtc241ataaaatatg agccaaaatg ttcactcaat aattttaatt acatcacaat taagcccaga301actatgcctt ttttttggtg taaggctgaa taaggaccga aactggatgg agagaaaatt361gctttctaaa gcctcattta ctggcaataa cttaccttat gcaataacca acatcacgng421actggACCESSION No. 
N92519ORIGIN1ttttttttaa ctcttaaaaa aaatcatttt attgatcctt taccatacaa aatttattca61aattacaccc atttgaagtg gtaagatcac agctagagaa caggtcaccc tgtaacaaat121ctatttacaa aatccatcat aaaagctttt ttttgttttt ttttacatta tattacatat181tttctttttt aaaagcatac aacacaaagc taaactgatt agtagtttgc ctactcccaa241ttttgggaga aatacttcct ttttacaaaa tcacgtnccc cgtaggaaaa gaaattccca301caccctgaca attggccaac cgacttactc tgcaagccat cttcttcaaa tccctccttc361tcatacacac gangttgtca tgcacacact gaatcntaat ttcttttccn ggaagcttaa421ncctttaaat accgggaatt attttcagat ctncacgtnc caacaaaaat ggaaacaagg481gccccaccaa gnccgggaaa acnaaaccca ataccctntt aaaaatttca aggcACCESSION No. R27767ORIGIN1tttttancna tttgtaaata agtttaattt ttnagttttt caatgacatt cagtagagat61agttatattg gctatataac acaagtaaag tggtgtttgg aaagtggagg actaggtttt121ggcacggggc taggacgggg tgaccgccgc ctcaccacca cagactggag ggggcttttg181agagctgggc ttcgctcccg aggactcagc tcagaaactg ctgaggcccg tgatgcagaa241ccagtgccgt aggtgggcat ctggccatgg cttcgagctc tcaggatgct tttgtatctt301gagagggtgc ctccagagaa tgtctgctcc ttgggcctca tctncccggg ttatnccccg361gcagACCESSION No. R34578ORIGIN1atttttgaag nngnttcgat gtcttactgt tatgaccata aaaccaataa agctactttg61aaaagttaaa gccaggngta attaaacaac tcatacttga ttgttaaagt cagtctctna121aaagtgtaat tttaaaaagg taataaaaaa ggtatancat tatACCESSION No. R38360ORIGIN1tttttttttt ttcaaaaatg tcaaacttta ttcaagtgtt atggtaagaa atttgaaatt61cttaggtaag ctantgaata aatccttggg caggtgcagg catacagatt ctggggtgca121gctgctgagt ttaaaagctt cctttggaga tgccccgnng gggnnacacc ccctntcccg181cctntcaaga ggaggccatc ctggggcagc acgttagggg caaatggccc agatgcccag241ctnagggaaa cctccatgcc tagaggagga ggtcgctctg ggagcaggag gaccttcttg301gaacccctgt tnacaggntc ctttttcttg ntttttccag nacctcctgc agggACCESSION No. 
R43597ORIGIN1tttttttttt ttttttcagg attcactgcc tggggtatcc cactatatat atctcaccta61tgatgtagtg gtgcttgaaa tactcatctc attagctcga ttttattatt ctaatctaag121gttttttata ttattcatac tatgatattt ttagggacaa tcagtaatat ttggggcaga181gtactgaggg acctcttgaa gtctgcaaca gcatgcattt tctttgtttt tgtggggagt241gcttccctgt aggctgtctt tgttctagga acactgnctc caaatttatt tccatgggga301tgtagggggc tagtaggccc atggtggaaa ggtcttctgt aaatctccnt gggggggtnt361gagttattgg gggttatttc taacagggan ttttcccaaa gggggACCESSION No. R43684ORIGIN1tttttttttt ttttcattca aaaatatata atttattgag tacttgctag acacaatgga61tacaatgatt atatagtccc aatcctccag gagaacaata gacagacacc tttataatat121gtatgtggag tgctctgaca gggaaaagca caaggtccat gggggtggga gtggcccagn181agctaaggaa ctcttccccc atgaagtggt tacttacttt ctaatcttta atttaggatt241ctctcatgga acatttgant ggtgaaattt tactacataa aggttctcaa ccctaggagg301tttatccctg cccccctggg aacatttggn caatgtctga acaacaagtt tattntcaca361actggggagg ggngaaggaa gttagcagag gccaaggatg nctggctaaa ccttaaattc421ctacatACCESSION No. W73732ORIGIN1tatttcaaaa aaagtctttt aattgttcaa aatagcacaa aacgacatcg cactatggta61atattgagtc acaggggtta cnctacaata gtgaacggng tactcncctc agaaacaaat121cantACCESSION No. AA450205ORIGIN1tttttgtttt ctttcattat ctttatttta aatttgatat tttagaatag gaaattatct61ttcacagcaa tgcctcctgg tctgataata cagtatctca tttctgaatg taaagattta121aaataaatca aaatgaacat taaggcgtac aaagctactt taagtctgct cttaagatca181gtttttgctc atattcaaaa tacatggaat gttggcacaa aactgaagct gctgtagaaa241gatcacagat gttctgtggg ttactcaaac ttccatttct ctaaaaacat acccttacat301ggtcttaatt ttatgaattt aagtgttgag aaatatctaa ataataagta acaattaaaa361taaaatgttt tatttgtaaa ttatgtacag aatacacttt acgttacgcACCESSION No. AI081269ORIGIN1tttttttttt ttctaaaact acctttattg tggttggctc gacataagat gccgccatca61gcagaattat aaaactgtac aggaggcaca aaaataggct gtttaactta gataatgacc121ctcatgtctt caagctttaa aaatgcacat aaaagttgta caatctggca gtttataaaa181tataaagcta aaaagaggat tttgggttcc acaaagaaga ctgtatcaca caattaacac241gtactaatta aacaattaac catccacaca gaagacataa tgACCESSION No. 
R59314ORIGIN1tttttttttt ttttcaaaaa ctttattctt ttctaataaa aatgatatat gttcattata61aaaagtttca aacacacatg agtctganga ntgtaaagat cacccaaata ccacagccca121gaaaaaaaaa tccttaacat ttggtganga tctctctatg aaacatacat tatcttaaaa181tattcaatgt tataaatgag ctcatattca acatatatcc tgtngtctac tttttgattc241aataatattt tgggaacata tatccatngc antaaacata tatctaaata tttttaaatg301acaactggca tgggnnttta tttaatccat cttttactga gggatgtttc agttgtttcc361aatgttttaa tatcataaac atcatggaaa tataccnttg gggctccatg tttgganggc421ttggggcaac cttACCESSION No. AA702174ORIGIN1catcttcagc attaagaagt gctgacacaa tatcattaac tgttttatag ttctctccag61ttgtcaggat tttactttga actgtttgtt tcaccaggtc tctattaaag cccatttcca121aggcagattt aaccacaggt gtattcatca tgacagcatc ttctgaagaa ctttctccag181gtccaaaatg aataattggt gggtcagcat tttcttctcc agtggtatct gaagttgaca241acagctgttc aagaagatga ggatatctac cttgaatctc atcaacaaac tcttggcctt301tcattcgtat caagaactca caccttggaa accacttggc atgttctacc catggatcat361ctccagattc ccaacacctc aagccaccat cacaacaaaa gcatttgaca tcatcattgc421gacccacata ataaaaacca gcacttgcaa gctgctcagg ctgaactgga acactagatg481gccagtacat aaatgttctc attcgagctg catgtgtctg catgctcaga tttgaaatgc541taaacctcag agtttctaga gaaACCESSION No. AI002566ORIGIN1tttttttttt tttttttttt tttttttttt ttttcacaat tcttaagtct tgttaagaaa61gtaaaaaacg tttgggtata ttttgatcca tgggtggcat tttcaaatgt gcaaaaacaa121agtcttggaa gagattcctt gtcactagaa agttcgccct tccttttgct gtcagttgta181cgtaagagaa attcgtccac attaaggaat ccaaaaaggg taaactaaag ggatttaaaa241agagtacatt acaaagaata agaagccctg taacatctat ctgagaatac tagataaatc301tgtgagtaga tgtggcacct ggagctactc actacattac taaaaacaga aacaagaaat361ctataatggc aggatcacaa catttgcgcg caaatagcta accACCESSION No. 
AA676797ORIGIN1aataccttct gttttaagtt tttcttttgt tttcatcttg gaaaaaagga aatttagaaa61taagacagga aaagaatggc ccagaaattc agcacaaaga gaggtgtaca cattgacgcc121atctgtgggt cacatacgaa cgcctctggg acagagctct aaaacgagtc acgtgtcgta181gggagtgggc ctgtggcaag gcagtcctcg cagtgtgcag ggacgcaggc ccccttacca241tggaagcccc acccagaagg aagtgggtgc cccatgcagg ccgaggtgga tgaggggaca301gtggtgtgct cacagctgtc agctccccac tgaagcccca aaccagcaga tgtgggcagg361ggctcaagtg gtgtctgact acccaggtca cacgtgcctt aagcgtgaaa gctgtcagct421cccggcacgg gctctggtgg ggctgggaac accaggacac acatgggctg aagcttccag481agacagtgag acacggaagg gacagagagg tgccctccac acagtgtgACCESSION No. AA453508ORIGIN1tttggttatt cagtatttat tctgcaatgc aaaggtgaca aactaaaata taaaaaggct61gttatggctt aacatttttg ttgcagatta aatatgcagc attgaaaaat ggaaaggcgt121ggcttcatct ctgaccagca gagttaaaaa gaaaaatctc tccattttcc ttcatcatca181tgggatacac tgttcaggca atccaaatta ataaagactt gcactttcat atgaacacaa241gatcaagtgt accagttagg ttttcacatt cacagtatat aagaaaatac acatggaagg301aaaagtaaag ggttaactACCESSION No. W93980ORIGIN1tgaatgaggc aacaaaagca gagatttatt gaaaatgaag gtacacttca cagggtggga61gtggcttgag caagtggttc aagagcctgg ttaccgaatt ttttgggggt taaatatcct121ctagaggttt cccattggtt acttgatgta cacccttgta aatgaagtag tgcccacaat181cagtctgatt ggttgaggga ggggacctat cagaggctga agcaagtttc aaagttacac241cctatgcaaa tctctgattg attgggaaaa ggctgaagtg aagttacaaa gttatactcc301tatgcaaatg aagacttggg cccatgacca gcctcattgg gttgtggaaa gggaccaatc361agaggtactt tcaatttttc catctaccat gcagaaaaag gttcgggggt ggggggttgc421caaagggaag ttagccnaac aaactcctga cctaccaaca gagggtccca gttgggtagg481ggggcctgggACCESSION No. 
AA045308ORIGIN1ctattaatca acacttttta atgtagtaca tatatatctt acagttattt aagtcaaata61tgtaaaggtt tacaactgat ttacagatga agcaatcaca gattgcagta atatgtgtgt121gtgtatatat atatttatnc catatataca cacacgccaa tcaaggggaa aactgcatcc181tggcaatttt acagtctgaa gttttgttgg tatatctacc atttcacatc cttttcatct241tgcttttctg tacaaaagat atttttngcc ttcttcattc ctgatgagat ttttctgcga301taactttaca ttcgtacatt gccagttgtc gaccaatgtt tcccattgtt atgcctccag361caaaaaatatACCESSION No. AA953396ORIGIN1atctgtcagt aaattacatg tatcctggct gtttatttca aaaatgcttc agtatgtatt61tcctaaaata gggatattct cctttgtaat cacagcaggg tagatactgc tctttagttg121tcatgtctct tagccttctt taatgtggaa cacgtccaca ccctttcttt atcttctgtc181ttttaaacat cttttctgtt gtccaatttt taacaacaaa gatgttaaaa atcagaaaac241tcagaaaagc acatggtgta ttaaaattcc acctaggaat aactgccatt aaagttttgg301tgtctccctt tctgtctctt cagatgcaac ttactagtct agacaaagca ggtttctcag361tgaataaaac atACCESSION No. AA962236ORIGIN1ctaatcctgc gaatatgggt agtgcttcgt tccatggacg ttacgccccg ggagtctctc61agtatcttgg tagtggctgg gtccggtggg cataccactg agatcctgag gctgcttggg121agcttgtcca atgcctactc acctagacat tatgtcattg ctgacactga tgaaatgagt181gccaataaaa taaattcttt tgaactagat cgagctgata gagaccctag taacatgtat241accaaatact acattcaccg aattccaaga agccgggagg ttcagcagtc ctggccctcc301accgttttca ccaccttgca ctccatgtgg ctctcctttc ccctaattca cagggtgaag361ccagatttgg tgttgtgtaa cggaccagga aACCESSION No. AA418726ORIGIN1tttgagtttc aaaggattta tttgatttcc ccacatgatc acaaccatgg ttttacattg61atagagtctg ttgccactga caaacagaat gcagatgaaa acaaacgcac tcctttcctc121tcaaaggtac acagtggggg tgccaggctt cttgtgaggg aggtgtcctt gaagtctctg181aacagtctgg ggattcagga cctgattcta attgcttaaa acaactcgga ggcaaaagat241attttccaag aggagatgca tgctgtgtgc agtctcgatg tgactgcaca cagaaACCESSION No. 
R43713ORIGIN1tttttttttg atgtgctaat tttatttttc taatacttac caaaataaat gccaccactt61aacatagaaa aaattgttcc catgtgacct aaaatcattc ctcagtcacc cctgaactgg121ctagtagcga gcatatgtgg agcggtggtg agggcaggat agcctggtta taggaaacct181cagantagga aagacctggg ttcaaatccc cactctgcca cttactagnc tgtgtgactt241tgggacaagt tgtgaaacct ctctgaggat ttatttcttc atgtaaaatg tcaccgataa301tggataactc agtgggtgta agantgatct attttaagga ttctagggca gagtcccngg361gcagggcagt taaggcactt aaataggatg gacaguctat tcattnaatt attaggcagt421tttttcctta atggagggtc cttgttggaa ggaccccttt tttcttaacc tccACCESSION No. AA664240ORIGIN1tgtgataggg ttccactttt tctctcatac tggtgtgcag ttgctgattc atggctcact61gcatcttcag tctcccatgt taaaggaatc ctttcacctc agcctactga gtgtgcacca121ccaggtccag ctaattgttt ttttaacttt tttttttttt tttttttctt ggtagagaca181gggtcccctc tgttgcccag gatggtttgg aactcctggg ctcaagcaat cctcccactt241tggcttccca aagtgctgag attacaggca tgagcactat gcccaacctg agcaggatga301cttaaacctg atcaattcta ctccaaaaca gcaactatca ttaagtcagg ggtgtcaagg361aggactctgt gaaggcaaag actagactgg gatgtgtgcg agagtgggat aagaaggccc421atccctagca gactgACCESSION No. AA477404ORIGIN1ggaaaacaaa aggaaaactt atttattctt agaggtggga atgtggggag tggggcagaa61caggtggtgg ccctgggaga gggtcccaag gggcagaggt tggggatgtc tcagtaaaga121ggggcaggtc atgaatagag cctccacccc cagcaggggt tccttgggcc cgcccaagca181ctgggctaaa acgtggaaac tgggcattga caaagtacag cggACCESSION No. AA826237ORIGIN1aaagatgaga accagaatgc ttatatttta ttagtatcca agactgggga gagggatggg61gtgggagaga tcaagaattg gggagcagat gggaggcgct acctcactca ggagacacga121gttcttatcc aagttcaagg tgaaagaagt gagggcagga agagaaatct ccctgctagc181aacagcgact cagggagaaa ctctgggccc atagctagct ggaggcaggg tgacattgct241cccaccaatg ggccatcttc ttagctacac ctttgtagct gtggtgccag gcagaagaac301cacctggaaa ctgagctaag gcaggttcct tcttccaaca gaagacacag ctgggcaggg361actgtgcaga ctcaacaggg ccaggccagc tagtggcang tcagtgttca tgtctctcac421cagtgcctgg agggtcccca gccaaggaaa gaactggtca gttcctgcACCESSION No. 
AA007421ORIGIN1gtttgtagca gttccaaaaa gaaagcagaa ctcatttagc aattgtgata aaagaaggaa61aaatgcatat gttttaaaag tcattaacgc atcgtgaaag cgctcccaat caacctcatt121ccctaggatt ttcagctaac taacaatagt gtctttttaa tttgatgtca tgaaaatctg181gtcacagcaa acacaatgtt ttctaaagca gatctggcct ccgagggagg aaagctctcc241agggcctcca gtgccttgtt tccatggtaa cgacacaggt caatagctga agtcacacct301ttgccagctt tgattctttc tcgcaactgg gagtctgagg caagaggatc acttgagccc361aggagtggga ggctgcagta agctatgatt gtgacactgc actccagcct gagcgacaga421gcgagaccct atctcttagc atagtccaat cttccttttt cttgagACCESSION No. AA478952ORIGIN1tttcccagcc ctcaggccac tttattgctc aagagtggtc agtctggggt atctgcatgc61ctgaactcca tgatgatgtc gcctgtgtcg gggtgaaact ccactgcata gctgacagtc121cgtgggccac ccagcagtgc tctgggatct ggggcagggc tgaagaagta gacggcctgc181ttgcagtggg ggttccagca gcagcccccc tcgggatctg caggctccag gaggccagtg241ctgagcgtgc actccggggt caggtggtac tccatccata gcaccgctgc gtggctctgc301acgggccttc tgagctccac ggtgccctcg gcacacaggg gctgcagggg caACCESSION No. AA885096ORIGIN1gtctgtgact cttggttagg gcaaatttca aatccattat aatacataca ttgcagcaac61actgagtttc ttataatagg tactatccaa agctttcttt tttttacatg tatcacttaa121tcctcacaac cacctgagga ttaataccat ttacctgttt tacagataag gaaaacaatc181atttttcaat tatgactatg cccccaaaca ctggtttgga tggagccttc actggtatag241agaatgacct tcttccctta gactagactc tggctataat aaaggatggt ttaatcatcc301cctgaagcaa tgcataagat aatctgcaat gtatcttcac atactgtacc ttatttgata361ggcaagagac ccataaagga agctgagcat ggattatcag cttcatcaca aatctgaaga421aactgacatt tatgttatgt tgccttaccc aagttgggac atcagagcag caacACCESSION No. 
H29032ORIGIN1tttttttttt tctataaatc tctaatgtta tttaggtttt ttaaggutt ggaagtaaca61gagggataca tacagcaaga tccacttaca tagttttaaa acatgcaaaa caagattata121tatcgtccat atgtaattat atctgtggta aaatataaag atatgcattt tggggacata181gtcaccagat tattagtagc tcaaggaaag gcaggaggaa gagtgctctg ggtgggggga241ggttcacagg gtgcttggac tgtacctatg atttcttcaa ataaaaattt caagcaagta301taaaatatgg gatataggaa tgtaaaggat ttgggcaaag ctgggctggg tgggtatcca361atgttcctta tcaccatctc tgtacttctc tgantgcttt aaataggtca caatcnttgt421aagACCESSION No. R10545ORIGIN1tagaatgaat tgcagaggaa agttttatga atatggtgat gagttagtaa aagtggccat61tattgggctt attctctgct ttatagttgt gaaatganga gtaaaancaa ttngtttgac121tattttaaaa ttatattaga ccttaagctn ttttagcaag cACCESSION No. AA448641ORIGIN1agccttagga atggttttta ttcacttgaa cactgtacaa atattacaat ttccttttgc61tgcaaaaagt ataaaaataa tctttatata ggaatccatt cgttactgta aatctttcta121aatctctgca aatggcccta aatgagggta aatgaaaaag ccgaaatgaa gagagggtta181tggggcagca ggaggtgggg ccaatcatca gggctggacc acccagactc ctccccagag241acctctgttc cttcttggta gccgccccca ccacctgcag gttctagggc taaaggccca301gcagaagtgg gcacgtgaga gggccaggag gagctggagg gtcagggggt gggggatagc361gaaggaagct agaagtggtg ctggcatgtg cccagttcca ccccaccaACCESSION No. R38266ORIGIN1tttttttttt atcttttaaa tgggatttat ttatgtttac ataaaaggta gcaaatgtta61cataagttgt ttccttaaga acatttattt tgtacaatca cattgttatc aagcaagact121tatggaaaat ttcctgggtc cacaacactg aactttgaaa ctactgtagc attctctttt181ccaagtttaa acatgacttt gtgcactgaa gaagtatggc ttcgcattgc acagtgggtc241acatgtgaca acctgacacc aagcgagaag ccttttgatg aaggaatgtt ttatcttttg301ttgaggttac caaaatgggg actttcatgt gtggtggatt atccaaaccc catanttttt361ttttncggtt ccatttctgg cttccaattn aaattaaccc ggtttaaact aggcnggttt421nggccaatgn taACCESSION No. 
H17543ORIGIN1tttttttttt tttaacctct tgctcatttt tattccagaa cctaggaaga actagtacac61tgaaggcatt tgatgtttgt tatgaaaagg aaacaacaaa aaaatcaagt tcaggctggg121catggtgcct catacccgta atcccaagca ctttgggagg ctgaggcagg agggatgctt181gagcccaggg agtttgagat cagcctaggc cacatattca gaccccattg ctaccaaaaa241atttttaaat taaaaaatgg ctaggcatgg tgggcataca actgtaattc aagctacttg301aggaggctga ggtggggagg atcacttgaa cccggggggt tgagggccac agcgagctgt361gattcacaac actacactcc accctggggc gacgaagcaa gatttcgttt tcaaaaaaca421atttttgttt caantcccat cttcaccnta aaaacctngc tacattcccc aggggaaaac481caattttcaACCESSION No. T81317ORIGIN1taaagnnatg aggtcttgct ctgtcaccca ggctggagtg cagtggcaat tgtccctcct61cagtaagtgc aagccaccat accaggccct ttgaacatat tttaaatggc tgatttaaag121tctttgccta atactaaagt ctaacatttg ggcttcctca gggaacattt tctaatttac181tgctttctct cctatgtgtg gaccatactt aagtggtttt ttgcatgctt tgtaataaca241gtctcttgaa aactaaacat tttaaataag gtaatgtgac aactcgnaaa aatcaggatt301cttcccctac cagggnattt gttgttatta ctgtttactg ttggttactg gtttattgtt361gttnctntta ggtgactttc ctggaactaa ttatctaana tattaACCESSION No. AA453790ORIGIN1aacaaatata tttagatata tttaaaagaa ttaaaaaaaa catttcacaa aacatttgtt61gccataggaa ttatttttag caataaatgc ccacatcaaa atttaaacat ttttcaaagt121atgattatct gtactaagta atgcaacaaa ttatgtaaac agagtcagat acatttccct181gtaggagtca cttccttccc gggattaaag ctgtcccaga catctttcca ggggaccaat241taagaaactg ctattttcag agcaacagaa ataaaagctt ttatttgttc atttgaatat301aaaacaggcg ttatcacaga tgtacaaagc gtactggtgg tttaacatac aagaaggttg361ctgtcctttg cacataaaaa ttttgtttga aactgtggct ggttgagtac atgagttACCESSION No. 
R22340ORIGIN1ttttttaaca taaaggtttt attgaataaa tacatgcact gtcacgtgaa attagttgaa61cagaaaggag gttctctact ttttaacccc catcccccac cgctgttctc tatttgcagt121ggggggtcca gctggaggtg gaataaatgc ggcaaccaca ganaaaacac acagctacac181acaggcctgc atttggctta tgtgcctgaa aaagaagggc cgacctcttg ataaagaatg241tctgtaaaag gaattcttac cgtgcagaat atattatcat gggcnantac agttacaagg301ctgcttctat tttatttatt ttttgagacg gagttcacct ctgttgccca gggtgggagt361gcagtggtgc gatcttgggc tcactggcaa cctccgcctc ctgggttcaa gcanttACCESSION No. AA987675ORIGIN1gggtagatag ctagaagtga tagtgctagg tcatatggta aatatatctt caacatttta61agatactgcc aaactggttt ccaacgtgac tgcatgtccc atcaacaatg cgtgagtgtt121ttagtttttc cacgtcatta tttcacttcc cccaggtgtt actgtccttt tttattatag181cattctagtg ggtaagaagt ggtgtctcac tgtagttttg atttgcatgt ccctgctgac241tgatgatgct gaccatcttt tcatgtattt tattgtctat tcctacacct ttttgatgaa301atggttattc aaatattttg cctattttaa aaatggggta attatcattt tgttgcgtag361ttgtaagtgt atttcatatt ctggatatga gtcctgtatt aaatatatga tttgaatttt421taaaaaaaaa aaaaaaacct cgtACCESSION No. N51543ORIGIN1acgattaatg ttttattatt catattttga caaagatagc atattatatt ccaggacatg61gtagttacca tgtggggaaa cctatcaaag catttttaat gactgcttag aataactgta121gaaagtactt tctcaatgat ttttgtatgc aagaaaaaaa atacctgaaa gtaaccaaaa181gtttcagact ggaaaatatg ccaggaagat tttcttctct cattctcagg tgaggttata241atccagtttt agcaaatgtt tgacaattta aaatactttt gaaaactgga gatttaaaaa301atgtaaacaa ttggtaggca cagcaaaatc gtagttttcc cttctgatat tatacatttt361ggcatctctc tacagttatg attaaccatt aaatnaaggg nagctaaaac gttccaaaaa421taggttttac caacattcan tttttaaaat tttccattca agctggtaat ccttttgggt481ttccACCESSION No. 
N74527ORIGIN1aaacgtggca cagtgtgtgt agtgtatgtg actactatca tttgtgtaag agaaagaaaa61gtttactatc agagactgta tctggaggga taaacagact ggcaagggtt gcctctggna121agaaaccggg gaatagagag cgggagtaga aagactgtat tagctgggtg tggcagcaca181cactgtaggc ccagctactc cagaggctga ggggaagact tgctcaagcc caggagttca241ggtccagcct gggcaacaca gcaagactaa aaaaaaacaa ctttcttttc caagaatacc301ctttttgtaa cttttgaatt ccgtattttt taatggtcta tggtctacaa acactcatgt361gcaaacacat tacacgcaga ataagggatc acctgcacga agctatgaac tatttcctca421tcccttctag ccccttccta gaggcgaacc ctccgccccc aaccccaggc actatctgtc481ctgcttgcac ccaACCESSION No. AA121778ORIGIN1tttctgtcaa gctgttcttt atttcangga gagggcaggg gcagagcttt acaggagtag61agattttgta tgctattgaa ggtaaattgg tatcagttta aattagattg ttttaagtgt121aggatgttaa ctataatccc catagcaacc acaaataaaa catctaacaa atatacacaa181aggggagtgg aaagagaatc agactagttc actacaaaaa aacagaaaag aaggccataa241agaggaaatg aggggccaaa aaagtatatg acatatagaa gaagtgttaa atggtagaag301aaagtccttc cttaattact ttaaatgcaa atggattaaa ttttccaatc caaaaggcag361aaattggcag aatggacaga naaaacaana catnaacatg atagtgatat gcctgtcACCESSION No. AA258031ORIGIN1ggggccccgt gatctcaacg gtcctgccct cggtctccct cttcccccgc cccgccctgg61gccaggtgtt cgaatcccga ctccagaact ggcggcgtcc cagtcccgcg ggcgtggagc121gctggaggac ccgccctcgg gctcatggcg gccccggtcc gcatgggccg gaagcgcctg181ctgcctgcct gtcccaaccc gctcttcgtt cgctggctga ccgagtggcg ggacgaggcg241acccgcagca ggcaccgcac gcgcttcgta tttcagaagg cgctgcgttc cctccgacgg301tacccactgc cgctgcgcac gggaaggaag ctaagatcct acagcacttc ggagacgggc361tctgctggat gctggacgag cggctgcagc ggcaccgaac atcgggcggt gaccatgccc421cggactACCESSION No. 
AA702422ORIGIN1aaatgtcttt aattgctgaa tgcctctttg gctaatattt ggaagatcat tatttagtcc61tacaacagac gcattgttcc actttcccat cattttgttt gcaaaccgct aaaagtctta121tttcctcatc tctttgacac attaccaaag tggaccctat gctgtaatca cacaggataa181tgttggaaag tatgaatatc taaattattt tttaaaggta ttattttttt ccttctgttt241tcaaatcatt tctgacagtt tctaaagaca tggtcacagc tgcctgaagc atgtcttctt301cactcatagc atcacctaga tcactcccaa gtgctcctga actggtggct ggcctttcac361atggatgtga actctgtcct gataggtccc cctgctgctg ctgctgctgc tgctgctgct421gctgctgctg ctgttgctgc ttttgctgct gtttttcaaa gtaggcttct cgtctcttcc481gaagctcttc tgaagtaaga tttgtacctg atgtctgtgt catatcttga gaaatgtttc541gACCESSION No. T64924ORIGIN1tgagacggan ttgctctgtc gcttaggctg gagagagact ctgtctcaaa aataaaaata61aaaataaaat aggagtaatt cacgaggaaa agattacata ggctgctttc ctgcttttct121tatccacagg cagttctttg caatgactat ttaaaaacta aaacaacatc acaagtcatg181aagtttgtgc tacccctgaa cttgacaaat tgtctgattc aagtgggcaa agcacaatga241ttggatgcat ctgaacagaa cctcctctgg aatgggggcc tcactagagt gagctcttca301tgagccttgc caccaggggc aggggattat tctgttattt tggcctgttg tagccaagtc361tgcaccccta ggcacccaaa acaaactggg gngagttggACCESSION No. R42984ORIGIN1tttttttttt tttttggaaa acactgttta tttgaaaaca atgagacctc aaatatgaaa61tatagttaac aatgacattg acactgttgc tagcactttc ccctaaacca cccgtaagtc121ttggacgcat gtgcatgcag cacacacaca cacacacaaa aaccaaaaac aaagccaaaa181aaaaaaaant cccaaacaca acattccatg nttgttcatt gaactcctga tgccgggagn241acaggactgt taaaagattt tgtctcccac attatctctg ggagtggggc acaaagcACCESSION No. R59360ORIGIN1ttttttttgg ttttattttc tcctgaagct gaaaatgttt cacccatata aatgtggcat61tttagactct agctataaac ctcatcgacc agtatgtttt cagagttgtt cacaacaaaa121tattattcgt ttctaaaatc agttttcact ttttggtgat agtattccag gctggactgc181ttgaatttta gatgcagaga tcattttata tatatctgtc aatgtaatac agaaaaatta241catgtgaatt gtttatgtgc cccctctacg tagggacaca gtatcaatca ctcaataagg301cactgtaaca tcaggtgggt gtttggggat aaataacctc ttcggggttt ctttcaatcc361cactaccata tggctACCESSION No. 
R63816ORIGIN1aagtcannga tntttactta atttctttca ttgtatactt gtatctcatt ttctcttaac61actgaaaatc ctgacttcta aagaaatgta actacttgtt ttcttacaac atagtattct121agatacaata ggttcaaaat aacaccagta ttaccattaa caatgagact actaaatgca181ttttcacagt gcactaaaat ctcaggaatt cactggcaat ataattcatc catgtaataa241aaaaccactt ggtaactcca aaactattca aataaaangg taataacaaa tttaaaaatg301gcattttgng ggtttcttcg gaattttttc accctttata ttcccccaaa gggccttctc361ctattaattg nggaggggcc ttgggnattg gACCESSION No. T49061ORIGIN1ggaccaaaga actttatatt tattttaaat atcaaagtaa cacaaagaac tagttcaata61tacagtacac ttcctactct tcacagagaa ctgaaatttt ctataaagac atttatactt121aggaaacatc agacaaccaa agtatgtata aaactcacaa gatattttac acacagttca181caataattaa ttctgatatt ttaggntttt tctgtcattg cttttaaagc atccttaatt241taaaaacaaa aattattatt tgaggactgg aaaacaggtg gcaaaggcat ttctactttt301aattatacac tggtaaatcc ccccttaatc caaaacattt tacttncaca tACCESSION No. AA016210ORIGIN1cacagcaatt catctttgct tttattaata atttcaacgt atgttttgag cactttacaa61tgtaggaaat gctttcatag acattatttc ctatgattct cacaaaacct tcactgaaaa121aaaagacttc aaggtcactt gccctatgtt tataaaataa tccgctttaa ataagcagat181aggagtccaa aaattcttac aatcataaga aaaaaaaagt ctaaccagta cttaattatt241tcttgtcatg attactttgt tttaacgcca ctgtttcctt gcttccccca ttttcttcag301ataagtttac tccttttggc ttgtcctgca tccttttctg acagctgccc tgtgtacacc361tgccttaaac atctatcctt ctactctgga atagactaag ccaaaagcaa ttaagaaata421tttcattcta aagaaaacag aattttagtc caaaacccaa atACCESSION No. AA682585ORIGIN1cctgtgggct atattttcct gtatgttttg tatttttttg ttggaaactg aacattccaa61gttttacact ggggaagctc tggaaactga attattttac tcctccagga ttgtttattt121ttaaaatttt gctggcttat gataaagggt atttcgagga aacagataaa gggatgtata181gggcgaggta tgggggaagg ggtgcagagc ttccatgccc tccgtaggtg caccactctc241caggaacctg caggtgttca gctatgtgga ggctccctga atgcggtcct cttgggtttt301tatggaagct tcataatgtc agcattcctt cccccaaggt atagggcaag actctctctg361gggaaggtct taggaccaca atcagaaaag tgggcagaca ttagagtcct gccttggggc421agatgaaagg agggcaggag aaggtcagag aaattgtttt tcttgagACCESSION No. 
AA705040ORIGIN1gtagagtcgc ggtctcactg tgttgcccag actcgtctca aaaaactcct gggctcaagc61aatcctcctg cctcagcctc ccaaagtgct gggagtctag gggtgagcca tcatgcccag121ccaagcctga ttttaaatca ggtctctgcc actagcagct gagagctcct cactgataaa181tcctttgcag ctggaagtat tcaatggtat ccagtatatt cccaatggct cattcctctt241ggacagagaa actcaagtta aatgaactct tttggctgtt tttctccctc ccctttgttt301cctccctctc ccttgcctgt gtctctctgt ccactctctc aggcccttcACCESSION No. AA909959ORIGIN1ttttaatggg caaaagaaca agttgcagtc aatggctgca gaggggtgtc tggggtccaa61tgtgggctgc actttgtggg tactgaggaa atgggaagat gctgcttcta ggtcagctgg121tgggttggag gttgggggct gtaattagca gcagccttag aactgggatg cctttcaatc181cctcctggcc ccttatctct gtggggcagt cacaggacat catctgtttt attcaaagtt241gggacttgca gcaggagacc ctgtcctgca tggagtaggg gtcctctgtt gacaaacttc301ttggtttcca gctcttcccc atctgcagca ggcctctgga taACCESSION No. AI240881ORIGIN1tcggttaaga tttttattat tccagagaaa aattagaatg tatcggtaaa agaaatagga61atgcatattt caactcactg tcacaaacag gtgttttatt atcccaaatg acagtgttgc121ctgagatgat gcatgtggca gacgaggaac caatgagtcg gtatccttta ggacaagaat181atttaatttg ggatccgaac tggatgtctt tgatcacatg tgccatgcca ttcacaggat241ctggaggatt acgacatgat ttacgtttgc acttgtcctt agcacttgtc cagactgagt301tttttaggca gatgatagaa aacggtcttc cggaataacc agggcggcat tcatagttca361gatatgtccc aatgggaaac tcagagtcat cagttaggtt ggtaggcctg gcaaatggaa421gcccattccg gacattgcat tgaACCESSION No. AA133215ORIGIN1caagaacatc ccttttaatc acaaaccact catccacaaa tgtggctatg gggtaagcag61tctaggctgg gaccctttcc agaggtaagt caaggtcacg tccctgcccc cttcctaggg121tggcggtggc tccagccagg ggggcttcca ggttaatacc agagcctcgg ctactctgga181ctcctgtgag ctcttcttgg ctggaagaag gggggcattg tgggcctgct ctgtcccaag241gctccagaag ctgcccctac ccaggcctgc ctgcACCESSION No. 
AA699408ORIGIN1taacagtctt aatattcatg tatttattct cagaacatac aaacttatct tctcagagaa61tagaaaacag agatttcact cagtgacaaa gatggacaca gccagttcac cgtgtccccc121catctactta gaaaatcccc tgggggaggg gatgcctaga gcatacagca ccccttggtg181gccggctgtg cacaggtcta aagactctca acttccttta ccatccaaaa aggaaaacag241ctgtccagat gacagtaaga ttccactgtc tgtaatcctc atggtgccag gtctcctggg301gcatctaggg caatgatgct actgcagttt atgcagttac acagtcaagt ctgtgccaaa361ggaggtccca tccggcggcc aggtttctgtACCESSION No. AA910771ORIGIN1ttttgttgta gaaatatatt tattaacata agcagttcac aatttactgt aagaaaaaaa61gcaagctaca aaacagtgat tccatgttta tattaaaata aacatacaca aattaaaaat121ttccttagat atccatttaa tctctgggat cataagcaat gtttaggtat tttttgctca181tttattgcct aggttttaca caatgagcat atatgttaat tgtgtaattt aaaattatgg241aattaagtgc aagagttcct aaccaccttt tacaaaactg ttatgagaaa atacattcta301gattcaaaca aaaactaagc aatatatccc ttattctaac agctctaaaa tctgttcttc361tcattatact cccacACCESSION No. AI362799ORIGIN1tttttttttt tttttttgca agggctgcgc ggcattttat tttctgaacc ccccacagca61ggggcggcca gtcctgctgc aggcagagtt tcagtcttcg gagtttgacc ttctggccca121aggtcatcac agccacaggc ggaggctctg gggaaaggtc cagttcctgg gatgctggcc181cctaatgatg ggcccatctt tccagtgccg cccttccctc ccgcctggca caggagttct241ggagccacgg tcctgagtct acagaacagc ccggtcagcc tcgtcccgcg gtgcaagcga301ggcctggcct ccctccctgc ctgtccttgg cccggccaca tcactccctg cgtttcttct361tcttctccgg ctcctggaca ttggccgcct ttgctcgggc actggtcagg ggccgaggtg421tcctccttct ttggcgagcc cctttttggc cacgggccctACCESSION No. H51549ORIGIN1atacaacatc tttatttggc attgganatc ctgacatttg tncattacag ttccttaaaa61aacaaaccaa aaaatcagaa caaattaatc aaaaataaag atccaatggc tctatttaca121tatngcaaag acagcccagg natcttccnt gcacacacac accccgcccc gatacagtta181aggggttaat aagctttggg gagcgcagga ggcaggttcc acagttcatc aatcccaagn241cacccccatg aggtaggggt gcctcacaca gccagacggn tatcaagagt atgattggta301gctttttcct cACCESSION No. 
R06568ORIGIN1ctgtcctgat tagaattaat tttcataaag agaacaagaa tcttgactgg ttcacccttc61aattccttgt gcccgcaaca gtgaccggca catggaaagc attcagggaa taaaagcaca121atggaaaatt aaaacatact cactgcatgc ctgccaccta taggaaccaa attaaatcac181tgccaatatg gcatgggggg aaaaccttcc catttttctg ggaataatgt ttacaaaggg241tgggaaaata aggtggcaca ttcacctggg gtggggcatt ttaatttaaa cgctngttga301ccccagtngg ttgttacntt tttcaggtgg aattaACCESSION No. AA001604ORIGIN1cttatgaata atgttagaaa tggaacatga tgttttaaat gtatacataa accttccaat61taattatcag gtgatccagt agtagacctg tgacctctga aggctcctgc ttctcatccc121ttcccttctg ctgtgatttg ttgtcttccc tctgctcatt ccccttgtgt ctgtttcttc181catcctctcc ccatgctccc tctgttgtca tttcccctta ctctccactg cacccagcct241ctgttcataa tttttactgc aattccgatg attgaattat aaactggaag ggagcaggga301tattgatctt catgtagttg gacatgtact agactcacgg agaacaagga ctgggttgta361ggcacaatgc tgtgtgggtt ttgggtaaat ctaactcaca ctcaacttga ttttgttttc421cACCESSION No. AA132065ORIGIN1gagacacagt acaacagtct ttaatgtata tataaatatg cctacataac agagtttgat61aagagaagtt ttggctatat acaactctgc atgtaatcaa actctagaac atcaaatgca121actccactgc atagctgttt tgacagagca acagttaagc ataaaatagc tttgcacctt181attattttgg agcaaaataa aaaataacca ccacaaaaaa aatctctaca ataatttaaa241ctaaaaatgt tgttgaggat agggtaaaca acaaaaaaga aaataatttg atccatatgt301gatatttggc tgaagattaa cagtgttaag tctaaccaac agcgagataa ttttaatttt361cccaagcatc ttnctaccgg tttattagcc atatttggat attaagggga agggcatttn421gccctttacc aaaaccnACCESSION No. AA490493ORIGIN1tctttattga cttattgtaa ttttttggca tacaaattac ttaagtatat ttacaattct61tacataatgt acattttaga agataatgta ctttgctcca tttacaatga caaactactg121taaaactaca ttcatgaatt agatacaaat cctctacata ctaataaaaa gtaaatggac181tgttggttat acattcttta aaatatacct tttcacaggt agcaagaaat agtacatgta241ataagtcttt atgactggaa tgaACCESSION No. 
AA633845ORIGIN1gtttttaaaa gtcagggttt tttgttgttg cttgtgtgtt ttataattaa catagtttat61ttttaatact ggcatccaag aatcctggtt tactcaggtg cagaaagact ctctaactaa121gcagccaaaa aaatttttgg tatgcaagtt ttatcatttt ttaatttgca tatgacttga181acgtgtcttc aagtataggt ctacataata actttttaag aaaattataa agctcaatac241aataaatcta atacataaat gctgcttgta agtcaaatat ttaagagact ataaaaatgg301gtaattttgt gataaaattt agaatcattt gacaagagat caatgaattgACCESSION No. A1261561ORIGIN1cactgttaaa aatacattta tcattaaaat atattacaca tggagacagg atgcatcata61tacagtttgg aagacttgct ggcccagaaa atcccacttg tttcaccgaa cactcatttt121ttcagggatt ttacatttta tttttagaga cggggtctcc ctctctcacc cgggctggcg181tacagtgatg tggtcatagg tcactgcagc ctcaaactcc tgtgctcaag tgagccaccc241acgtcagcct cccaagtaac tgggaccaca ggcacgcatc accacgccca gccaattttt301taaaaatgtt tttgtagaga gggggtctcc ccgtgtACCESSION No. H81024ORIGIN1agcttcagcc tttattaaac aaaggaggag gtagaaaaca gataagggaa cagttaggga61tcccttcttt cccctataca tacacagaca tacaaacaca cgcacccgag tgaatgacag121ggaccatcag gcgacagatt gaagggcaga gggaggcagc accctccgag agttggcccg181gacccaaggg tgggctgaga cctgggccag gggcagccgt tccgaggggt tntgcctgag241cagtttggag atgaggtcct gggctcccgt ggggcacaga agcggggaac tttaggtcca301ccttggacga tggcggACCESSION No. N75004ORIGIN1tcaagtcata agataaagtt taatcatttg atcatgttaa aagacacaaa acacagccaa61tctaaccaaa tttcaggcat gcatttacat aaatatatta aattaagaaa agaaattgta121cacttaaacg tccttttcac ctagaaatca ttaaatccac agatcaacaa taaaaccaat181tctctgcatt taccacttca agatacaatt gttctatttt aaagataaca caaactncac241tagtctggtt aggaatttat ntgcattata catatattatACCESSION No. 
W96216ORIGIN1tctcaggagg tagaagcttt attatgacat cttcaaaaga caatcaaatc aatagacatt61tgctgagcac ctgctgtgtg caagcccgtg tagacagtag ggtccagtgt cccacgcatg121gctctcgaat ccccggggag aaaaatcaca tcnggggtca gggagttttg cgtggctgag181aacaaagtgg gtttctgaac atcaaagtgc aattcgcttt acggggcaaa ctccgangcc241cagccccgcg tngggaagcc gcagcngggc gggcccgctt cctggggctn gcggccgggg301tttctctaag ccgcacgcnt tgcgtggtgt tgcggggcct ctcaagcaag cccggaagca361gcatccttga gctccggttg ttggagcgct gggacctctg gctgccgccc ccgcagcagc421agcaaccact actccgctgt cACCESSION No. AA045793ORIGIN1caaggtatag ctaattttat tattatcaaa caaaactagt agatataact tccaggaaat61aagttacata aatataacag aataaattca ttttcttaag tttcaaatta aagatgatta121agaaatacag ctttatgtaa agtttctgct ttttctcaac cacgcctaaa gaggaaagaa181ctggcagcag gaacacttgc tcctaggaaa caaatacaac aaaattataa ttaaaaagat241cttcaagcta tcaaaatttg tgagagaagg atggtaagaa tgcagtagaa attaccanat301gacaaacaaa atcctatcag ttttcaggtt ggtcaaaaag taacttccat gaatatagcc361tgtggatccg gccatACCESSION No. AA284172ORIGIN1gtgttaaagt tggatggatt tattttttta aaggcccagt acaaaaaaat ggttgaggaa61agtgactctt caacaaaata tacacctgta gaaaaaaatc cctaatatac tgatatttaa121ttgaacggaa agtactaaag agaacatact ttaatatcta ggcacaattg gtcaggtact181aattataatt tctgttctca tttaaaagtt taaaccaatt cttcaactgg actgatgtgt241gtgagtctaa tacagagaag gcacctctct catctctcac tctccttaag gaccttttga301gagaaactct ttgtaacact ttaagggaca cagacaatgc actatatcta agtatagata361tagttattta acatacACCESSION No. 
AA411324ORIGIN1tttttttttt tcccaaacaa tacatatcag attttatcca ttttgttttc tacatgttct61ttgtgactca agtttgacat tagcatttgc accccaaatg agttccccta caaataaaat121ttgttcatgt tgacacaaag aacacaaagc aagtatagat ccctcaggaa gttgtcacaa181ctcttgataa gattaactcc accactatca tcactttttg ctttgtcccc tagtttgaag241cctgctggct tttataattc aatgagaatg actccacact cttctccaaa gcgcccatta301tttttagttt ttcggtgcgc gactcaacat aaagacctgt ggctcttatg agctgcctgt361ttttaaatgg tgcagtagtt tcagtttcca tttaataagt tcccagataa caaatggaga421atgggaagaa tcttctcaag gtcacagtga aggtaaaaat aaattatctc catcactgag481aggctACCESSION No. AA448261ORIGIN1tttccagaaa aggatatttt ttttattcaa gtaactgcaa ataggaaacc agagagggag61ccccaggctg ggacaaatca tggctacccc tccccaacag aacaggggga ggaggtggcc121cctacaccct ttatggtcga ttcgggcccc cttgctcact ctgctgcagc atcctagggg181cagggccagc cttccctggg actggggtag tcggtcaccc agcctgccat gccccagccc241ctcttcccca caaagagtat cttgggggag gggatcgtgg gcagaacagg aggcaatgag301gatgaacatt tggcgctggt agcagcagca atgacggatt gtcgaagaat ggaacattga361acaACCESSION No. AA479952ORIGIN1aacagtctgg ctgttgtttg aattaaactc ttaaacagga tgtttagtta gagggtaatt61gttgagtaat gatgcataca acagcatact tccctttctt gctgggggtg cagcttttca121gttttcttgt tttactttga cagtgcaagg ggaactgaaa ataatttcca ttgtattatt181tatcttagtt cagctgaggg ctttatgaga cagtggatgg ggaggcagta agacggtgat241gagataaaat gtgtgtgttg cactgactgt ctataaagtt atcctttctt catgaaaaag301tagcatttaa atctggatga gtttataaag gattacaaaa tgctgattta tagagtaaac361tttaaaatat taaagactaa agactaaaag aagagtaata atgaagtaat gtagACCESSION No. AA485752ORIGIN1ttcggcagca actcctttcc tttatttctt ccccttgtaa agggaaattc aagttcagca61gcattccttt cctgccccaa gtcctcaacc agacaagagg ctgcaggcac caaatcttgg121gctggataat ggcaaaggcc tcagaagctc acctccagct ctgagcttca acagctgttt181gtaccagtga gtcagcatta aatccaccag aaaagaacag caccacccaa agactggggg241gcagctgggc ctgaagctgt agggtaaatc agaggcaggc ttctgagtga tgagagtcct301gagacaACCESSION No. 
AA504266ORIGIN1tttttttttt tttatatata tatataattt tatttaaaat ttagatccct attcccacac61tctaataagc tgtataattt ttgtttagaa tttttctgca aacatactac aataagcttc121ttttatttgg agacaaaata cagtggcatt actggaagga atatcacaac attacatttt181tatcttaaag gacaagcaaa ctttcagggt tgataatggg ataagcatgt ttgagactgg241ttaccttctg gcagttcact gcatctggat atttctgaaa agtatagaga agctcttgga301ttttaaaaat atcttaaaat acttttagat gaaaaaattg taaaagttct gcttataagt361ttacttttct ccacaattac aatatttaaa acaaagtttt gttgattgac gttttaagca421tttaaattta gaatgctaaa aacaattcta tcctacactt tcttcagggt aggggaataa481atacatcctt aacattgttt tctggatgta aacagaaatc cagcagaggt catcattatt541tagtacaacc agtaaataaa tgtaagagaa tACCESSION No. AA630376ORIGIN1agcttggcaa acctttttta ttttgtgata aaaatgcttt catataaatt tcatcttaac61tacctttaga atgaaacgga aaagtaaaaa caaagtgtgc attttcctta ctacgtttag121tcaggaatat gcggtcattt tattggttac tgggtttctc atacaaacag atataatatc181acttttaaga gaaatgtaca caaggaagta accatagtac cacttattag tgggggcctc241tgggtacata aatgtgtcct cccaaatagt catcatacat tcaatggtat tACCESSION No. AA634261ORIGIN1atagtgaaaa tatactttat tttttaatac aatagctgcc agcaatatac tggtgctgat61gttccaaaga taaaagaaaa tacatgcatt ctataataag ctttcatttg cctgttcaag121aaattataaa gaaaatactc caattctgtt caacattacg gcttgaggag ttgaaatttt181tccatgataa aaatatactt tgtgtggccc aaaccttgac tatttataaa ggatggagtt241tttaaaagcc cacatgtatc aataatggat gctcccctct ctttgaatta aatgcctaaa301ttcaaattaa tgcaagaaat tggtgaatca ttaaatgatg aaatttgtat caaaatgttc361atgaaaaaat acatttctat ttcctctaca tttttacttt gtagttattt tctaaatggg421tttaagggca cagaaataaa tgctatctac atgcaactct ggagagattc aaaacacaac481agaagttaac atgcctaaat cctagagttg atccatttag tgtaagaata aatgtcagaa541atcACCESSION No. 
AA701167ORIGIN1ggtagaggca aagtttcgct atgttgccca ggctggtgtc gaattccagg cctcaggtga61tcttcccacc ttggcctccc aaagtgctgg gattacaggc gtgaaccacc gtgccaaacc121tacattttta gatttattat ggtgttctga ttaacaataa agctaggtta ttagctgcct181gggaagagga ggaagtagat ttttacagtc acttttatag aaactgttaa attcacatga241gaaattccac cttacgagaa ttggctccct gacatgtctt tggactacct ctgtttctct301aagtttttgt ttttttctgg tgtctgaatt aagttggtga cagatttggg ggatatttga361gtagcacttt atctagagtt gcACCESSION No. AA703019ORIGIN1ggcatttcag taaatttttt taatgacttt aatgattctt atttaagaaa aagcccttaa61ataaatgcta ccaaggcagt aatatttgac catatgaacc agaccaaata ccctttaatt121ttagtatatt aacctctgct gtaaatgctc ttttaacatt gccacatgta caaatttgtc181tagaacttca cgacacaaaa gtgtgcaaat atgagtctaa gattgtgctg aaatagggaa241aggctaacac tgatgtgcaa agtaaaaaag aaagataacc gcttctgcaa caggtaataa301aacaaggaaa aaacgagtta ggtcctgcat gtgtctccac ttcattgctt ccatgtttga361aaaagggagt ctgttctttt gctaggccat gaggctggaa tccacttggc atactgtgtt421gagaggtcta agttcagtgg tgctctcagc agcagccggg aggACCESSION No. AA706041ORIGIN1cgctgagctg cttatttatt gaaaataaac gacggaaaag tctggccttg ctcctgtgca61agcttggagg cctgggtcgc cgctgtggac aagcgtctta gtgtcatgca gaccagaagg121cagctgctgt cccagggccg gggccacctc actgcctctg atggggactc ccagccccca181tggctccgct gtgccctggg caggggacgg gctgggggca ggggagggct ggagcccagg241aggcagcaca gcagccagaa agccgcacgc tgagcctgca cctatggttc cgggaggggc301ttgggccgtc acccaagtgt gatccctaag aacaggaggc ccagcaccct ggaaggaggc361gctggaaggc ggggcggtgg tggccccgtc aACCESSION No. AA773139ORIGIN1ccatgaacac agtagtgaga tattcctttt ccactcctac actatcttct gcttaaaacc61ctctgagggg tcccatctct ctcagggtga tgtctagact tcttctgagg ctagaccagg121tggtgcggcc ccatgtgcca cgcacccaag ccccctgcct cagtgtcccc catatcccac181accacagggg ggtggctgcg ttctgtatgg taggtggtgc tgaccactgg gcctctgcac241acgctgctct cagttccctg gccaactctc cttcaggcct cagcACCESSION No. 
AA776813ORIGIN1ttttgtagag ctgggatctc actatgttgc ccaaggtggt ctcaaactcc tggcctcaac61tgattctcag gcctcagctc cggaagtgct ggaatcacag gcaggagcac ggtaacccgg121gccccacagg ggtttggggt cACCESSION No. AA862465ORIGIN1tttatgctag gcaaggaggg atgattattt attagcttct acagattaga caatggggtg61ggggtgggct caaggtgaga tgattttttg ggtccaagtc tactcaagac aggcatccca121gtcttcggtc tccaaatcca cctcctgtct gtccccccac actgctcctc aggccttgtg181gatccattga ctgtgatttc tgtggttcag ctcccacatc aggcaggaag ggcagctact241gggtctgaga tcccacattg cctccaaccc ttgcttccta gctggcctcc cagggcacca301cgaggggctg ggccaggctg ctgtgctgca cgtggcagga gtagggggct gtgtcctgcg361ggggcactgc accaccaccc aggactggta agtgccattt ccattgtgaa gaacatctcc421cgtactcagg ctcctgcacc tcgcggcccg agtccagtgc acatcaattt ccctgggtag481aagtcgtagg ccagcacttc agtttcttct tttctcctgg gggctggtgg ctggtgacac541cacagaggga ggatctgccg gtccaggata tttttgctACCESSION No. AA977711ORIGIN1tttggcattg taattatgca gaagaaaatc tttattctta gggatcatgc tgggaactga61gggatgaagt atatgcatat tccaaatggt tcaggaaaaa tcctgtctat aaagcataca121tgataaaatg tcaacaataa gacaaactag aggaaggata tacaggtgct tactgtcaaa181tttcaaattt tctgtaggtt tgagagattc aagatgaaaa cttgggggaa aattatatat241tctgataata aaacagatgg gaaacaaaga gggcccataa gacagtcact gattaagatg301ctttctacat ggatgggcct catccttttg tccaaaggga ctacctggca tctgttccat361gttagtgaca gtgactcacc ccaggttgct gcacagatat gagaggcttt agatcatagc421acagtcACCESSION No. AI288845ORIGIN1tttttagatg ttttaaaata catttatttc atgtcgtttg tccccagggt ttggagtttg61atgttctgga ccaagcgtag gctctgagca aatgctacca gggctggaga atcagttctg121ccacttccta gttaagtgat cttagacaaa tttccgcgcc ttagttttct tctcagagaa181atgagactag tcctatccac actatggaca agtggtagga ggcgaaggag ctcacgtttg241taaagagcct tgcacggtgc ctgagacaaa ttcagtgctt agcaaatgtt agctcacctc301tcccttttct tcctgtatcc gattttgtat acaaatgtgt agaaaattta catgaaataa361tgcagaaagACCESSION No. 
H15267ORIGIN1tttttttttt ttacatgaag tagaactttt atttggaaag ttgaatttca tgtataatga61aaatattttc aaaccataca tagtcataag cataatacaa acaccaccta caatacaaac121acgttttata aagttctact atgaatatta atccaagcca aaagaaaaag gtaatcacgt181gaacctgttc tacatacctt tcatctcttt tgatgacgta atcgaacaat ttaaggtaca241aaacaangaa agctttgggc tgaaccctac ttatttcact ataggaacac taggatatat301actaccacag gtaaccaaac ccaatcccat tataattaat ttaacattgt tacatggatc361ctatcttaat ggnatgtaaa catACCESSION No. H18956ORIGIN1tttttttttt ttttttttac atgtaagaag tggttttatt ccaggngtgt gtttcataaa61gacgaggtcc tcaaggacag ctagtggcac atgctttggt caagaagagg aaaagcaaaa121acagaacagg gctgcgttgc cacaaaggac cggctgataa gtgcagagcc tgatctgacc181acagcaaagg acagagagac cctcttgaag gccctctggt cagcagtcct cttacattca241acaggcgcac ccggctcccc agccccaaag gtccatgccc gagtntggcc cgggcttcta301gtccatcctc tgggggagag gcctttgccc tggggcccag ttttgtccta aggtttnggc361aggganggtt tcccagatgg aacaggggga tttttagggn tgcacttggg tttncggaag421gaaacntcac gacagaggga caggcaaagc ttggccntgg gACCESSION No. H73608ORIGIN1aaattttatt aattttattc aggaaagaca ttgactgtta agtttttttt tngggggggg61ggtgatgtct tgctattttt taaaaattat atccagacta tgaatttaat atttactacg121gctaatcaac tgctcatgtc agtaatcaaa gncagaaatg agccttatac gtacatctac181attaaacaca cacacacccc tttaaggggt gctcagtgta gnttctaatg tcagtctgtc241cattcaaccc agggcccaag gttgcatcac atcaccaagt tggaatcatg aagacagccc301agatttgact gacatgggca cagcagggct ccctcaccac agcccntggc accagttaac361tatttctngc tcgngccgaa ttnttgggcc tcgagggcaa ntttccctat tagtnagACCESSION No. H99544ORIGIN1gcgnccgccg cccccgcctg ggccgcgctc cccctctccc gctccctccc tccctgctcc61aactcctcct ccttctccat gcctctgttc ctcctgctct tacttgtcct gctcctgctg121ctcgaggacg ctggagccca gcaaggtgat ggatgtggac acactgtact aggccctgag181agtggaaccc ttacatccat aaactaccca cagacctatc ccaacagcac tgtttgtgaa241tgggagatcc gtgtaaagat tggganagag gagttcgcat caaatttggt gactttgaca301tttgaagatt ctgattcttg tcactntaat tacttgnaga atttataatg ggaattggga361gtcagcggaa cttgaaaata aggcaaaata cttggtaggt ctgggggtnt ggcaaaatACCESSION No. 
N45282ORIGIN1ctaggcataa cataaattgt tataattgat cagaatatct tgaatatatt tttacagata61actagtggtt tctactagca gattaaaacc aagagaaaat taaaagtaag ttcacattta121aaaaaaatta taagcaataa atacagcact acagccacca ctaattctat atacattgga181ttacatttaa acaaacactg cattccagaa tgaatatttt atgaataaat gcattggaaa241ttaactttag gaaataaaat gacaaattac gaatttagaa aattaaaata tgactttcac301aangtaatca cagtaaaatg cagatctaca ttttaaaagc tagaaatttc cccaaattta361tttttttgga cagccaagaa gnttgcctta aaaaACCESSION No. N48270ORIGIN1tttgcacctt gaaacaattt aataatgtat tacattatag tagcatcaca gcagcagtca61ataatgccac tttagacaaa aatcagtatt tccattatgc attctgtgta taagaattca121taaatcggta aaagtcattc taagaaaact tggcaaatac agctttggac tggaattggc181atttctttgt ctacttttcc ttcccctaga ttctttgttt taaactacag tattcatatt241ttaaaatgtt ttaaattatt ttaagacgtt aatatagcag ttacattttt gaatagttat301ttgaaagtga ctgtaagata aagttttaga gaatctatta atgggatagg gttgatttac361attttcacat ttttcctaaa aatcagcttt ggttttagaa ctgattggtt tttcattttg421ggaaACCESSION No. N59451ORIGIN1aaaatcactt caagaagcat ttattgagaa tctaagacaa acaccctata ttcaaagagc61ttacagttta tggaaaggcc agccaatcaa tatgcaatat ttaagtcttt tcattgaggc121aagtgttgat tttgagagca gagagatgat gatcgttttc gagctgagtt accaaggttg181gagcttacta aactcacaag ggcagtttca ggaaaggaaa ataccatctg caaaggtata241tggctcattc aggggctctc tgaattgtgg ctggagcaaa aggtttgaaa tcttttttct301tcccaagaag atgaaagagc tcctggagga cagaaactgc tttttattcc ctttgtatct361ctcacagcac ctggatactt aagactaaac tattctttca ctcatatggc ccattatcaa421tgtcagcatt gtaaggccct gatgggACCESSION No. 
N95226ORIGIN1tccctttctc cctgtttccc tcccttcttt ccttccttcc ttccttcctt ccttcttaga61attcactgaa gtatttccta ggtagccttt tacttactac tttaatcaaa gcttatcttt121gtgcccaatg tgtaaaaagt gaaaatgtct cttcgaaatt ctatattaca atatagacag181agaagttggg ccttgagggc ttgagtttca cttaaatact atacacatgt ggtatcacac241aaggtggagg gggagggaac aaacagaaac ataacaatta tttttattct gtctttacaa301aagaaagcct cttctctatg aaaaagtctt tttggcatct gctcccggaa acctgccccg361agaacacgtt ccccattgct ttgcaagcat ctctttttaa aagcacanca ctgtccccgg421gagtcacgta ggttggatta anctgtctta gttgaccaac gaagaancac tggatgagtt481ttccagggat gantggttgt ctggggtgga acatatagtc ctgtctacaa caaatgtaac541tcctgatatg ggacnatgaa cncagtgtgt gacccaggag tgnttgatct gtnaacantc601gcatgnaattACCESSION No. R37028ORIGIN1ttttttttct ctaagtgata atgatatccc agctagaata attgtgctct ccagaagcaa61ttaatctgat ttgcaagcac tgattttttc ttttgcaaaa actaataata ttagcctgac121caattatgaa ataattccta aatttacaaa ttcccaaatt tgtgctttca tggcttcctt181ctattttaaa tctatattat tttaaacaaa ttttccttaa gnaaaaatga cttaacttca241taaaaatcta cccatttatg gtaaataaaa cattaaccaa aaaccaaaat taaagggntt301actataaatg gnaacattta cattgctggn tattaaatcc ctttccttgg cattACCESSION No. R66605ORIGIN1ttttttatcc ttcttaannn ttattacatg ttttattatc ctgtccccag aggtgggttt61atccagaaac caagaaaaaa aatcaatcag aataaactca aaaaaaaaag gtagggggag121caaaaccatc aaccaccagg gcagccaggc catcagccca cctccacctc tggagggtcc181ccagagaccc acgcccgacg cagacccgga ggaggcatca gcaagggggc ccgggcagag241aatcggctat gtctttcatt atgaggaggc agggagagac gggcagagat atgtttgcta301gggtgantat atattttata ttaattaaat ccgtaagttt aattaaagta aataggtatt361tctctggaag tttttttaat ttctttcntt ttttatagtt tttttggttt tttgtggntt421tttttttttt ttttggggtt tACCESSION No. 
T51004ORIGIN1gcagctgttg tcttccaact cagcggcagg tttgctttcc ccacggacac tctggacctt61gtagctcctc aagcttccct gtctattgag cagataggaa gccgtgtcaa atatgtggca121ccttgaggaa atgcctagtg aatgacagta tgtcctattg tgctctaact ttatttcagc181cttatttctt ttctgaatat tatttttcat ttatcttcat ttccttacct attttctttt241cttctaaagt atgtatcttt gttagctcca tcatcctttt tgggaatgag gcaagtataa301aaataaggta aataaataag gaccccatcc ctaggtattt ttaaggaaac cacccttttg361cggggcacac ttggctacct tggggtcttt agggctctgg ggggctttng ggtgtncctc421tngggcaggt cctggctggc attggcctACCESSION No. T51316ORIGIN1ttcatccgct gcatgtggaa aactggcccg atacctcgca ctacgagttt ctcgccgaca61ctatgtggag cgattttgcc tacggtcgca atgccgtata cccggaagcn atcacggcaa121cgcanctngt cgcgttatcc cattgaacat tatgagaatc gcgatgtttc ggtcgatggt181gcggaaaagc gcggcntgct tcttacttgc cgcattgtgc cgccgattga ccgggaaaag241cgattcatgt tgatgttgcg tacatcttgg ggccttgcgt tgagggcgca ccgttcaggACCESSION No. T72535ORIGIN1atgacctctg caaagagaag gtcagctata ngtagggaga aaaggaagaa ggcaagaaaa61ggagactcga gatgagttta catccaagag aagcacagat gtttgtaatc tacctagaat121aatgtgaagt acctgtccag catgtatgct cagatcctcc attcattagc acaagctgaa181aacatgaact gcaaattcta caccagcatc ctttgcttcc tccatggcag tgggaggtag241caaggggagt ccaacacttc tccatgacgt angaaaggca gggaaaaata ctgntACCESSION No. W72103ORIGIN1gtttgtgaaa aggaacaaaa tgaanttgaa ttggacatgt gctttaagca ngccaacaga61caacacacca ctagagacac acatcaaaag caatcacagt gctatgatca aatgatgggt121acatgtgaac acatc


All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.


All nucleotide and/or amino acid sequences associated with accession numbers referred to or cited herein are incorporated by reference in their entirety.


It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application.
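
As one such illustration, the leave-one-out cross validation of an outcome classifier can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation used for this application: a simple nearest-centroid rule stands in for the support vector machine, the function names are hypothetical, and the expression values are invented for the example. Each sample is held out once, per-outcome reference expression levels are rebuilt from the remaining samples, and the held-out sample is assigned the outcome whose reference profile it deviates from least.

```python
from statistics import mean


def centroid(rows):
    """Per-gene mean expression across a list of samples."""
    return [mean(col) for col in zip(*rows)]


def predict(sample, references):
    """Return the outcome label whose reference profile deviates least from the sample."""
    def deviation(ref):
        # Squared Euclidean distance between the sample and a reference profile.
        return sum((s - r) ** 2 for s, r in zip(sample, ref))
    return min(references, key=lambda label: deviation(references[label]))


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, rebuild references from the rest, and score the prediction."""
    correct = 0
    for i, (held_out, truth) in enumerate(zip(samples, labels)):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        references = {
            label: centroid([s for s, l in train if l == label])
            for label in {l for _, l in train}
        }
        if predict(held_out, references) == truth:
            correct += 1
    return correct / len(samples)


# Hypothetical log-ratio expression values for three genes (not data from this application).
SAMPLES = [
    [2.1, 0.3, -1.0],
    [1.8, 0.1, -0.7],
    [2.4, 0.5, -1.2],
    [-0.9, 1.9, 0.8],
    [-1.1, 2.2, 1.0],
    [-0.6, 1.7, 0.6],
]
LABELS = ["good", "good", "good", "poor", "poor", "poor"]

if __name__ == "__main__":
    print(leave_one_out_accuracy(SAMPLES, LABELS))
```

In practice the nearest-centroid rule would be replaced by a trained support vector machine, and the gene columns would be restricted to a signature selected by significance analysis of microarrays; the hold-out loop itself is unchanged by either substitution.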

Claims
  • 1. A system for predicting clinical outcome for a patient diagnosed with cancer comprising a computing means; a user interface means that enables data entry, wherein said interface is coupled to said computing means, wherein said computing means is configured to perform microarray analysis and binary classification to generate a set of genes used in predicting clinical outcome.
  • 2. The system of claim 1, wherein the microarray analysis is significance analysis of microarrays and the binary classification is support vector machine.
  • 3. The system of claim 1, wherein the computing means is further configured to perform leave-one-out cross validation.
  • 4. The system of claim 1, wherein the computing means comprises a database for storing the set of genes, said computing means further configured to analyze biological information from a patient against the set of genes to generate a predicted clinical outcome.
  • 5. The system of claim 1, wherein the patient is diagnosed with colon cancer.
  • 6. A classifier for predicting clinical outcome in a patient diagnosed with cancer comprising a computing means and a user interface, wherein said computing means comprises a storing means and a means for outputting processed data, wherein said storing means comprises a set of genes classified by outcome, wherein said interface is coupled to said computing means.
  • 7. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.
  • 8. The classifier of claim 6, wherein said set of genes consists of the following genes: AA045075; AA425320; AA437223; AA479270; AA486233; AA487274; AA488652; AA694500; AA704270; AA706226; AA709158; AA775616; AA777892; AA873159; AA969508; AI203139; AI299969; H17364; H17627; H19822; H23551; H62801; H85015; N21630; N36176; N72847; N92519; R27767; R34578; R38360; R43597; R43684; W73732; AA450205; AI081269; R59314; AA702174; AI002566; AA676797; AA453508; W93980; AA045308; AA953396; AA962236; AA418726; R43713; AA664240; AA477404; AA826237; AA007421; AA478952; AA885096; H29032; R10545; AA448641; R38266; H17543; T81317; AA453790; R22340; AA987675; N51543; N74527; AA121778; AA258031; AA702422; T64924; R42984; R59360; R63816; T49061; AA016210; AA682585; AA705040; AA909959; AI240881; AA133215; AA699408; AA910771; AI362799; H51549; R06568; AA001604; AA132065; AA490493; AA633845; AI261561; H81024; N75004; W96216; AA045793; AA284172; AA411324; AA448261; AA479952; AA485752; AA504266; AA630376; AA634261; AA701167; AA703019; AA706041; AA773139; AA776813; AA862465; AA977711; AI288845; H15267; H18956; H73608; H99544; N45282; N48270; N59451; N95226; R37028; R66605; T51004; T51316; T72535; and W72103.
  • 9. The classifier of claim 6, wherein said set of genes consists of the following genes: AA007421; AA045075; AA045308; AA418726; AA425320; AA450205; AA453508; AA453790; AA477404; AA478952; AA479270; AA486233; AA487274; AA664240; AA676797; AA702174; AA706226; AA709158; AA775616; AA826237; AA873159; AA969508; AI002566; AI29969; H17364; H19822; H23551; N36176; N72847; R10545; R27767; R34578; R59314; W73732; AA448641; R59360; AA121778; H51549; H81024; AA490493; R42984; AA258031; AA133215; R63816; N95226; N74527; AA702422; AI261561; AA132065; AI362799; AA045793; AA284172; N51632; AA482110; AA485450; AA699408; N70777; AA993736; AI139498; N59721; AA431885; AA911661; AA775865; R30941; AA703019; AA777192; W72103; H15267; H17638; R60193; R92717; AA706041; AA411324; AA504266; AA932696; AA973494; N45100; AA418410; AA725641; AA954482; H45391; T86932; AA279188; AA485752; AA680132; AA977711; W93370; AA036727; AA071075; AA464612; AA481250; AA598659; AA682905; R17811; W93592; AA017301; AA046406; AA256304; AA416759; AA448261; AA452130; AA457528; AA460542; AA479952; AA481507; AA504342; AA598970; AA630376; AA634261; AA677254; AA757564; AA775888; AA844864; AA862465; AA989139; AI253017; AI394426; H99544; N41021; N45282; N46845; N48270; N59846; R16760; R44546; R92994; T51004; T56281; T70321; and W45025.
  • 10. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; and AA883496.
  • 11. A method for predicting a clinical outcome for a patient diagnosed with cancer, said method comprising the steps of: a) classifying at least one gene that correlates with a clinical outcome; b) establishing a set of reference gene expression levels based on the at least one gene; c) receiving biological information from the patient; d) extrapolating from the biological information the level of intracellular expression of said at least one gene; e) comparing said level of intracellular expression against said set of reference gene expression levels; and f) predicting a clinical outcome based on the deviation of said level of intracellular expression from said set of reference gene expression levels.
  • 12. The method of claim 1, wherein identification of said at least one gene is performed with any one or combination of the following: significance analysis of microarrays, cluster analysis, support vector technology, neural network, and leave-one-out cross validation.
  • 13. The method of claim 1, further comprising the step of estimating the accuracy of the predicted clinical outcome.
  • 14. The method of claim 1, wherein the biological information is a clinical specimen of bodily fluid or tissue.
  • 15. The method of claim 14, wherein the biological information is a clinical tumor sample.
  • 16. The method of claim 1, wherein the outcome being evaluated is for a patient diagnosed with colon cancer.
  • 17. The method of claim 1, wherein the predicted clinical outcome is the probability of patient survival at a predetermined date.
  • 18. The method of claim 1, further comprising the step of generating a treatment regimen based on the predicted clinical outcome.
  • 19. The method of claim 1, wherein the gene that is identified is one with the accession number selected from the group consisting of: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.
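By way of illustration only, steps b), e), and f) of the method claimed above, together with the accuracy estimate by leave-one-out cross validation, might be sketched as follows. This is a minimal sketch under stated assumptions, not the classifier of the application: it substitutes a simple nearest-reference rule for the support vector machine named in the abstract, and the function names, the outcome labels "good" and "poor", and every expression value are hypothetical. Only the two accession numbers are taken from the claims.

```python
from statistics import mean


def reference_levels(profiles):
    """Step b: per-gene mean expression across one outcome group."""
    genes = profiles[0].keys()
    return {gene: mean(p[gene] for p in profiles) for gene in genes}


def deviation(profile, reference):
    """Step e: sum of squared differences from a set of reference levels."""
    return sum((profile[gene] - reference[gene]) ** 2 for gene in reference)


def predict_outcome(profile, references):
    """Step f: choose the outcome whose reference levels deviate least."""
    return min(references, key=lambda label: deviation(profile, references[label]))


def build_references(samples):
    """Group (profile, label) pairs by label and compute reference levels."""
    labels = {label for _, label in samples}
    return {
        label: reference_levels([p for p, lab in samples if lab == label])
        for label in labels
    }


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, retrain on the rest, and score the result."""
    correct = 0
    for i, (held_out, true_label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        references = build_references(training)
        if predict_outcome(held_out, references) == true_label:
            correct += 1
    return correct / len(samples)


# Hypothetical expression values; the labels and numbers are made up.
samples = [
    ({"N36176": 2.0, "AA149253": 0.5}, "good"),
    ({"N36176": 1.8, "AA149253": 0.7}, "good"),
    ({"N36176": 0.4, "AA149253": 2.1}, "poor"),
    ({"N36176": 0.6, "AA149253": 1.9}, "poor"),
]

references = build_references(samples)
patient = {"N36176": 1.7, "AA149253": 0.6}  # hypothetical patient profile
prediction = predict_outcome(patient, references)
accuracy = leave_one_out_accuracy(samples)
print(prediction, accuracy)
```

The nearest-reference rule is used here only because it makes the notion of deviation from the reference levels explicit; a support vector machine or neural network could stand in for `predict_outcome` without changing the cross-validation loop.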
CROSS-REFERENCE TO A RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/547,871, filed Feb. 25, 2004, which is hereby incorporated by reference in its entirety.