Neoepitope detection of disease using protein arrays

SUBMISSION OF SEQUENCE LISTING

The Sequence Listing associated with this application is filed in electronic format via EFS-Web and hereby incorporated by reference into the specification in its entirety. The name of the text file containing the Sequence Listing is 0788.19 06 18 10_ST25. The size of the text file is 55 KB, and the text file was created on Jun. 18, 2010.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates to an assay and method for diagnosing disease. More specifically, the present invention relates to an immunoassay for use in diagnosing cancer.

2. Background Art

It is commonly known in the art that genetic mutations can be used for detecting cancer. For example, the tumorigenic process leading to colorectal carcinoma formation involves multiple genetic alterations (Fearon et al (1990) Cell 61, 759-767). Tumor suppressor genes such as p53, DCC and APC are frequently inactivated in colorectal carcinomas, typically by a combination of genetic deletion of one allele and point mutation of the second allele (Baker et al (1989) Science 244, 217-221; Fearon et al (1990) Science 247, 49-56; Nishisho et al (1991) Science 253, 665-669; and Groden et al (1991) Cell 66, 589-600). Mutation of two mismatch repair genes that regulate genetic stability was associated with a form of familial colon cancer (Fishel et al (1993) Cell 75, 1027-1038; Leach et al (1993) Cell 75, 1215-1225; Papadopoulos et al (1994) Science 263, 1625-1629; and Bronner et al (1994) Nature 368, 258-261). Proto-oncogenes such as myc and ras are altered in colorectal carcinomas, with c-myc RNA being overexpressed in as many as 65% of carcinomas (Erisman et al (1985) Mol. Cell. Biol. 5, 1969-1976), and ras activation by point mutation occurring in as many as 50% of carcinomas (Bos et al (1987) Nature 327, 293-297; and Forrester et al (1987) Nature 327, 298-303). Other proto-oncogenes, such as myb and neu, are activated with a much lower frequency (Alitalo et al (1984) Proc. Natl. Acad. Sci. USA 81, 4534-4538; and D'Emilia et al (1989) Oncogene 4, 1233-1239). No common series of genetic alterations is found in all colorectal tumors, suggesting that a variety of such combinations may be able to generate these tumors.

Increased tyrosine phosphorylation is a common element in signaling pathways that control cell proliferation. The deregulation of protein tyrosine kinases (PTKs) through overexpression or mutation has been recognized as an important step in cell transformation and tumorigenesis, and many oncogenes encode PTKs (Hunter (1989) in Oncogenes and the Molecular Origins of Cancer, ed. Weinberg (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), pp. 147-173). Numerous studies have addressed the involvement of PTKs in human tumorigenesis. Activated PTKs associated with colorectal carcinoma include c-neu (amplification), trk (rearrangement), and c-src and c-yes (mechanism unknown) (D'Emilia et al (1989), ibid; Martin-Zanca et al (1986) Nature 3, 743-748; Bolen et al (1987) Proc. Natl. Acad. Sci. USA 84, 2251-2255; Cartwright et al (1989) J. Clin. Invest. 83, 2025-2033; Cartwright et al (1990) Proc. Natl. Acad. Sci. USA 87, 558-562; Talamonti et al (1993) J. Clin. Invest. 91, 53-60; and Park et al (1993) Oncogene 8, 2627-2635).

Mutations, such as those disclosed above, can be useful in detecting cancer. However, there have been few advancements that can repeatably be used to diagnose cancer prior to the existence of a tumor. For example, breast cancer, which is by far the most common form of cancer in women, is the second leading cause of cancer death in humans. Despite many recent advances in diagnosing and treating breast cancer, the prevalence of this disease has been steadily rising at a rate of about 1% per year since 1940. Today, the likelihood that a woman living in North America will develop breast cancer during her lifetime is one in eight.

The current widespread use of mammography has resulted in improved detection of breast cancer. Nonetheless, the death rate due to breast cancer has remained unchanged at about 27 deaths per 100,000 women. All too often, breast cancer is discovered at a stage that is too far advanced, when therapeutic options and survival rates are severely limited. Accordingly, more sensitive and reliable methods are needed to detect small (less than 2 cm diameter), early stage, in situ carcinomas of the breast. Such methods should significantly improve breast cancer survival, as suggested by the successful employment of Papanicolaou smears for early detection and treatment of cervical cancer.

In addition to the problem of early detection, there remain serious problems in distinguishing between malignant and benign breast disease, in staging known breast cancers, and in differentiating between different types of breast cancers (e.g., estrogen dependent versus non-estrogen dependent tumors). Recent efforts to develop improved methods for breast cancer detection, staging and classification have focused on a promising array of so-called cancer “markers.” Cancer markers are typically proteins that are uniquely expressed (e.g., as a cell surface or secreted protein) by cancerous cells, or are expressed at measurably increased or decreased levels by cancerous cells compared to normal cells. Other cancer markers can include specific DNA or RNA sequences marking deleterious genetic changes or alterations in the patterns or levels of gene expression associated with particular forms of cancer.

The utility of specific breast cancer markers for screening and diagnosis, staging and classification, monitoring and/or therapy purposes depends on the nature and activity of the marker in question. For general reviews of breast cancer markers, see Porter-Jordan et al., Hematol. Oncol. Clin. North Amer. 8: 73-100, 1994; and Greiner, Pharmaceutical Tech., May, 1993, pp. 28-44. As reflected in these reviews, a primary focus for developing breast cancer markers has centered on the overlapping areas of tumorigenesis, tumor growth and cancer invasion. Tumorigenesis and tumor growth can be assessed using a variety of cell proliferation markers (for example Ki67, cyclin D1 and proliferating cell nuclear antigen (PCNA)), some of which can be important oncogenes as well. Tumor growth can also be evaluated using a variety of growth factor and hormone markers (for example estrogen, epidermal growth factor (EGF), erbB-2, and transforming growth factor alpha (TGF-α)), which can be overexpressed, underexpressed or exhibit altered activity in cancer cells. By the same token, receptors of autocrine or exocrine growth factors and hormones (for example insulin growth factor (IGF) receptors, and EGF receptor) can also exhibit changes in expression or activity associated with tumor growth. Lastly, tumor growth is supported by angiogenesis involving the elaboration and growth of new blood vessels and the concomitant expression of angiogenic factors that can serve as markers for tumorigenesis and tumor growth.

In addition to tumorigenic, proliferation and growth markers, a number of markers have been identified that can serve as indicators of invasiveness and/or metastatic potential in a population of cancer cells. These markers generally reflect altered interactions between cancer cells and their surrounding microenvironment. For example, when cancer cells invade or metastasize, detectable changes can occur in the expression or activity of cell adhesion or motility factors, examples of which include the cancer markers Cathepsin D, plasminogen activators, collagenases and other factors. In addition, decreased expression or overexpression of several putative tumor “suppressor” genes (for example nm23, p53 and rb) has been directly associated with increased metastatic potential or deregulation of growth predictive of poor disease outcome.

Additionally, ovarian cancer has the highest mortality rate of all gynecological cancers, and yet there is still no reliable, easy-to-administer screening test. Using the multimodality approach to treatment, including aggressive cytoreductive surgery in combination with chemotherapy, five-year survival rates diminish with increasing stage: Stage I (93%), Stage II (70%), Stage III (37%), and Stage IV (25%). Despite advances in molecular biology, surgical oncology, and chemotherapy, the overall prognosis for ovarian cancer patients diagnosed at Stages II-IV remains poor. The excellent survival rates for Stage I disease provide the rationale for efforts to detect early-stage ovarian cancer through screening. The first priority of any screening procedure for ovarian cancer is high specificity, in order to minimize the number of false positive results and thereby ensure an acceptable positive predictive value (PPV). There have been no effective and reliable tests developed to date.

Screening for ovarian cancer has been based on strategies using serum tumor markers or ultrasound imaging of the ovaries. The most extensively investigated biomarker is CA-125, whose serum levels are elevated in 50% of Stage I and 90% of Stage II ovarian cancer patients. However, elevated CA-125 levels have also been observed in healthy women during menstruation, in patients with other gynecological diseases, and other malignancies, which suggests that the false-positive rate of CA-125 can be high.

In contrast to detection of serum antigens, the detection of serum antibody responses to tumor antigens may provide a more reliable serum marker for cancer diagnosis because serum antibodies are more stable than serum antigens. Furthermore, antibodies may be more abundant than antigens, especially at low tumor burdens characteristic of early stages. Thirty percent of patients with ductal carcinoma in situ (DCIS) in which the proto-oncogene HER2/neu was overexpressed had serum antibodies specific to this protein. In addition, antibodies to p53 have been reported in patients with early-stage ovarian and colorectal cancers. Antibodies against heat shock protein 90 (HSP90) were also found to be associated with patients' survival and tumor metastasis. Antibodies against ribosomal proteins may constitute a novel serological marker. The presence of antibodies to ubiquitin C-terminal hydrolase L3 in colon cancer has also been reported. Changes in the level of gene expression in cancer and aberrant expression of tissue-restricted gene products in cancer are factors in the development of a humoral immune response in cancer patients. In this respect, serological analysis of recombinant cDNA expression libraries (SEREX) of human tumors with autologous serum has identified some relevant tumor antigens. Among the gene products shown to be immunogenic are MAGE, SSX2, and NY-ESO-1, which are expressed in various tumor types, but not in normal tissues except testis.

Studies on new technology based on proteomic patterns in serum to screen for early stage ovarian cancer have been reported by Petricoin et al. (2002). The procedure involved generating proteomic spectra of serum proteins using matrix-assisted laser desorption and ionization time-of-flight (MALDI-TOF) and surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) mass spectrometry. In independent validation to distinguish early stage invasive epithelial ovarian cancer from healthy controls, the sensitivity of a multivariate model combining the three biomarkers and CA125 [74% (95% CI, 52-90%)] was higher than that of CA125 alone [65% (95% CI, 43-84%)] at a matched specificity of 97% (95% CI, 89-100%). When compared at a fixed sensitivity of 83% (95% CI, 61-95%), the specificity of the model [94% (95% CI, 85-98%)] was significantly better than that of CA125 alone [52% (95% CI, 39-65%)]. Due to the low prevalence of ovarian cancer in the general population, this level of specificity is unacceptable for a realistic ovarian cancer diagnostic test. Assuming that, in a clinical setting with low-risk patients, ovarian cancer is present in approximately one of every 2500 patients, the MALDI/SELDI approach would produce 125 false positives for every true cancer patient. Furthermore, some issues have arisen regarding the mass spectrometry technology of protein profiling. It has been reported that the data obtained by this technology are difficult to reproduce and that they may be biased by artifacts in sample preparation, storage and processing, and patient selection.
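
The screening arithmetic in the preceding paragraph can be checked with a short calculation. The following is only a minimal sketch and is not part of the cited study: the function names are hypothetical, and the prevalence of one case per 2500 patients is the assumption stated above. Under that assumption, a ratio of about 125 false positives per true case corresponds to a specificity of about 95% at perfect sensitivity; other operating points give different ratios.

```python
def false_positives_per_true_positive(prevalence, sensitivity, specificity):
    """Expected number of false positives for each true positive found.

    All three arguments are fractions between 0 and 1.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return false_positives / true_positives


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive test results that are true cancers (PPV)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed prevalence from the text: one case per 2500 low-risk patients.
PREVALENCE = 1.0 / 2500.0

if __name__ == "__main__":
    # Illustrative operating points (sensitivity, specificity); the first
    # is a hypothetical point chosen to show where "125" comes from.
    for sens, spec in [(1.00, 0.95), (0.83, 0.94), (0.74, 0.97)]:
        ratio = false_positives_per_true_positive(PREVALENCE, sens, spec)
        ppv = positive_predictive_value(PREVALENCE, sens, spec)
        print(f"sens={sens:.2f} spec={spec:.2f} "
              f"FP per TP={ratio:.1f} PPV={ppv:.4f}")
```

Even at a specificity of 97%, the PPV under this prevalence remains well below 5%, which illustrates why high specificity is the first priority of a screening test for a rare disease.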

In summary, the evaluation of proliferation markers, oncogenes, growth factors and growth factor receptors, angiogenic factors, proteases, adhesion factors and tumor suppressor genes, among other cancer markers, can provide important information concerning the risk, presence, status or future behavior of cancer in a patient. Determining the presence or level of expression or activity of one or more of these cancer markers can aid in the differential diagnosis of patients with uncertain clinical abnormalities, for example by distinguishing malignant from benign abnormalities. Furthermore, in patients presenting with established malignancy, cancer markers can be useful to predict the risk of future relapse, or the likelihood of response in a particular patient to a selected therapeutic course. Even more specific information can be obtained by analyzing highly specific cancer markers, or combinations of markers, which can predict responsiveness of a patient to specific drugs or treatment options.

Methods for detecting and measuring cancer markers have been recently revolutionized by the development of immunological assays, particularly by assays that utilize monoclonal antibody technology. Previously, many cancer markers could only be detected or measured using conventional biochemical assay methods, which generally require large test samples and are therefore unsuitable in most clinical applications. In contrast, modern immunoassay techniques can detect and measure cancer markers in much smaller samples, particularly when monoclonal antibodies that specifically recognize a targeted marker protein are used. Accordingly, it is now routine to assay for the presence or absence, level, or activity of selected cancer markers by immunohistochemically staining tissue specimens obtained via conventional biopsy methods. Because of the highly sensitive nature of immunohistochemical staining, these methods have also been successfully employed to detect and measure cancer markers in smaller, needle biopsy specimens, which require less invasive sample gathering procedures compared to conventional biopsy specimens. In addition, other immunological methods have been developed and are now well known in the art that allow for detection and measurement of cancer markers in non-cellular samples such as serum and other biological fluids from patients. The use of these alternative sample sources substantially reduces the morbidity and costs of assays compared to procedures employing conventional biopsy samples, which allows for application of cancer marker assays in early screening and low risk monitoring programs where invasive biopsy procedures are not indicated.

For the purpose of cancer evaluation, the use of conventional or needle biopsy samples for cancer marker assays is often undesirable, because a primary goal of such assays is to detect the cancer before it progresses to a palpable or detectable tumor stage. Prior to this stage, biopsies are generally contraindicated, making early screening and low risk monitoring procedures employing such samples untenable. Therefore, there is a general need in the art to obtain samples for cancer marker assays by less invasive means than biopsy, for example by serum withdrawal.

Efforts to utilize serum samples for cancer marker assays have met with limited success, largely because either the targeted markers are not detectable in serum or telltale changes in the levels or activity of the markers cannot be monitored in serum. In addition, the presence of cancer markers in serum probably occurs at the time of micro-metastasis, making serum assays less useful for detecting pre-metastatic disease.

Previous attempts to develop non-invasive breast cancer marker assays utilizing mammary fluid samples have included studies of mammary fluid obtained from patients presenting with spontaneous nipple discharge. In one of these studies, conducted by Inaji et al., Cancer 60: 3008-3013, 1987, levels of the breast cancer marker carcinoembryonic antigen (CEA) were measured using conventional, enzyme linked immunoassay (ELISA) and sandwich-type, monoclonal immunoassay methods. These methods successfully and reproducibly demonstrated that CEA levels in spontaneously discharged mammary fluid provide a sensitive indicator of nonpalpable breast cancer. In a subsequent study, also by Inaji et al., Jpn. J. Clin. Oncol. 19: 373-379, 1989, these results were expanded using a more sensitive, dry chemistry, dot-immunobinding assay for CEA determination. This latter study reported that elevated CEA levels occurred in 43% of patients tested with palpable breast tumors, and in 73% of patients tested with nonpalpable breast tumors. CEA levels in the discharged mammary fluid were highly correlated with intratumoral CEA levels, indicating that the level of CEA expression by breast cancer cells is closely reflected in the mammary fluid CEA content. Based on these results, the authors concluded that immunoassays for CEA in spontaneously discharged mammary fluid are useful for screening nonpalpable breast cancer.

Although the evaluation of mammary fluid has been shown to be a useful method for screening nonpalpable breast cancer in women who experience spontaneous nipple discharge, the rarity of this condition renders the methods of Inaji et al. inapplicable to the majority of women who are candidates for early breast cancer screening. In addition, the first Inaji report cited above determined that certain patients suffering spontaneous nipple discharge secrete less than 10 μl of mammary fluid, which is a critically low level for the ELISA and sandwich immunoassays employed in that study. It is likely that other antibodies used to assay other cancer markers can exhibit even lower sensitivity than the anti-CEA antibodies used by Inaji and coworkers, and may therefore not be adaptable or sensitive enough to be employed even in dry chemical immunoassays of small samples of spontaneously discharged mammary fluid.

In view of the above, an important need exists in the art for more widely applicable, non-invasive methods and materials to obtain biological samples for use in evaluating, diagnosing and managing breast and other diseases including cancer, particularly for screening early stage, nonpalpable tumors. A related need exists for methods and materials that utilize such readily obtained biological samples to evaluate, diagnose and manage disease, particularly by detecting or measuring selected cancer markers, or panels of cancer markers, to provide highly specific, cancer prognostic and/or treatment-related information, and to diagnose and manage pre-cancerous conditions, cancer susceptibility, bacterial and other infections, and other diseases.

With specific regard to such assays, specific antibodies can only be measured by detecting binding to their antigen or a mimic thereof. Although certain classes of immunoglobulins containing the antibodies of interest can, in some cases, be separated from the sample prior to the assay (Decker, et al., EP 0,168,689 A2), in all assays, at least some portion of the sample immunoglobulins are contacted with antigen. For example, in assays for specific IgM, a portion of the total IgM can be adsorbed to a surface and the sample removed prior to detection of the specific IgM by contacting with antigen. Binding is then measured by detection of the bound antibody, detection of the bound antigen or detection of the free antigen.

For detection of bound antibody, a labeled anti-human immunoglobulin or labeled antigen is normally allowed to bind antibodies that have been specifically adsorbed from the sample onto a surface coated with the antigen, Bolz, et al., U.S. Pat. No. 4,020,151. Excess reagent is washed away and the label that remains bound to the surface is detected. This is the procedure in the most frequently used assays, for example, for hepatitis and human immunodeficiency virus and for numerous immunohistochemical tests, Nakamura, et al., Arch Pathol Lab Med 112:869-877 (1988). Although this method is relatively sensitive, it is subject to interference from non-specific binding to the surface by non-specific immunoglobulins that cannot be differentiated from the specific immunoglobulins.

Another method of detecting bound antibodies involves combining the sample and a competing labeled antibody, with a support-bound antigen, Schuurs, et al., U.S. Pat. No. 3,654,090. This method has its limitations because antibodies in sera bind numerous epitopes, making competition inefficient.

For detection of bound antigen, the antigen can be used in excess of the maximum amount of antibody that is present in the sample or in an amount that is less than the amount of antibody. For example, radioimmunoprecipitation (“RIP”) assays for GAD autoantibodies have been developed and are currently in use, Atkinson, et al., Lancet 335:1357-1360 (1990). However, attempts to convert this assay to an enzyme linked immunosorbent assay (“ELISA”) format have not been successful. The RIP assay is based on precipitation of immunoglobulins in human sera, and led to the development of a radioimmunoassay (“RIA”) for GAD autoantibodies. In both the RIP and the RIA, the antigen is added in excess and the bound antigen:antibody complex is precipitated with protein A-Sepharose. The complex is then washed or further separated by electrophoresis and the antigen in the complex is detected.

Other precipitating agents can be used such as rheumatoid factor or C1q, Masson, et al., U.S. Pat. No. 4,062,935; polyethylene glycol, Soeldner, et al., U.S. Pat. No. 4,855,242; and protein A, Ito, et al., EP 0,410,893 A2. The precipitated antigen can be measured to indicate the amount of antibody in the sample; the amount of antigen remaining in solution can be measured; or both the precipitated antigen and the soluble antigen can be measured to correct for any labeled antigen that is non-specifically precipitated. These methods, while quite sensitive, are all difficult to carry out because of the need for rigorous separation of the free antigen from the bound complex, which requires at a minimum filtration or centrifugation and multiple washing of the precipitate.

Alternatively, detection of the bound antigen can be employed when the amount of antigen is less than the maximum amount of antibody. Normally, that is carried out using particles such as latex particles or erythrocytes that are coated with the antigen, Cambiaso, et al., U.S. Pat. No. 4,184,849 and Uchida, et al., EP 0,070,527 A1. Antibodies can specifically agglutinate these particles and can then be detected by light scattering or other methods. It is necessary in these assays to use a precise amount of antigen as too little antigen provides an assay response that is biphasic and high antibody titers can be read as negative, while too much antigen adversely affects the sensitivity. It is therefore necessary to carry out sequential dilutions of the sample to assure that positive samples are not missed. Further, these assays tend to detect only antibodies with relatively high affinities and the sensitivity of the method is compromised by the tendency for all of the binding sites of each antibody to bind to the antigen on the particle to which it first binds, leaving no sites for binding to the other particle.

For assays in which the free antigen is detected, the antigen can also be added in excess or in a limited amount although only the former has been reported. Assays of this type have been described where an excess of antigen is added to the sample, the immunoglobulins are precipitated, and the antigen remaining in the solution is measured, Masson, et al., supra and Soeldner, et al., supra. These assays are relatively insensitive because only a small percentage change in the amount of free antigen occurs with low amounts of antibody, and this small percentage is difficult to measure accurately.
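
The insensitivity described above follows from simple arithmetic, which the sketch below illustrates. It assumes, purely for illustration, that every antibody binding site captures exactly one antigen molecule (an idealization, since real binding depends on affinity); under that assumption the fractional drop in free antigen equals the ratio of binding sites to total antigen. The function name and the example quantities are hypothetical.

```python
def fractional_drop_in_free_antigen(total_antigen, antibody_sites):
    """Fraction of the added antigen removed from solution.

    Assumes complete 1:1 binding of antigen to antibody sites, an
    idealization used only to illustrate the arithmetic.
    Both quantities must be in the same (arbitrary) molar units.
    """
    if total_antigen <= 0:
        raise ValueError("total_antigen must be positive")
    if antibody_sites < 0:
        raise ValueError("antibody_sites must be non-negative")
    bound = min(antibody_sites, total_antigen)
    return bound / total_antigen


if __name__ == "__main__":
    # With a 100-fold antigen excess, a low-titer sample changes the
    # free-antigen signal by only about 1%.
    for sites in (1.0, 5.0, 50.0):
        drop = fractional_drop_in_free_antigen(100.0, sites)
        print(f"antibody sites={sites:5.1f}  drop in free antigen={drop:.1%}")
```

A change of about 1% in the free-antigen signal is typically within the measurement error of the detection step, which is consistent with the limitation stated above.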

Practical assays in which the free antigen is detected and the antigen is not present in excess of the maximum amount of antibody expected in a sample have not been described. However, in van Erp, et al., Journal of Immunoassay 12(3):425-443 (1991), a fixed concentration of monoclonal antibody was incubated with a concentration dilution series of antigen, and free antigen was then measured using a gold sol particle agglutination immunoassay to determine antibody affinity constants.

There has been much research in the area of evaluating useful markers for determining the risk of patients developing insulin-dependent diabetes mellitus (IDDM). These include insulin autoantibodies, Soeldner, et al., supra and circulating autoantibodies to glutamic acid decarboxylase (“GAD”), Atkinson, et al., PCT/US89/05570 and Tobin, et al., PCT/US91/06872. In addition, Rabin, et al., U.S. Pat. No. 5,200,318 describes numerous assay formats for the detection of GAD and pancreatic islet cell antigen autoantibodies. GAD autoantibodies are of particular diagnostic importance because they occur in preclinical stages of the disease, which can make therapeutic intervention possible. However, the use of GAD autoantibodies as a diagnostic marker has been impeded by the lack of a convenient, nonisotopic assay.

One assay method involves incubating a support-bound antigen with the sample, then adding a labeled anti-human immunoglobulin. This is the basis for numerous commercially available assay kits for antibodies such as the Syn ELISA kit which assays for autoantibodies to GAD65, and is described in product literature entitled “syn^ELISAGAD II-Antibodies” (Elias USA, Inc.). Substantial dilution of the sample is required because the method is subject to high background signals from adsorption of non-specific human immunoglobulins to the support.

Many of the assays described above involve detection of antibody that becomes bound to an immobilized antigen. This can have an adverse effect on the sensitivity of the assay due to difficulty in distinguishing between specific immunoglobulins and other immunoglobulins in the sample, which bind non-specifically to the immobilized antigen. There is not only a need to develop an assay that avoids non-specific detection of immunoglobulins, but there is also the need for an improved method of detecting antibodies that combines the sensitivity advantage of immunoprecipitation assays with a simplified protocol. Finally, assays that can help evaluate the risk of developing diseases are medically and economically very important. The present invention addresses these needs.

SUMMARY OF THE INVENTION

According to the present invention, there is provided a biosensor for use in detecting the presence of diseases, the biosensor comprising a detector for detecting a presence of at least one marker indicative of a specific disease. Also provided is a method of determining efficacy of a pharmaceutical for treating a disease or staging disease by administering a pharmaceutical to a sample containing markers for a disease, detecting the amount of at least one marker of the disease in the sample, and analyzing the amount of the marker in the sample, whereby the amount of marker correlates to pharmaceutical efficacy or disease stage. Markers for gynecological disease selected from the list in Table 6 and further from the list in Table 8 are provided. An immuno-imaging agent comprising labeled antibodies, whereby the labeled antibodies are isolated and reactive to proteins overexpressed in vivo, is provided. Informatics software for analyzing the arrays discussed above is provided, wherein the software includes analyzing means for analyzing the arrays.

DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

Other advantages of the present invention are readily appreciated as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings wherein:

FIGS. 1A-D are photographs showing the identification of a phage displaying peptide sequence of Sirt2 by plaque lift;

FIG. 2 is a photograph showing the analysis of the PCR product of the plaques by Southern Blot hybridization;

FIG. 3 is a photograph showing the Dot Blot analysis of Sirt2 positive plaques;

FIG. 4 is a photograph showing green and red labeled detection of serum antibodies indicative of the antibody reaction to the protein;

FIGS. 5 A-E are photographs showing the ECL detection of phagotopes selected with a breast cancer patient's serum;

FIGS. 6A-C are as follows: FIG. 6A is a photograph showing the comparison of serum reaction of control and breast cancer patient with phagotopes from BP4; FIG. 6B is a graph of the BP4 filters which were scanned, thereby showing the ratio of the pixel densities plotted in rank order; and FIG. 6C is a scan of a microarray demonstrating the binding of a Cy5-labeled anti-human IgG to human IgG from patient #1's serum and of the control Cy3-labeled antibody to phage T7 capsid protein to phage clones microarrayed on glass;

FIGS. 7A-7B show the method of finding informative epitopes: FIG. 7A shows the cancer template; FIG. 7B shows the spot intensities plotted on the vertical axis for 12 subjects (controls to the left and patients to the right); the template defined on the left (shown in blue) was used with a correlation distance, and a correlation threshold of 0.8 selected the 46 epitopes shown here in red (out of the total of 4×96=384 shown here in yellow);

FIGS. 8A-8B show an example comparison between the histogram of a control subject (19218) with a high but non-specific reaction (FIG. 8A), and the histogram of a patient (19223) (FIG. 8B); the histograms are calculated on the ratios of the background corrected mean intensity of the human IgG labeled with Cy5 vs. the background corrected mean intensity of the T7 labeled with Cy3;

FIGS. 9A-9B show a comparison between the scatterplot of a control subject (19218) with a strong but non-specific reaction (FIG. 9A) and the scatterplot of a patient MEC1 (19223) (FIG. 9B); the scatterplots plot the background corrected mean intensity of the human IgG labeled with Cy5 vs. the background corrected mean intensity of the T7 labeled with Cy3;

FIG. 10 shows the matrix of reactivity between sets of clones coming from patients 1-12 (in rows) and sera from the same patients (in columns); at this point (step 2 of Procedure 2), the matrix contains the results of the self-reactions: patients 1-10 have a specific self-reaction whereas patients 11 and 12 do not, so patients 11 and 12 are eliminated from the clone selection procedure;

FIG. 11 shows a matrix of reactivity between sources of clones and different sera ordered by reactivity; the clones from patient 2 react with sera from self (column 2) and patients 4 and 8; the clones from patient 3 react with sera from self (column 3) and patients 6 and 10, etc.; note that the union of the sets of clones coming from patients 2, 3, 5, 7 and 1 ensures that the chip made with these clones reacts with all patients;

FIGS. 12A-G are filter microarrays showing antigen binding with IgGs in the serum of Stage I ovarian cancer patients; and

FIGS. 13A-D are graphs showing the determination of a titerable antigen-antibody binding in ELISA macroarray analysis.
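
The analysis steps named in the descriptions of FIGS. 7-9 and 11 can be sketched in code. The following is a minimal illustration under stated assumptions, not the procedure actually used: Pearson correlation stands in for the unspecified "correlation distance", a greedy set cover stands in for the unspecified way the covering union of clone sources was chosen, and all function names and data layouts are hypothetical.

```python
from math import sqrt


def intensity_ratio(cy5_mean, cy5_background, cy3_mean, cy3_background):
    """Background-corrected Cy5 (human IgG) over background-corrected Cy3 (T7).

    Returns None when the corrected Cy3 signal is not positive.
    """
    denominator = cy3_mean - cy3_background
    if denominator <= 0:
        return None
    return (cy5_mean - cy5_background) / denominator


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_informative_epitopes(profiles, template, threshold=0.8):
    """Keep epitopes whose intensity profile across subjects tracks the template.

    profiles: dict mapping epitope name -> list of spot intensities, one per
              subject, in the same subject order as the template.
    template: list that is low for controls and high for patients.
    """
    return [name for name, values in profiles.items()
            if pearson(values, template) >= threshold]


def greedy_clone_cover(reactivity):
    """Pick clone sources until every reactive serum is covered.

    reactivity: dict mapping clone source -> set of sera its clones react with.
    Greedy set cover is an assumption; it is not guaranteed to be minimal.
    """
    uncovered = set().union(*reactivity.values())
    chosen = []
    while uncovered:
        best = max(reactivity, key=lambda s: len(reactivity[s] & uncovered))
        gained = reactivity[best] & uncovered
        if not gained:
            break
        chosen.append(best)
        uncovered -= gained
    return chosen
```

For example, with a template of `[0, 0, 0, 1, 1, 1]` (three controls followed by three patients), an epitope whose intensities are low in all controls and high in all patients is retained, whereas one that reacts equally with both groups is discarded.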

DETAILED DESCRIPTION OF THE INVENTION

Generally, the present invention provides a method and markers for use in detecting disease and stages of disease. In other words, the markers can be used to determine the presence of disease without requiring the presence of symptoms.

The method and markers of the present invention can be used to diagnose the presence of a disease or a disease stage in a patient. The method of the present invention utilizes a detector device for detecting the presence of at least one marker in the serum of the patient. The benefit of such an analytical device is that the marker that is detected is one of a panel of markers. The panel of markers can include markers that are known to those of skill in the art and markers determined utilizing the methodology disclosed herein. The markers of the present invention can be used to detect diseases. Examples of diseases include, but are not limited to, gynecological sickness, such as endometriosis, ovarian cancer, breast cancer, cervical cancer, and primary peritoneal carcinoma. The method can also be used to identify overexpressed or mutated proteins in tumor cells. That such proteins are mutated or overexpressed presumably is the basis for the immune reaction to these proteins. Therefore markers identified using these methods could provide markers for molecular pathology as diagnostic or prognostic markers.

The method can also be used for immunotherapy targeted to a person's immunoprofile based on the arrays. For personalized immunotherapy, the reactivity to particular epitope clones can be correlated using sera from patients having cancer. Using a comprehensive panel of epitope markers that can accurately detect early stage ovarian cancer, one can utilize these antigens as immunotherapeutic agents personalized to the immunoprofile of each patient. When T-cells from the patient recognize antigen biomarkers, they are stimulated and activated, and therefore produce an immune response. Such reactivity demonstrates the potential of each antigen as a component of a vaccine to induce a T cell-mediated immune response essential for generation of cancer vaccines. Individuals scoring positive in the presymptomatic testing for ovarian cancer (OVCA) can then be offered an anti-tumor vaccine tailored to their immunoprofile against a panel of tumor antigens.

The detector includes, but is not limited to, an assay, a slide, a filter, a microarray, a macroarray, computer software implementing the data analysis methods, and any combinations thereof. The detector can also include a two-color detection system or other detector system known to those of skill in the art.

By “bodily fluid” as used herein it is meant any bodily fluid known to those of skill in the art to contain antibodies therein. Examples include, but are not limited to, blood, saliva, tears, spinal fluid, serum, and other fluids known to those of skill in the art to contain antibodies.

By “biopanning”, it is meant a selection process for use in screening a library (Parmley and Smith, Gene, 73:308 (1988); Noren, C. J., NEB Transcript, 8(1); 1 (1996)). Biopanning is carried out by incubating phages encoding the peptides with a plate coated with the proteins, washing away the unbound phage, eluting, and amplifying the specifically bound phage. Those skilled in the art readily recognize other immobilization schemes that can provide equivalent technology, such as but not limited to binding the proteins or other targets to beads.

By staging the disease, as for example in cancer, it is intended to include determining the extent of a cancer, especially whether the disease has spread from the original site to other parts of the body. The stages can range from 0 to 5 with 0 being the presence of cancerous cells and 5 being the spread of the cancer cells to other parts of the body including the lymph nodes. Further, the staging can indicate the stage of a borderline histology. A borderline histology is a less malignant form of disease. Additionally, staging can indicate a relapse of disease, in other words the reoccurrence of disease.

The term “marker” as used herein is intended to include, but is not limited to, a gene or a piece of a gene which codes for a protein, a protein such as a fusion protein, open reading frames such as ESTs, epitopes, mimotopes, antigens, and any other indicator of immune response. The marker can also be used as a predictor of disease or the recurrence of disease.

The present invention further includes a random peptide epitope (mimotope) that mimics a natural antigenic epitope during epitope presentation. Such mimotopes are useful in the applications and methods discussed above. Also included in the present invention is a method of identifying a random peptide epitope. In the method, a library of random peptide epitopes is generated or selected. The library is contacted with an anti-antibody. Mimotopes are identified that are specifically immunoreactive with the antibody. Sera (containing anti antibodies) or antibodies generated by the methods of the present invention can be used. Random peptide libraries can, for example, be displayed on phage (phagotopes) or generated as combinatorial libraries.

“Antibody” refers to a polypeptide comprising a framework region from an immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu constant region genes, as well as the various immunoglobulin diversity/joining/variable region genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.

An exemplary immunoglobulin (antibody) structural unit comprises a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kDa) and one “heavy” chain (about 50-70 kDa). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (V_L) and variable heavy chain (V_H) refer to these light and heavy chains respectively.

Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)′₂, a dimer of Fab which itself is a light chain joined to V_H-C_H1 by a disulfide bond. The F(ab)′₂ can be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)′₂ dimer into an Fab′ monomer. The Fab′ monomer is essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed. 1993)). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill can appreciate that such fragments can be synthesized de novo either chemically or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments either produced by the modification of whole antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554 (1990)).

For preparation of monoclonal or polyclonal antibodies, any technique known in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy (1985)). Techniques for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or other organisms such as other mammals, can be used to express humanized antibodies. Alternatively, phage display technology can be used to identify antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., Nature 348:552-554 (1990); Marks et al., Biotechnology 10:779-783 (1992)).

A “chimeric antibody” is an antibody molecule in which (a) the constant region, or a portion thereof, is altered, replaced or exchanged so that the antigen binding site (variable region) is linked to a constant region of a different or altered class, effector function and/or species, or an entirely different molecule which confers new properties to the chimeric antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable region, or a portion thereof, is altered, replaced or exchanged with a variable region having a different or altered antigen specificity.

The term “immunoassay” is an assay wherein an antibody specifically binds to an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen. In addition, an antigen can be used to capture or specifically bind an antibody.

The phrase “specifically (or selectively) binds” to an antibody or “specifically (or selectively) immunoreactive with,” when referring to a protein or peptide, refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times the background and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions can require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to modified β-tubulin from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive, e.g., with β-tubulin modified at cysteine 239 and not with other proteins. This selection can be achieved by subtracting out antibodies that cross-react with other molecules. Monoclonal antibodies raised against modified β-tubulin can also be used. A variety of immunoassay formats can be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction can be at least twice background signal or noise and more typically more than 10 to 100 times background.

A “label” or a “detectable moiety” is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include ³²P, fluorescent dyes, iodine, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins for which antisera or monoclonal antibodies are available, e.g., by incorporating a radiolabel into the peptide, or any other label known to those of skill in the art.

A “labeled antibody or probe” is one that is bound, either covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds to a label such that the presence of the antibody or probe can be detected by detecting the presence of the label bound to the antibody or probe.

The terms “isolated,” “purified,” or “biologically pure” refer to material that is substantially or essentially free from components that normally accompany it as found in its native state. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified. The term “purified” denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is at least 85% pure, optionally at least 95% pure, and optionally at least 99% pure.

The term “recombinant” when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under expressed or not expressed at all.

An “expression vector” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be transcribed operably linked to a promoter.

By “support or surface” as used herein, the term is intended to include, but is not limited to a solid phase which is typically a support or surface, which is a porous or non-porous water insoluble material that can have any one of a number of shapes, such as strip, rod, particle, including beads and the like. Suitable materials are well known in the art and are described in, for example, Ullman, et al. U.S. Pat. No. 5,185,243, columns 10-11, Kurn, et al., U.S. Pat. No. 4,868,104, column 6, lines 21-42 and Milburn, et al., U.S. Pat. No. 4,959,303, column 6, lines 14-31, which are incorporated herein by reference. Binding of ligands and receptors to the support or surface can be accomplished by well-known techniques, readily available in the literature. See, for example, “Immobilized Enzymes,” Ichiro Chibata, Halsted Press, New York (1978) and Cuatrecasas, J. Biol. Chem. 245:3059 (1970). Whatever type of solid support is used, it must be treated so as to have bound to its surface either a receptor or ligand that directly or indirectly binds the antigen. Typical receptors include antibodies, intrinsic factor, specifically reactive chemical agents such as sulfhydryl groups that can react with a group on the antigen, and the like. For example, avidin or streptavidin can be covalently bound to spherical glass beads of 0.5-1.5 mm and used to capture a biotinylated antigen.

Signal producing system (“sps”) includes one or more components, at least one component being a label, which generate a detectable signal that relates to the amount of bound and/or unbound label, i.e. the amount of label bound or not bound to the compound being detected. The label is any molecule that produces or can be induced to produce a signal, such as a fluorescer, enzyme, chemiluminescer, or photosensitizer. Thus, the signal is detected and/or measured by detecting enzyme activity, luminescence, or light absorbance.

Suitable labels include, by way of illustration and not limitation, enzymes such as alkaline phosphatase, glucose-6-phosphate dehydrogenase (“G6PDH”) and horseradish peroxidase; ribozyme; a substrate for a replicase such as Q-beta replicase; promoters; dyes; fluorescers such as fluorescein, isothiocyanate, rhodamine compounds, phycoerythrin, phycocyanin, allophycocyanin, o-phthaldehyde, and fluorescamine; chemiluminescers such as isoluminol; sensitizers; coenzymes; enzyme substrates; photosensitizers; particles such as latex or carbon particles; suspendable particles; metal sol; crystallite; liposomes; cells, etc., which can be further labeled with a dye, catalyst, or other detectable group. Suitable enzymes and coenzymes are disclosed in Litman, et al., U.S. Pat. No. 4,275,149, columns 19-28, and Boguslaski, et al., U.S. Pat. No. 4,318,980, columns 10-14; suitable fluorescers and chemiluminescers are disclosed in Litman, et al., U.S. Pat. No. 4,275,149, at columns 30 and 31; which are incorporated herein by reference. Preferably, at least one sps member is selected from the group consisting of fluorescers, enzymes, chemiluminescers, photosensitizers, and suspendable particles.

The label can directly produce a signal, and therefore, additional components are not required to produce a signal. Numerous organic molecules, for example fluorescers, are able to absorb ultraviolet and visible light, where the light absorption transfers energy to these molecules and elevates them to an excited energy state. This absorbed energy is then dissipated by emission of light at a second wavelength. Other labels that directly produce a signal include radioactive isotopes and dyes.

Alternately, the label may need other components to produce a signal, and the sps can then include all the components required to produce a measurable signal, which can include substrates, coenzymes, enhancers, additional enzymes, substances that react with enzymatic products, catalysts, activators, cofactors, inhibitors, scavengers, metal ions, specific binding substance required for binding of signal generating substances, and the like. A detailed discussion of suitable signal producing systems can be found in Ullman, et al. U.S. Pat. No. 5,185,243, columns 11-13, which is incorporated herein by reference.

The label is bound to a specific binding pair (hereinafter “sbp”) member which is the antigen, or is capable of directly or indirectly binding the antigen, or is a receptor for the antigen, and includes, without limitation, the antigen; a ligand for a receptor bound to the antigen; a receptor for a ligand bound to the antigen; an antibody that binds the antigen; a receptor for an antibody that binds the antigen; a receptor for a molecule conjugated to an antibody to the antigen; an antigen surrogate capable of binding a receptor for the antigen; a ligand that binds the antigen, etc. Binding of the label to the sbp member can be accomplished by means of non-covalent bonding as for example by formation of a complex of the label with an antibody to the label or by means of covalent bonding as for example by chemical reactions which result in replacing a hydrogen atom of the label with a bond to the sbp member or can include a linking group between the label and the sbp member. Such methods of conjugation are well known in the art. See for example, Rubenstein, et al., U.S. Pat. No. 3,817,837, which is incorporated herein by reference. Other sps members can also be bound covalently to sbp members. For example, in Ullman, et al., U.S. Pat. No. 3,996,345, two sps members such as a fluorescer and quencher can be bound respectively to two sbp members that both bind the analyte, thus forming a fluorescer-sbp₁:analyte:sbp₂-quencher complex. Formation of the complex brings the fluorescer and quencher in close proximity, thus permitting the quencher to interact with the fluorescer to produce a signal. This is a fluorescent excitation transfer immunoassay. Another concept is described in Ullman, et al., EP 0,515,194 A2, which uses a chemiluminescent compound and a photosensitizer as the sps members. This is referred to as a luminescent Oxygen channeling immunoassay. Both the aforementioned references are incorporated herein by reference.

The analysis of mRNA expression in tumors does not necessarily reveal the status of protein levels in the cancer cells. Other factors such as protein half-life and mutation can be altered without an effect on mRNA levels thus masking significant molecular changes at the protein level. Serum antibody reactivity to cellular proteins occurs in cancer patients due to presentation of mutated forms of proteins from the tumor cells or overexpression of proteins in the tumor cells. The host immune system can direct individuals to molecular events critical to the genesis of the disease. Using a candidate gene approach, experience has shown that the frequency of serum positivity to any single protein is low. Therefore, to increase the identification of such autoantigens, a more global approach is employed to exploit immunoreactivity to identify large numbers of cDNAs coding for proteins that are mutated or upregulated in cancer cells.

In order to develop an effective screening test for early detection of ovarian cancer, cDNA phage display libraries are used to isolate cDNAs coding for epitopes reacting with antibodies present specifically in the sera of patients with ovarian cancer. The methods of the present invention detect various antibodies that are produced by patients in reaction to proteins overexpressed in their ovarian tumors. This is achievable by differential biopanning technology using human sera collected both from normal individuals and patients having ovarian cancer and phage display libraries expressing cDNAs of genes expressed in ovarian epithelial tumors and cell lines. Serum reactivity toward a cellular protein can occur because of the presentation to the immune system of a mutated form of the protein from the tumor cells or overexpression of the protein in the tumor cells. The strategy provides for the identification of epitope-bearing phage clones (phagotopes) displaying reactivity with antibodies present in sera of patients having ovarian cancer but not in control sera from unaffected women. This strategy leads to the identification of novel disease-related epitopes for diseases including, but not limited to ovarian cancer, that have prognostic/diagnostic value with additional potential for therapeutic vaccines and medical imaging reagents. This also creates a database that can be used to determine both the presence of disease and the stage of the disease.

The series of experiments disclosed herein provide direct evidence that biopanning a T7 coat protein fusion library can isolate epitopes for antibodies present in polyclonal sera. This also showed that the technology can be applied to direct microarray screening of large numbers of selected phage against numerous patient and control sera. This approach provides a large number of biomarkers for early detection of disease.

More specifically, the methods of the present invention provide four to five cycles of affinity selection and biopanning, which are carried out with biological amplification of the phage after each biopanning, meaning growth of the biological vector of the cDNA expression clone in a biological host. Examples of biological amplification include, but are not limited to, growth of a lytic or lysogenic bacteriophage in host bacteria or transformation of a bacterial host with selected DNA of the cDNA expression vector. The number of biopanning cycles generally determines the extent of the enrichment for phage that bind to the sera of patients with ovarian cancer. This strategy allows for one cycle of biopanning to be performed in a single day. Someone skilled in the art can establish different schedules of biopanning that provide the same essential features of the procedure described above.

Two biopanning experiments are performed with each library, differentially selecting clones between control and disease patient sera. The first selection is to isolate phagotope clones that do not bind to control sera pooled from control women but do bind to a pool of disease patient serum. This set of phagotope clones represents epitopes that are indicative of the presence of disease as recognized by the host immune system. The second type of screening is performed to isolate phagotope clones that do not bind to a pool of control sera but do bind to an individual patient's serum. Those sets of phagotope clones represent epitopes that are indicative of the presence of disease.

Subsequent to the biopanning, the clones so isolated can be used to contact antibodies in sera by spotting the clones or peptide sequences of amino acids containing those encoded by the clones. After spotting on a solid support, the arrays are rinsed briefly in 1% BSA/PBS to remove unbound phage, then transferred immediately to a 1% BSA/PBS blocking solution and allowed to sit for 1 hour at room temperature. The excess BSA is rinsed off from the slides using PBS. This step ensures that the elution step of antibodies is more effective. The use of PBS elutes all of the antibodies without harming the binding of the antibody. Antibody detection of reaction with the clones or peptides on the array is carried out by labeling of the serum antibodies or through the use of a labeled secondary antibody that reacts with the patient's antibodies. A second control reaction to every spot allows for greater accuracy of the quantitation of reactivity and increases sensitivity of detection.

The slides are subsequently processed to quantify the reaction of each phagotope. Such processing is specific to the label used. For instance, if Cy3 and Cy5 fluorophore labels are used, this processing is done in a laser scanner that captures an image of the slide for each fluorophore used. Subsequent image processing familiar to those skilled in the art can provide intensity values for each phagotope.

The data analysis can be divided into the following steps:

1. Pre-processing and normalization.

2. Identifying the most informative markers.

3. Building a predictor for molecular diagnosis of ovarian cancer and validating the results.

The purpose of the first step is to cleanse the data from artifacts and prepare it for the subsequent steps. Such artifacts are usually introduced in the laboratory and include: slide contamination, differential dye incorporation, scanning and image processing problems (e.g. different average intensities from one slide to another), imperfect spots due to imperfect arraying, washing, drying, etc. The purpose of the second step is to select the most informative phages that can be used for diagnostic purposes. The purpose of the third step is to develop a software classifier able to diagnose cancer based on the antibody reactivity values of the selected phages. The last step also includes the validation of this classifier and the assessment of its performance using various measures such as specificity, sensitivity, positive predictive value and negative predictive value. The computation of such measures can be done on cases not used during the design of the chip in order to assess the real-world performance of the diagnosis tool obtained.
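The four performance measures named above can be computed from a simple tally of correct and incorrect calls on held-out cases. The following is a minimal sketch, not the classifier of the invention: the function name `diagnostic_metrics` and the boolean encoding (True for cancer) are assumptions made for illustration only.

```python
def diagnostic_metrics(predicted, actual):
    """Return sensitivity, specificity, PPV and NPV.

    predicted, actual: equal-length sequences of booleans,
    where True means "cancer" (an illustrative encoding).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

To reflect real-world performance, the inputs would be restricted to subjects that were not used when the epitopes for the chip were selected.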

The pre-processing and normalization step proceeds as follows. For arrays using two channels, such as Cy5 for the human IgG and Cy3 for the T7 control, the spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background as well. A background-corrected value is calculated by subtracting the background from the signal. If necessary, non-linear dye effects can be eliminated by performing an exponential normalization (Houts, 2000) and/or LOESS normalization of the data and/or a piecewise linear normalization (see FIGS. 7 A-D). The values coming from each channel are subsequently divided by the mean of the intensities over the whole array. Subsequently, the ratio between the IgG and the T7 channels is calculated. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/−two standard deviations) are flagged for manual inspection. Single channel arrays are pre-processed in a similar way but without taking the ratios. This preprocessing sequence was shown to provide good results for all preliminary data analyzed.
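The linear part of this sequence (background subtraction, scaling by the array mean, IgG/T7 ratio, and combination of replicates) can be sketched as follows. This is an illustrative outline only: all function names are hypothetical, the optional non-linear normalizations are omitted, and outlier flagging is left out because the text does not state which standard deviation serves as the reference.

```python
from statistics import mean, stdev


def background_correct(signal, background):
    """Subtract the per-spot background from the per-spot mean intensity."""
    return [s - b for s, b in zip(signal, background)]


def scale_by_array_mean(values):
    """Divide every value of one channel by that channel's mean over the array."""
    m = mean(values)
    return [v / m for v in values]


def channel_ratios(igg_signal, igg_bg, t7_signal, t7_bg):
    """IgG (e.g. Cy5) to T7 (e.g. Cy3) ratio per spot after correction and scaling."""
    igg = scale_by_array_mean(background_correct(igg_signal, igg_bg))
    t7 = scale_by_array_mean(background_correct(t7_signal, t7_bg))
    # A zero control value gives an undefined ratio, reported here as NaN.
    return [i / t if t else float("nan") for i, t in zip(igg, t7)]


def combine_replicates(ratios):
    """Mean and sample standard deviation of replicate spots (e.g. four)."""
    sd = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), sd
```

For a single-channel array, the same helpers apply up to `scale_by_array_mean`, and the ratio step is skipped.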

The step of selecting the most informative markers is used to identify the most informative phages out of the large set of phages started with. The better the selection, the better is the expected accuracy of the diagnosis tool.

A first test is necessary to determine whether a specific epitope is suitable for inclusion in the final set to be spotted. The selection methods to be applied follow the principles of the methods successfully applied in (Golub et al., 1999; Alizadeh et al., 2000) and are briefly described below.

Procedure 1

The procedure is initiated by defining a template for the cancer case (FIG. 8). Unlike gene expression experiments, where the expression level of a gene can be either up or down in cancer vs. healthy subjects, here one is testing for the presence of antibodies specific to cancer. Therefore, epitopes with high reactivity in controls and low reactivity in patients are not expected and the profile to the left in FIG. 8 is sufficient. Each epitope has a profile across the given set of patients (FIGS. 9A and B). The profile of each epitope is compared with the template using a correlation-based distance. Those skilled in the art can recognize that other distances may be used without essentially changing the procedure.

The epitopes are then ordered based on the similarity between the reference profile (FIG. 8) and their actual profile. FIG. 7 shows 46 epitopes found informative for a correlation threshold of 0.8. The final cutoff threshold is calculated by doing 1000 random permutations once the whole data set becomes available. Each such permutation randomly reassigns the subjects between the ‘patient’ and ‘control’ categories. Calculating the score of each epitope profile for such permutations allows a suitable threshold for the similarity to be established (Golub et al., 1999).
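Procedure 1 can be sketched in a few lines. The sketch below makes several assumptions that are not stated in the text: the template is taken as 0 for controls and 1 for patients, similarity is the Pearson correlation, and the permutation-based cutoff is taken as a high quantile of the best score obtained under shuffled labels. The function names and the `quantile` and `seed` parameters are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def make_template(labels):
    """Cancer template: low for controls (False), high for patients (True)."""
    return [1.0 if is_patient else 0.0 for is_patient in labels]


def select_epitopes(profiles, labels, threshold=0.8):
    """Rank epitopes by similarity to the template and keep those above threshold.

    profiles: dict mapping epitope name -> list of intensities, one per subject.
    labels:   list of booleans, True for patients, in the same subject order.
    """
    template = make_template(labels)
    scores = {name: pearson(p, template) for name, p in profiles.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= threshold], scores


def permutation_threshold(profiles, labels, n_perm=1000, quantile=0.99, seed=0):
    """Estimate a cutoff from the best score seen under random relabeling."""
    rng = random.Random(seed)
    shuffled = list(labels)
    best_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # move subjects randomly between the two classes
        template = make_template(shuffled)
        best_scores.append(max(pearson(p, template) for p in profiles.values()))
    best_scores.sort()
    return best_scores[min(int(quantile * n_perm), n_perm - 1)]
```

A different distance, as the text notes, could be substituted for `pearson` without changing the rest of the sketch.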

The technique closely follows the one used in (Golub et al., 1999). However, the technique can be further improved as follows. Firstly, this technique was shown to provide good results if most controls are consistent by providing the same type of reactivity. However, preliminary data showed that there are control subjects that show a non-specific reactivity with all clones (see FIG. 1b). FIG. 8 shows a comparison between the histogram profile of a control subject showing a non-specific reaction (19218) and the profile of a patient (19223). FIG. 9 shows the scatterplots of the same subjects. While still clearly different from patients, such control subjects with a high non-specific reaction introduce spikes in the clone profile in the area corresponding to the control subjects (left hand side of the template in FIG. 8). Such spikes decrease the score of the relevant clones, making them more difficult to distinguish from the irrelevant ones. In order to reduce this effect, all control subjects with a non-specific response (i.e., a unimodal distribution such as in the left panel of FIG. 7) were eliminated from the analysis leading to the epitope selection.

A second essential modification is related to the set of epitopes selected. There are rare patients who might react only to a small number of very specific epitopes. If the selection of the epitopes is done on statistical grounds alone, such very specific epitopes can be missed if the set of patients available contains only a few such rare patients. In order to maximize the sensitivity of the final test resulting from this work, every effort was made to include epitopes which might be the only ones reacting to rare patients. In order to do this, the information content of the set of epitopes is maximized while the number of epitopes used is minimized, using the following procedure.

Procedure 2

Assume there are m patients and k controls. Select n random patients from the m available. For each of the n patients used for epitope selection, amplify (n×4 biopannings) and do self-reactions. Eliminate those patients/epitopes that do not react to self.

Make a chip with all available, self-reacting epitopes printed in quadruplicates. React this chip with all patients and controls (n+k antibody reactions). Eliminate controls with a non-specific reactivity. For the set of epitopes coming from a single patient, apply Procedure 1 to order the epitopes in the order of their informational content and select the ones that can be used to differentiate patients from controls.

Order the epitopes by their reactivity in decreasing order of the number of patients they react to. Scan this list from the top down, moving epitopes from this list to the final set. Every time a set of epitopes coming from a patient x is added to the final set, the patient x and all other patients that these epitopes react to are represented in the current set of epitopes. Repeat until all patients are represented in the current set of epitopes.

This procedure tries to minimize the number of epitopes used while maximizing the number of patients that react to the chip containing the selected epitopes.
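
By way of a non-limiting illustration, the top-down scan of Procedure 2 can be sketched as follows. The function name, the dictionary layout and the reactivity values are hypothetical and are not taken from FIG. 10 or FIG. 11. The sketch also assumes that a set of clones is skipped whenever it adds no patient that is not already represented.

```python
def select_clone_sets(reacts):
    """Pick clone sets by scanning them in decreasing order of reactivity.

    reacts maps a source patient to the set of patients whose sera react
    specifically with that patient's clones (the source patient included,
    since non-self-reacting sets were eliminated beforehand).
    """
    all_patients = set(reacts)
    for reacting in reacts.values():
        all_patients |= reacting

    # Sort once by the number of reacting patients; ties keep input order.
    order = sorted(reacts, key=lambda p: len(reacts[p]), reverse=True)

    selected, represented = [], set()
    for patient in order:
        if represented >= all_patients:
            break  # every patient is already represented
        # Assumption: keep a set only if it represents someone new.
        if reacts[patient] - represented:
            selected.append(patient)
            represented |= reacts[patient]
    return selected, represented


# Hypothetical reactivity matrix for ten patients (illustration only).
example = {
    1: {1}, 2: {2, 4, 8}, 3: {3, 6, 10}, 4: {4}, 5: {5, 9},
    6: {6}, 7: {7, 4}, 8: {8}, 9: {9}, 10: {10},
}
chosen, covered = select_clone_sets(example)
print(chosen)  # [2, 3, 5, 7, 1]
```

With these hypothetical reactivities, the scan retains the clone sets from patients 2, 3, 5, 7 and 1, in that order, and every patient is represented by the retained sets.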

The following simple example shows how this procedure works. The matrix in FIG. 10 contains a row i for the clones coming from patient i and a column j for the serum coming from patient j. A serum is said to react specifically with a set of clones if the histogram of the ratios is bimodal (see subject 19218 in FIGS. 8 and 9). A serum is said to react non-specifically if the histogram of the ratios is unimodal (see subject 19223 in FIGS. 8 and 9). Furthermore, a serum might not react at all with a set of clones. If the serum from patient j reacts specifically with the clones from patient i, the matrix contains a value of 1 at the position (i, j). The element at position (i, j) is left blank if there is no reaction or the reaction is non-specific.

Each set of epitopes corresponding to a row of the matrix is pruned by sub-selecting epitopes according to Procedure 1. The rows are now sorted in decreasing reactivity (number of patients other than self that the clones react to). For instance, in FIG. 11, the clones from patient 2 react with sera from self (column 2) and patients 4 and 8. The clones from patient 3 react with sera from self (column 3) and patients 6 and 10, etc. The final set of clones was obtained from patients 2, 3, 5, 7 and 1 (reading top-down in column 1). Clones coming from patients 8, 9 and 10 are not included since these patients already react to clones coming from other patients. This set ensures that the chip made with these clones reacts with all patients in this example.

Procedure 3

Arrays using two channels, such as Cy5 for the human IgG and Cy3 for the T7 control, are processed as follows. The spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background as well. A background-corrected value is calculated by subtracting the background from the signal. The values coming from each channel are normalized by dividing by their mean. Subsequently, the ratio between the IgG and the T7 channels is calculated and a logarithmic function is applied. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/−two standard deviations) are flagged for manual inspection. Someone skilled in the art can recognize that various combinations and permutations of the steps above, or similar steps, could replace the normalization procedure above without substantially changing the rest of the data analysis process. Such similar steps include, without limitation, taking the median instead of the mean, using logarithmic functions in various bases, etc.
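
A minimal, non-limiting sketch of the normalization steps of Procedure 3 is given below. The function names and the intensity values are hypothetical. The choice of a base-2 logarithm and the exclusion of spots at or below background are assumptions made for the illustration only. The flagging of outliers is omitted from the sketch.

```python
import math
from statistics import mean, stdev


def log_ratios(igg_signal, igg_background, t7_signal, t7_background):
    """Per-spot log2(IgG / T7) after background correction and
    normalization of each channel by its mean."""
    igg = [s - b for s, b in zip(igg_signal, igg_background)]
    t7 = [s - b for s, b in zip(t7_signal, t7_background)]
    igg_mean, t7_mean = mean(igg), mean(t7)
    ratios = []
    for i, t in zip(igg, t7):
        if i <= 0 or t <= 0:
            ratios.append(None)  # assumption: spot not above background
        else:
            ratios.append(math.log2((i / igg_mean) / (t / t7_mean)))
    return ratios


def combine_replicates(values, clone_ids):
    """Mean and standard deviation of the log ratios of each clone."""
    groups = {}
    for value, clone in zip(values, clone_ids):
        if value is not None:
            groups.setdefault(clone, []).append(value)
    return {
        clone: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for clone, v in groups.items()
    }


# Hypothetical intensities: two clones printed in quadruplicate.
clones = ["A"] * 4 + ["B"] * 4
igg_sig = [1100] * 4 + [300] * 4
t7_sig = [600] * 8
background = [100] * 8
summary = combine_replicates(
    log_ratios(igg_sig, background, t7_sig, background), clones
)
```

In this hypothetical input, clone A yields a positive mean log ratio and clone B a negative one, and the two differ by log2(5) because the background-corrected IgG signals differ five-fold while the T7 signals are equal.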

The histogram of the average log ratio is calculated. If the histogram is unimodal (e.g., subject 19223 in FIG. 7), there is no specific response. If the histogram is clearly bimodal (e.g., subject 19218 in FIG. 7), there is a specific response. All 25 subjects analyzed so far fell in one of these two categories or had no response at all. A mixed probability model is used in less clear cases to fit two normal distributions as in (Lee, 2000). If the two distributions found under the maximum likelihood assumption are separated by a distance d of more than 2 standard deviations (corresponding to a p-value of approximately 0.05), there is a specific response. If the distance is less than 2 standard deviations, the response can be considered as not specific. The preliminary data analyzed so far showed a very good separation of the distributions for the patients.
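
The fitting of two normal distributions can be illustrated by the following non-limiting sketch, which uses a plain expectation-maximization loop rather than the specific method of (Lee, 2000). All names and data are hypothetical. Expressing the distance d in units of the larger of the two fitted standard deviations is an assumption, since other conventions (such as a pooled standard deviation) are equally possible.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_normals(values, iterations=200, min_sd=1e-3):
    """Fit a two-component normal mixture by expectation-maximization."""
    xs = sorted(values)
    if len(xs) < 4:
        raise ValueError("need at least four values")
    half = len(xs) // 2
    mu = [mean(xs[:half]), mean(xs[half:])]  # start from lower/upper halves
    sd = [max(pstdev(xs[:half]), min_sd), max(pstdev(xs[half:]), min_sd)]
    w = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
    return w, mu, sd


def is_specific(values, threshold=2.0):
    """True if the fitted means are more than `threshold` SDs apart."""
    _, mu, sd = fit_two_normals(values)
    return abs(mu[1] - mu[0]) / max(sd) > threshold


# Hypothetical log ratios forming two well separated groups.
bimodal = [-1.2, -1.1, -1.0, -0.9, -0.8] * 4 + [1.8, 1.9, 2.0, 2.1, 2.2] * 4
print(is_specific(bimodal))  # True
```

For the hypothetical data shown, the fitted means lie at about -1 and 2 with standard deviations of about 0.14, so the separation far exceeds the threshold of 2 standard deviations and the response is classified as specific.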

Once the chosen clones are spotted on the final version of the array, a number of sera coming from both patients and controls can be tested. These sera come from subjects not used in any of the phases that lead to the fabrication of the array (i.e., not involved in clone selection, not used as controls, etc.). Each test was evaluated using Procedure 3 above. The performance on this validation data can be reported in terms of PPV, NPV, specificity and sensitivity. Since these performance indicators are calculated on data not previously used, they provide a good indication of the performance of the test for screening purposes for the various categories of patients envisaged in the general population.
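
For clarity, the four performance indicators can be computed from the counts of true and false calls on the validation sera as in the following sketch. The function name and the counts are hypothetical and do not represent results of the present study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from validation counts.

    tp/fn: patients called positive/negative by the test;
    tn/fp: controls called negative/positive by the test.
    A value is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical validation set: 20 patients and 30 controls.
metrics = screening_metrics(tp=18, fp=2, tn=28, fn=2)
```

It is noted that PPV and NPV depend on the proportion of patients among the subjects tested, so values computed on a validation set apply to a screening population only to the extent that the two have a similar prevalence of disease.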

The present invention also provides a kit including all of the technology for performing the above analysis. The kit is provided in a container of a size sufficient to hold all of the required pieces for analyzing sera, as well as a digital medium such as a floppy disk or CDROM containing the software necessary to interpret the results of the analysis. These components include the array of clones or peptides spotted onto a solid support, prewashing buffers, a detection reagent for identifying reactivity of the patients' serum antibodies to the spotted clones or peptides, post-reaction washing buffers, primary and secondary antibodies to quantify reactivity of the patients' serum antibodies with the spotted array, and methods to analyze the reactivity so as to establish an interpretation of the serum reactivity.

A biochip for detecting the presence of the disease state in a patient's sera is provided by the present invention. The biochip has a detector contained within the biochip for detecting antibodies in a patient's sera. This allows a patient's sera to be tested for the presence of a multitude of diseases or reaction to disease markers using a single sample, and the analysis can be conducted on a single chip. Utilizing such a chip lowers the time required for the detection of disease while also enabling a doctor to determine the level of disease spread or infection. The chip, or other informatics system, can be altered to weight the results. In other words, the informatics can be altered to adjust the levels of sensitivity and/or specificity of the chip.

The present invention is well suited for providing useful information regarding the efficacy of pharmaceuticals at treating disease. Specifically, the present invention is well suited in measuring the effects of drugs and other medications based on the above-identified markers. The present invention determines the minimum level of a pharmaceutical needed to achieve therapeutic benefits. Thus, the present invention is useful in determining effective treatment of various diseases and illnesses. The results of the analysis can be utilized to determine if the treatment is effective or if such treatment needs to be altered.

Further, the treatment can be altered based upon the markers detected. For example, the treatment can be specifically designed based upon the markers identified. In other words, the therapy can be altered to most suitably and efficiently treat the identified markers. The ability to adjust the therapy enables the treatment to be tailored to the needs of the person being treated. The treatments that can be used range from vaccines to chemotherapy.

The markers of the present invention can also be used for immuno-imaging. Immuno-imaging is a process in which antibodies to a specific antigen are labeled such that the label can be detected externally. Examples of external detectors include, but are not limited to, x-rays, MRI, CT scans, and PET scans. The imaging functions because an imaging reagent containing the labeled antibody is administered to a patient.

The above discussion provides a factual basis for the use of the combination of markers and the method of making the combination. The methods used with, and the utility of, the present invention can be shown by the following non-limiting examples and accompanying figures.

EXAMPLES
Example 1

The purpose of this study is to clone epitopes that are recognized by sera from women with ovarian cancer but not recognized by normal sera from unaffected women. As these epitopes are cloned, protein array assays are developed capable of detecting ovarian cancer at an early stage by analyzing antigens recognized in the sera of at risk women. Toward this end, individual sera were screened using these protein biochips to determine the antibody reactivity to each protein epitope. Antibody reactivity is detected that does not appear in control sera. The patients and control sera obtained for this study were used to calibrate the protein biochips and identify the most informative epitope-clones. The women were monitored for the appearance or reappearance of antibody reactivity and its correlation with tumor burden. By following the serum reactivity to tumor reactive new epitopes on the arrays of the phage display cDNA clones, the analysis of sera from women after their initial diagnosis and semiannually thereafter allows the determination of the markers in predicting tumor recurrence.

Some of the markers can be predictive of recurrence, and thus can be used to correlate specific ovarian tumor types (using the World Health Organization Histological Classification of Ovarian Tumors), as well as the tumor grade (where appropriate, since not all tumors are graded) and the surgical stage. This can be done by review of the pathological material (glass slides, patient records, and surgical pathology reports). Certain currently accepted biomarkers of research interest, such as Her-2 neu and others, can also be included in the new protein biochips in order to compare the sensitivity and specificity of the new and existing immunohistochemical technologies. Testing for Her-2 neu and other biological markers is done by the immunoperoxidase method using formalin fixed, paraffin embedded tumor tissues.

For the purpose of comparison to the ovarian cancer patients, one can analyze serum markers in women in good health who do not have ovarian or any other type of cancer. These control subjects should not have a family history of ovarian cancer or breast cancer. Because some serum markers such as CA125 levels are increased in endometriosis, uterine leiomyoma, pelvic inflammatory disease, early pregnancy, and benign cysts, control subjects should be free of these conditions as well.

The series of experiments provides direct evidence that biopanning a T7 coat protein fusion library can isolate epitopes for antibodies present in polyclonal sera. This also showed that the technology can be applied to direct microarray screening of large numbers of the selected phage against numerous patient and control sera. This approach provides a large number of biomarkers for early detection of ovarian cancer. The likelihood of success of this approach is increased by the fact that the mRNA for human Sirt2 is present in cells at very low abundance in human brain RNA thus indicating that clones can be isolated for rare RNA transcripts by this approach.

To further demonstrate the feasibility of these methods for differential detection of epitopes between test and control sera, four cycles of biopanning of a commercial Novagen breast tumor cDNA library were performed using a serum sample from a breast cancer patient and a control serum sample from a woman without cancer. 100 plaques were picked from each biopanning. For analysis, 100 plaques from the initial library and from each successive biopanning were amplified in microtitre plates and the lysates cleared by centrifugation. One half microliter of each sample was spotted onto nitrocellulose filters and immunodetection was performed using the breast cancer patient serum at 1:20,000 dilution (FIG. 5). Clear enrichment during biopanning is seen, as was observed above with the anti-Sirt2 rabbit serum. As seen in FIG. 6 (using randomly picked plaques from BP4), the filters contacted with the control serum on the left panels demonstrate weaker spot intensity as compared to a duplicate filter of the same clones on the right that was contacted with the patient serum. Approximately 65% of the phage selected for reactivity to the patient's serum were more than 3-fold more reactive with the patient's serum than with the control serum as determined by scanning densitometry.

FIG. 6A shows a comparison of serum reaction of control and breast cancer patient with phagotopes from BP4. FIG. 6B shows the BP4 filters that were scanned and the ratio of the pixel densities plotted in rank order.
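
The densitometric comparison can be illustrated by the following non-limiting sketch, which computes the patient-to-control ratio of pixel densities for each spot, places the ratios in rank order and reports the fraction exceeding a chosen fold threshold. The function names and the density values are hypothetical and are not the data of FIG. 6B. Spots with a zero control density are excluded as an assumption of the sketch.

```python
def rank_ordered_ratios(patient_density, control_density):
    """Patient/control pixel-density ratios, largest first."""
    ratios = [
        p / c
        for p, c in zip(patient_density, control_density)
        if c > 0  # assumption: spots with no control signal are excluded
    ]
    return sorted(ratios, reverse=True)


def fraction_above(ratios, fold=3.0):
    """Fraction of ratios strictly greater than `fold`."""
    return sum(r > fold for r in ratios) / len(ratios)


# Hypothetical pixel densities for four spots.
patient = [900, 400, 350, 120]
control = [100, 100, 100, 100]
ranked = rank_ordered_ratios(patient, control)
print(ranked)                  # [9.0, 4.0, 3.5, 1.2]
print(fraction_above(ranked))  # 0.75
```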

This experiment demonstrates that one can differentially detect the epitopes for which the process is selecting, i.e., those bound to protein G-agarose beads in association with antibodies in the patient's serum and not the control serum. Someone skilled in the art can recognize that other solid supports for biopanning could replace the protein-G beads without substantively changing the biopanning process. These data also indicate that the selection is imperfect. Not all of the selected phagotopes are more reactive with the patient's serum than the control serum. Therefore, the identification of the most informative phagotopes requires analysis of the reactivity with multiple, individual patients' sera tested at various serum dilutions.

The immune reactivity to human tumors recognizes changes in the expression levels and mutation status of proteins in the tumor cells. These types of immunological reactivity are not observed in sera from control subjects. The antibody titer to tumor specific epitopes can be proportional to the tumor burden. The immune reactivity to human tumors can be used diagnostically and prognostically to predict the presence and behavior of human tumors such as tumor recurrence. Serum reactivity to single proteins tends to incompletely identify tumor bearing patients and therefore more robust methods are necessary to accurately identify tumor occurrence and recurrence. Whole genome-based proteomics such as the technology and data analysis methods embodied in the application can more comprehensively identify those proteins recognized by the host immune system.

Those of skill in the art are familiar with the construction of cDNA libraries, and there are numerous published papers on isolation of cDNAs from human cells in culture using this technology (Chiao et al., 1992; Shin et al., 1993; Buettner et al., 1993; Kim et al., 1996; Deyo et al., 1998; Bauer et al., 1998). cDNA libraries can be prepared from ovarian cancer cell lines or from ovarian tumor tissue. A tumor tissue cDNA library can be prepared from a pool of mRNA preparations from each of the different stages of cancer to increase the diversity of clones in the library.

Methods

mRNA from one ovarian cancer cell line, SKOV3, and from ovarian tumor tissues was copied into cDNA and libraries were prepared. Tumor tissue in excess of that needed for pathological evaluation was obtained by informed consent from ovarian cancer patients.

Sera were obtained from 1) ovarian cancer patients at the time of diagnosis and at six month intervals during the follow up physician visits; 2) unaffected women for control sera.

T7 cDNA phage display expression libraries are prepared for biopanning experiments to select phage bearing epitopes, i.e., phagotopes, that are recognized by sera from women with ovarian cancer but not recognized by normal sera from unaffected women. For the biopanning process, sera from women in the control group were pooled to avoid individual variations unrelated to the presence of ovarian cancer.

The selection of the most informative epitopes was done by comparing the immune reaction profile of each individual epitope with templates defined for each disease stage. Several distances and information entropy measures were used. Several predictors were constructed based on three selected machine learning techniques using only a part of the available data. Specificity, sensitivity, positive predictive value and negative predictive value were calculated for each such classifier. The validation of the predictors and the selection of the best predictor were done by cross-validation on cases that had not been used during the predictor construction.

For example, to develop an effective screening test for early detection of ovarian cancer, cDNA phage display libraries were used to isolate cDNAs coding for epitopes reacting with antibodies present in the sera of patients with ovarian cancer. Screening of a T7 phage cDNA library with serum containing polyclonal antibodies against a known protein leads to the enrichment of one particular phage clone (which displays the peptide sequence recognized by the antibody on its coat) after several rounds of biopanning. Polyclonal antibodies were raised against a C-terminal 12 amino acid peptide from the human homologue of the yeast SIRT2 protein, and the serum containing them was screened against a T7 phage human brain cDNA library. This library was used because the Sirt2 transcript is expressed in human brain. Preimmune rabbit serum was bound to protein-G agarose beads and 6×10¹⁰ phage were added to the beads. The unbound phage were then bound to protein-G agarose beads to which the Sirt2p antibody was previously bound. The nonspecifically bound phage were washed away with PBS and the specifically bound phage eluted with 1% SDS. T7 phage is stable in this solution. These phage were diluted to reduce the SDS concentration and used to infect bacteria for amplification and another cycle of biopanning. Table 1 shows the value of the titer of the T7 phage library after each cycle of biopanning. This table reveals that the titer of the eluate increased with each successive cycle of antibody selection.

E. coli BLT5615 infected with the amplified phage library after biopanning 1-4 were plated onto LB-Agar plates and plaque lifts were performed for all the individual plates. The plaque lift filter membranes were then hybridized with a ³²P-labeled Sirt2 cDNA probe. The percentage of positive plaques (number of positive plaques/total number of plaques×100), as determined for each of the plates labeled BP1-4 (FIG. 1), increased with each successive cycle of biopanning. For BP1 and BP2 the percentage of positive plaques was negligible. For BP3 and BP4, the percentage of positive plaques was 1.7% and 8.6%, respectively.

In order to confirm that those positive plaques contain phage clones displaying the peptide sequence of Sirt2, 50 plaques were randomly picked and each insert was PCR amplified using the T7 coat protein forward primer (5′TCTTCGCCCAGAAGCTGCAG3′) (SEQ ID NO: 1) and the T7 coat protein reverse primer (5′CCTCCTTTCAGCAAAAAACCCC3′) (SEQ ID NO: 2). Filter hybridization was performed using the same Sirt2 cDNA probe as above. As shown in FIG. 2, 7 out of 50 plaques (14%) hybridized to the Sirt2 probe, a frequency similar to that observed in the plaque lifts. Plaques positively reacting with the Sirt2 probe were picked and also hybridized on Southern blots of PCR product.

Sirt2 positive plaques (upper two rows) and Sirt2-negative plaques (lower two rows) were chosen and 1 μl (pfu indicated at left) of each amplified phage clone was spotted onto the nitrocellulose membranes which were then treated as if they were standard immunoblots using the rabbit polyclonal Sirt2 antibody (right panel) or a mouse monoclonal antibody to the T7 capsid protein (left panel). The rabbit polyclonal antibody provides a sample for testing as if it were a patient's serum using the Sirt2 protein as a model. The Sirt2 antibody in the rabbit polyclonal serum reacted specifically with the Sirt2 phage. The identity of the phage was confirmed by direct PCR sequence analysis of the cDNA inserts in two independent Sirt2 positive phage. Thus phage expressing the epitope to which the antiserum was directed were isolated and distinguished from other phage.

Microarrays were spotted using Sirt2 T7 clones and other T7 clones that do not express Sirt2. These arrays were used to analyze a mixture of Cy5-labeled (red) rabbit Sirt2-immunized serum and Cy3-labeled (green) T7 coat protein antibody (Novagen) added to the pre-immune rabbit serum. The scanned two-color image clearly shows specific detection of the Sirt2-expressing T7 clones by the anti-Sirt2 antibody. The Sirt2-expressing clones appear yellow because they bind both the red-labeled antibody to a rabbit immunoglobulin G protein and the green-labeled anti-T7 capsid 10B antibody. The non-Sirt2-expressing T7 clones are green as they only bind to the Cy3-labeled anti-T7 antibody. This development of detection of protein epitopes in bacteriophage bodes well for the applicability of phage arrays to the detection of low abundance species and weak binders. The spots in the image are approximately 100 microns in diameter.

The following is an example of the preparation of a tumor reactive cDNA expression library: Ovarian cancer cells were grown in monolayer culture. Cells or fresh tumors from patients were lysed by the addition of 3 ml of TRIZOL reagent and the homogenized sample was incubated for five minutes at room temperature. Chloroform, 0.6 ml, was added and the mixture was shaken vigorously for 15 seconds and then incubated at room temperature for 2-3 minutes. The extract was centrifuged at 12,000×g for 30 minutes at 4° C. Following centrifugation, the mixture was separated into a lower red phenol-chloroform phase, an interphase, and a colorless aqueous phase. The aqueous phase was transferred to a fresh tube and total RNA was precipitated by adding 1.5 ml of isopropanol. The mixture was incubated at room temperature for ten minutes and was centrifuged at 12,000×g for 30 minutes at 4° C. The supernatant was discarded and the RNA pellets were washed by adding 3 ml of 75% ethanol. The samples were centrifuged at 14,000×g for 15 minutes. The RNA pellet was air-dried and was dissolved in RNase-free water.

mRNA was isolated from total RNA following the Oligotex mRNA spin column protocol. Total RNA, 0.5 mg, was dissolved in 500 μl of RNase-free water, and 500 μl of binding buffer and 30 μl of Oligotex suspension were added. The contents were mixed thoroughly, incubated for three minutes at 70° C. in a water-bath, and then at room temperature for 10 minutes. The Oligotex:mRNA complex was pelleted by centrifugation for 2 minutes at 14,000×g and the supernatant was discarded. The Oligotex:mRNA pellet was resuspended in 400 μl washing buffer by vortexing and pipetted onto a spin column placed in a 1.5 ml microcentrifuge tube. The samples were centrifuged at maximum speed for one minute and the flow-through discarded. The spin column was transferred to a new RNase-free 1.5 ml microcentrifuge tube. Elution buffer at 70° C. was then added to the column. Poly (A)⁺ mRNA was eluted and quantitated by UV spectroscopy, and the process of poly A selection was repeated one more time to further reduce contamination with ribosomal RNA. Twice poly A selected mRNA was stored at −70° C. for use in library preparation.

Novagen's OrientExpress cDNA Synthesis and Cloning systems were used for the construction of ovarian cancer cDNA T7 phage libraries. For first-strand cDNA synthesis, OrientExpress Random Primer System was used to ensure representation of both N-terminal and C-terminal amino acid sequences.

Ten ml of LB/carbenicillin medium were inoculated with a single colony of BLT5615 from a freshly streaked plate. The mixture was shaken at 37° C. overnight. Ten ml of the overnight culture was added to 90 ml of LB/carbenicillin medium and was allowed to grow until the OD₆₀₀ reached 0.4-0.5. IPTG (1 mM), M9 salts (1×) and glucose (0.4%) were added and the cells were allowed to grow for 20 minutes. An appropriate volume of culture was infected with phage library at an MOI of 0.001-0.01 (100-1000 cells for each pfu). The infected bacteria were incubated with shaking at 37° C. for one to two hours until lysis was observed. Glycerol (0.02%) and PMSF (0.02M) were added to the cell lysate to block proteolysis of the capsid fusion proteins. The lysate was centrifuged at 8000×g for 10 minutes. The supernatant was collected and was stored at 4° C. The lysate was titered by plaque assay under standard conditions. The libraries are stored after purification by polyethylene-glycol precipitation and ultracentrifugation through a stepwise CsCl gradient.

Using this approach, applicants have constructed the first library. Using twice poly A selected mRNA from SKOV3 cells, a T7 select cDNA library was prepared containing 1.8×10⁷ initial plaques after packaging. This representation is comparable to the clonal representation of the commercial libraries purchased. This library has been amplified and stored in aliquots in two −70° C. freezers.

Patients' sera were obtained from multiple institutions for this project. Three outside institutions have agreed to provide ovarian cancer patient sera and the associated medical record information in anonymized form. Dr. Steven Witkin from the Weill Medical College of Cornell University provided 46 patient serum samples and 27 controls. Dr. Karen Lu from the M.D. Anderson Cancer Center can provide 60 serum samples. Dr. David Fishman from the Northwestern University Comprehensive Cancer Center provided 35 serum samples of patients who have been followed from time of diagnosis.

The ideal sera for the clone biopanning studies come from women just before or after surgery and prior to chemotherapy. Follow up sera were obtained after chemotherapy and are important to determine whether the final protein array technology can detect tumor recurrence.

In addition, a supply of tumor tissue was required for the preparation of mRNA for cDNA library production and gene expression studies using samples from patients. This tissue was harvested within 20 minutes of surgical excision from the patients. This requires the coordinated effort of the gynecologic surgeons and pathologists. Patients at the time of their original surgery or prior to chemotherapy were accrued for serum collection. If tumor tissue was available in excess of that needed for routine pathologic evaluation, that tissue was used for RNA preparation for mRNA expression studies associated with this study. Sections from tissue blocks were also acquired for the purpose of expression studies of proteins in the patients' tumors. Patients at follow up visits to the OB/GYN clinics were also subjects for serum acquisition. These latter patients can be at a time of recurrence or not. This allows the observation of the reappearance of serum markers in the event of tumor recurrence. Serum was obtained from eligible patient-subjects during scheduled clinic visits. The initial serum acquisition occurs prior to surgery, if possible, or if post surgery, prior to chemotherapy. A single red top 7 cc vial of blood was obtained during normal phlebotomy and the serum isolated after clotting. Serum continues to be collected from these patients during follow up visits for up to five years or until ovarian cancer recurrence. Tumor tissue in excess of that required for pathological analyses was acquired at the time of surgery for the preparation of tumor RNA needed for antibody screening. Unaffected volunteers (controls) were recruited through community outreach activities.

The Biopanning Process

Steps in the Biopanning Process:

Affinity selection with sera from normal individuals: Twenty-five μl of Protein G Plus-agarose beads were placed in a 0.6 ml eppendorf tube and were washed two times with 1×PBS. Washed beads were blocked with 1% BSA at 4° C. for one hour. The beads were then incubated at 4° C. for one hour with 250 μl of pooled sera at a dilution of 1:20 from 20 control women. After this incubation, beads were washed three times with 1×PBS and then incubated with phage library (˜10¹⁰ phage particles). After incubation, the mixture was centrifuged at 3000 rpm for two minutes to remove phage nonspecifically bound to the beads and the supernatant (phage library) was collected for immunoscreening.

Fresh protein G Plus agarose beads were placed into a 0.6 ml eppendorf tube and were washed two times with 1×PBS. Washed beads were blocked with 1% BSA at 4° C. for one hour. The beads were then incubated at 4° C. for three hours with 250 μl of sera at a dilution of 1:20 from patients with ovarian cancer. After this incubation, the beads were washed three times with 1×PBS and were incubated with the phage library supernatant from above (termed Biopanning 1 (BP1)) collected for immunoscreening, overnight at 4° C. (shorter times of incubation have not proven successful using model antibody systems). After incubation, the mixture was centrifuged at 3000 rpm for two minutes and the supernatant was discarded. Beads were washed three times with 1×PBS. To elute the bound phage, 1% SDS was added to the washed beads and the mixture was incubated at room temperature for ten minutes. The bound phage were removed from the beads by centrifugation at 8000 rpm for seven minutes. Eluted phage were transferred to liquid culture for amplification (100 μl elution to 20 ml culture). Four rounds of affinity selection and immunoscreening were carried out with amplified phage obtained after each biopanning. The number of biopanning cycles generally determines the extent of the enrichment for phage that bind to the sera of patients with ovarian cancer. This process allows for one cycle of biopanning to be performed in a single day.

In the past serum markers have been identified using SEREX technology that detected only a few gene products at a time. The biopanning approach developed can isolate large numbers of target epitopes. These epitopes are displayed on the surface of bacteriophage as in-frame fusion proteins with the T7 phage capsid protein and can be analyzed in large numbers by arraying the selected phage on filter paper or glass slides (protein microarrays). The method isolates large numbers of phage that react with antibodies from pooled patient sera but not with normal sera.

The titer of the T7 phage library obtained after amplification of each biopanning (BP1-4) eluate was determined by plaque assay. E. coli BLT5615 were infected with the primary unamplified phage from biopanning (BP3-4) and plated to limiting dilution onto LB/carbenicillin plates (150 mm×15 mm petri dish) so that sufficient numbers of single plaques can be isolated to obtain 12×96 well plates for arraying. The plates were incubated at 37° C. for 3-4 hours until the plaques were visible, and the plaques were then picked for amplification in the 12×96 well plates. After two hours, lysis of the host bacteria occurs in the wells of the 96-well plates. One well of each plate was left uninfected as a control. Five 96 well plates of 200 μl phage lysates were clarified by whole plate centrifugation before robotic spotting in triplicate onto filters or glass slides. Excess reactivity in the surface area of the slide not spotted with phage was blocked using a 1% solution of BSA in PBS for 60 minutes, followed by washing in water three times. After blocking, the arrays on glass slides or filters were incubated with various dilutions of each of the individual control and patient sera, with triplicate or more spots for each dilution of serum. Serum antibodies binding to recombinant proteins expressed on the surface of the T7 bacteriophage were detected by incubation with a Cy5-labeled anti-human IgG goat antiserum and visualized and quantified using GenePix and ImaGene software in a 4000B array scanner (AXON Instrument). As a positive control for each spot, a Cy3-labeled antibody for the T7 capsid protein was used. The fluorescence intensity for the human antibodies was normalized to the T7 capsid antibody reactivity. Initial testing of phage solutions was performed on a spotting robot.

The optimal number of subtractive biopannings for each serum sample is determined by picking individual phage clones and then testing those clones for reactivity with antibodies in the serum used in the biopanning (referred to as the “self-reaction”). Plates of 96 clones were picked for each patient's biopanning at cycles 3, 4, and 5, and the phage clones were then tested for binding to antibodies in that serum. Antibody binding is detected on filters spotted with a 96 pin head on a Biomek robot, or on glass slide microarrays of phagotopes. The filters are then treated like a western blot by blocking with 1% dry milk powder in PBS and adding diluted serum. After rocking for 2 hours, the filter is washed, reacted with an anti-human IgG antibody linked to horseradish peroxidase (HRP), and detected by ECL. For one patient (designated patient #1), a total of 480 plaques were picked at biopanning 4. Biopanning four was chosen because about 35% of the clones bound antibodies from that patient's serum. Reactivity of the phagotopes with the patient's serum was detected at a 1:10,000 dilution, indicating a very high titer of the IgG molecules that react with the epitopes (self-reaction with 480 clones). Reactivity to these clones is detected at similar dilutions using the clones arrayed on glass slides as an alternative solid support.

When the serum reactivity with other patients (non-self reactions) was analyzed using replicates of the robotically spotted filters, reactivity was found in some patients again at a dilution of 1:10,000 (FIG. 1b). Other patients required a 1:3000 dilution of the serum for detection of the reactive clones (Table 1). Patient #23 reacted quite strongly while patient #16 reacted more weakly (FIG. 1b and Table 1). Positivity was scored only when 3 out of 3 of the triplicates had similar intensity. In the subtractive biopanning scheme, phage binding nonspecifically to normal serum proteins were removed using protein-G beads loaded with a pool of control sera. Using the 153 reactive clones of the original 480 clones, a positive reaction was detected on the filters for 13 of 21 other patients. Filters were also tested with control sera not used in the initial subtractive step: 5 of the 8 controls showed no reaction to the 480 phage on the filter arrays, while a non-specific and even pattern of reactivity to all clones (without the typical triplicate pattern) was observed using the other 3 control sera (Table 1).

TABLE 1

# of phage Patient #1 BP4 clones reacted with

each patient's sera at indicated dilution

Patient's sera
1: 10000
1:3000

PATIENT 1
153
(self reaction)

PATIENT 2
None
142

PATIENT 16
NS

PATIENT 20
70

PATIENT 23
137

PATIENT 29
NS

PATIENT 30
NS

PATIENT 33
NS

PATIENT 35
NS
72

PATIENT 37
None
120

PATIENT 01-056
NS

PATIENT 01-060
None
61

PATIENT 00-007
NS

PATIENT 01-108
NS

PATIENT 01-045
NS

PATIENT 42501
40

PATIENT 400162
120

PATIENT 40036
Mostly NS

PATIENT 42780
85

PATIENT B755
NS

PATIENT 40015
NS

PATIENT 075
119

PATIENT 015
155

PATIENT 035
NS

PATIENT 007
114

PATIENT 005
133

PATIENT 083
150

PATIENT 054
92

PATIENT 064
NS

PATIENT 065
NS

NS indicates non-specific reaction only.

None indicates no reaction detected.

The filter arrays are incubated with a patient's serum (pretreated with 150 μg of bacterial extract to block nonspecific reactions with E. coli proteins for 2 hours at 4° C.) at various dilutions for 1 hour at room temperature. Bacterial extracts are used because some patients have antibodies to bacterial protein, and therefore pre-treatment with extracts of E. coli proteins blocks the nonspecific antibodies to bacterial protein present in the patient's serum. The membranes are then washed three times with TBST (0.24% Tris, 0.8% NaCl, and 1% Tween-20) for 15 minutes each. After washing is completed, the membranes are incubated with secondary antibody, goat-anti human IgG-HRP conjugated (Pierce) at 1:5000 dilution for 1 hour at room temperature. The membranes are again washed three times with TBST 15 minutes each. Finally, membranes are developed with Supersignal West Pico chemiluminescent substrate (Pierce) and the images were captured on a Kodak film.

Phagotope Microarrays on Glass Biochips

Preparation of arrays. Phage lysates are prepared as above. Phage lysates (usually five 96 well plates) from BP4 are transferred to 384-well plates, each lysate spotted in quadruplicate, using 10 μl per well. A robotic microarrayer is used to spot the phage in an ordered array onto FAST™ slides (Schleicher & Schuell) at a 350 μm spacing using 4 steel Micro-Spotting Pins. The arrays are dried overnight at room temperature.

Preparation of fluorescent antibody probes. T7 monoclonal antibody and goat anti-human IgG are purchased from Novagen and Pierce, respectively. Monofunctional NHS-ester activated Cy3 and Cy5 dyes are purchased from Amersham (PA33001 and PA35001). The antibodies are labeled in pH 8.0 sodium carbonate buffer as per the instructions from the manufacturer. Briefly, 100 μl of the protein solution with 5 μl of coupling buffer is transferred to the vial of reactive dye and mixed thoroughly. The reaction is incubated in the dark at room temperature for 30 minutes with additional mixing approximately every 10 minutes. The reaction solutions are then loaded onto gel filtration columns to separate the labeled protein from non-conjugated dye. The T7 antibody is labeled with Cy3 and the anti-human IgG is labeled with Cy5. The labeled protein is eluted and stored at 4° C. for future use. Reversing the dye-labeling scheme of the antibodies does not affect the results. The advantage of this strategy is that the same reagents are used on every phagotope array and the only variable is the patient's serum; variations in labeling efficiency are therefore not a factor.

Detection of fluorescent antibody probes. The arrays are rinsed briefly in 1% BSA/PBS to remove unbound phage, transferred immediately to 1% BSA/PBS as a blocking solution, and then incubated in this blocking solution for 1 hour at room temperature. The excess BSA is rinsed off the slides using PBS. Without allowing the array to dry, 2 ml of PBS containing human serum at a dilution of 1:10,000 is applied to the surface in a screw-top slide hybridization tube. Multiple dilutions are tested per patient to obtain optimal detection. The arrays are incubated at room temperature for 1 hour with mixing. The arrays are rinsed in PBS to remove the serum, and then washed gently three times in PBS/0.1% Tween-20 solution for 10 minutes each. All washes are performed at room temperature. After removing the Tween-20 using PBS, the arrays are incubated with 2 ml of PBS containing Cy3-labeled anti-T7 capsid antibody at a dilution of 1:50,000 and anti-human IgG labeled with Cy5 at a dilution of 1:10,000 as probes for 1 hour in the dark. The incubation solution is mixed every 20 minutes. Three washes are performed using PBS/0.1% Tween-20 solution for 10 minutes each. The array is then rinsed with filtered ddH₂O twice and dried using a stream of compressed air.

Analysis of Phagotope Microarrays. The arrays are scanned in an Axon Laboratories scanner (Axon Laboratories, Palo Alto, Calif.) using 532 nm and 635 nm lasers. The ratio of the anti-T7 capsid and anti-human IgG signals is determined by comparing the fluorescence intensities in the Cy3- and Cy5-specific channels at each spot. The location of each spot on the array is outlined using the image processing software. The background, calculated as the median of pixel intensities from the local area around each spot, is subtracted from the average pixel intensity within each spot. This normalized reactivity is entered into a database for analysis.
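The per-spot calculation described above can be illustrated with a minimal sketch. The function names and the pixel values are hypothetical, and the direction of the ratio (Cy5 anti-human IgG signal divided by Cy3 anti-T7 capsid signal) is an assumption; the scanner software used in practice is not reproduced here.

```python
from statistics import mean, median

def background_corrected(spot_pixels, local_background_pixels):
    """Average pixel intensity in the spot minus the median local background."""
    return mean(spot_pixels) - median(local_background_pixels)

def spot_ratio(cy5_spot, cy5_bg, cy3_spot, cy3_bg):
    """Background-corrected Cy5 (anti-human IgG) over Cy3 (anti-T7 capsid)."""
    cy5 = background_corrected(cy5_spot, cy5_bg)
    cy3 = background_corrected(cy3_spot, cy3_bg)
    if cy3 <= 0:
        return None  # no usable capsid signal: flag the spot rather than divide
    return cy5 / cy3

# Hypothetical pixel intensities for a single spot in each channel
ratio = spot_ratio(
    cy5_spot=[900, 1100, 1000], cy5_bg=[90, 100, 400],
    cy3_spot=[2000, 2200, 2100], cy3_bg=[95, 100, 105],
)
print(ratio)
```

Using the median for the background keeps a single bright background pixel (400 in the example) from distorting the correction.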

The information in this database can be analyzed in order to: i) select the most informative epitopes and ii) develop a diagnostic test for tumor occurrence in high-risk women or tumor recurrence in women previously treated for ovarian cancer. The gene products thus identified can provide insight into molecular changes recognized by the host immune system.

The human antibodies bound at each spot are detected with Cy5-labeled anti-human IgG. The fluorescence at each spot is normalized by comparison to the reaction with a Cy3-labeled antibody to the T7 phage capsid protein. Only a small fraction of the phage capsid protein is substituted with the in-frame fusion of the human cDNAs of the library. The majority of the capsid protein is produced by the host bacterium from an episomal T7 capsid gene. Therefore the majority of the capsid protein is wild-type and can react with the anti-capsid antibody. An example of Cy5-labeled anti-human IgG reacting with IgG in patient #1's serum bound to clones biopanned using patient #1's serum is shown in FIG. 6c.

The data analysis proceeds according to the following steps:

1. Pre-processing and normalization.

2. Identifying the most informative markers

3. Building a predictor for molecular diagnosis of ovarian cancer and validating the results.

The pre-processing and normalization step is used for arrays using two channels such as Cy5 for the human IgG and Cy3 for the T7 control. The spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background, as well. A background corrected value is calculated by subtracting the background from the signal. If necessary, non-linear dye effects can be eliminated by performing an exponential normalization (Houts, 2000) and/or a piece-wise linear normalization of the data obtained in the first round. The exponential normalization can be done by calculating the log ratio of all spots (excluding control spots or spots flagged for bad quality) and fitting an exponential decay to the log (Cy3/Cy5) vs. log (Cy5) curve. The curve fitted is of the form:

y=a+b*exp(−cx)

where a, b and c are the parameters to be calculated during curve fitting. Once the curve is fitted, the values are normalized by subtracting the fitted log ratio from the observed log ratio.
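A minimal sketch of this fit is given below, assuming x = log(Cy5) and y = log(Cy3/Cy5) for each spot. The fitting strategy (scanning candidate values of c and solving a and b by ordinary least squares for each) is one simple option chosen for illustration and is not stated in the specification; the function names and the synthetic data are hypothetical.

```python
import math

def fit_exponential(x, y, c_grid=None):
    """Fit y = a + b*exp(-c*x): scan c, solve a and b by linear least squares."""
    if c_grid is None:
        c_grid = [0.01 * k for k in range(1, 1001)]  # candidate c from 0.01 to 10
    n = len(x)
    y_mean = sum(y) / n
    best = None
    for c in c_grid:
        z = [math.exp(-c * xi) for xi in x]  # with c fixed, the model is linear in z
        z_mean = sum(z) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0:
            continue
        b = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y)) / szz
        a = y_mean - b * z_mean
        sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c

def normalize(x, y, params):
    """Subtract the fitted log ratio from the observed log ratio."""
    a, b, c = params
    return [yi - (a + b * math.exp(-c * xi)) for xi, yi in zip(x, y)]

# Synthetic spots that follow the curve exactly (illustrative values only)
x = [0.5 * k for k in range(1, 21)]
y = [0.2 + 1.5 * math.exp(-0.8 * xi) for xi in x]
params = fit_exponential(x, y)
residuals = normalize(x, y, params)
```

Control spots and spots flagged for bad quality would be removed from x and y before fitting, as stated above.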

This normalization has been shown to obtain good results for cDNA microarrays but it relies on the hypothesis that the dye effect can be described by an exponential curve. The piece-wise linear normalization can be done by dividing the range of measured expression values into small intervals, calculating a curve of average expression values for each such interval and correcting that curve using piece-wise linear functions.

The values coming from each channel are subsequently divided by the mean of the intensities over the whole array. The ratio between the IgG and the T7 channels is then calculated. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/−two standard deviations) are flagged for manual inspection. Single channel arrays are pre-processed in a similar way but without taking the ratios. This preprocessing sequence was shown to provide good results for all preliminary data analyzed.

A first test (Procedure 1 disclosed above) is necessary to determine whether a specific epitope is suitable for inclusion in the final set to be spotted.

Procedure 2 is used to maximize the information content of the set of epitopes while trying to minimize the number of epitopes used using the following procedure.

The arrays used in this example (using two channels such as Cy5 for the human IgG and Cy3 for the T7 control) are processed as follows. The spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background, as well. A background corrected value is calculated by subtracting the background from the signal. The values coming from each channel are normalized by dividing by their mean. Subsequently, the ratio between the IgG and the T7 channels is calculated and a logarithmic function is applied. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/−two standard deviations) are flagged for manual inspection.
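The processing sequence above can be sketched as follows. This is an illustration under stated assumptions: the inputs are already background-corrected and positive, the natural logarithm is used (the base is not specified), and the function names and intensity values are hypothetical. Outlier flagging is left out of the sketch.

```python
import math
from statistics import mean, stdev

def log_ratios(igg, t7):
    """Divide each channel by its array-wide mean, then take log(IgG / T7)."""
    igg_mean, t7_mean = mean(igg), mean(t7)
    return [math.log((i / igg_mean) / (t / t7_mean)) for i, t in zip(igg, t7)]

def combine_replicates(values, clone_ids):
    """Group spot values by clone; return {clone: (mean, standard deviation)}."""
    groups = {}
    for clone, value in zip(clone_ids, values):
        groups.setdefault(clone, []).append(value)
    return {clone: (mean(v), stdev(v)) for clone, v in groups.items()}

# Hypothetical background-corrected intensities: two clones in quadruplicate
clone_ids = ["A"] * 4 + ["B"] * 4
igg = [800, 820, 790, 810, 100, 110, 95, 105]       # Cy5, anti-human IgG
t7 = [1000, 1010, 990, 1000, 1000, 990, 1010, 1000]  # Cy3, anti-T7 capsid
summary = combine_replicates(log_ratios(igg, t7), clone_ids)
for clone, (avg, sd) in summary.items():
    print(clone, round(avg, 3), round(sd, 3))
```

In this example clone A carries far more IgG signal per unit of capsid signal than clone B, so its average log ratio is the larger of the two.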

The histogram of the average log ratio is calculated. If the histogram is unimodal (e.g. subject 19218 in FIG. 13), there is no specific response. If the histogram is clearly bimodal (e.g. subject 19223 in FIG. 13), there is a specific response. All 25 subjects analyzed so far fell into one of these two categories or had no response at all. The preliminary data analyzed so far showed a very good separation of the distributions for the patients.

Building the Predictor

A number of machine learning and statistical techniques have been considered for this task. The following algorithms were tested: CN2 (Clark, 1989), C4.5 (Quinlan, 1993; Breiman et al., 1984), CLEF 1998), C4.5 using classification rules (Quinlan, 1993), incremental decision tree induction (ITI) (Utgoff, 1989), learning vector quantization (LVQ) (Kohonen, 1988; Kohonen, 1995), induction of oblique trees (OC1) (Heath and Salzberg, 1993; Murthy, 1993), Nevada backpropagation (NEVP; Rumelhart et al., 1987), Constraint Based Decomposition (CBD) (Draghici, 2001), k-nearest neighbors with k=5 (K5), Q* and RBFs (Musavi et al., 1992; Poggio and Girosi, 1990).

The generalization abilities and the reliability of these techniques have been tested extensively on various problems and data sets from the UCI machine learning repository (Blake et al., 1998). This repository contains a large collection of mostly real world data from a large variety of domains (including biological and medical), and constitutes a benchmark on which various algorithms and techniques can be tested.

Table 2 presents the accuracies obtained by these techniques on the selected problems. Table 3 presents the standard deviation of each such algorithm on the same problems. Based on these tests, applicant decided to begin with constraint based decomposition (CBD), radial basis functions (RBFs) and decision trees (C4.5) as the three main candidates. The CBD was selected because it offers high reliability across multiple trials (lowest average standard deviation) and good accuracy (second best average). Furthermore, the CBD algorithm can also produce a logical expression describing the classifier produced. Such expressions allow one to understand the relative importance of various epitopes. The decision trees have been selected mainly because they can be mapped into logical expressions that can be compared to the one produced by the CBD. RBFs construct clusters by placing high-dimensional Gaussian functions on groups of given data points (one data point can be a set of expression values corresponding to a protein chip). This technique automatically calculates the number of clusters, their orientation (the eigenvectors of the correlation matrix of the expression vectors) and their widths. RBFs were expected to perform much better than k-means clustering and the other techniques already used in this context because RBFs avoid guessing parameters (e.g. k in k-means clustering). Furthermore, extracting a model from the trained RBF architecture is straightforward. Again, this model can be compared with the models provided by the CBD and C4.5.

DATASET       C4.5   C4.5r  ITI    LMDT   CN2    LVQ    OC1    NEVP   K5     Q*     RBF    CBD
GLASS         70.23  67.96  67.49  60.59  70.23  60.69  57.72  44.08  69.09  74.78  69.54  68.37
IONOSPHERE    91.56  91.82  93.65  86.89  90.98  88.58  88.29  83.8   85.91  89.7   87.6   88.17
LUNG CANCER   40.17  39.84  38.47  55.49  37.17  55.71  54.28  33.12  68.54  60     65.7   60
WINE          91.09  91.9   91.09  95.4   91.09  68.9   87.31  95.41  69.49  74.35  67.87  94.44
PIMA INDIANS  71.02  71.55  73.16  73.51  72.19  71.28  50     68.52  71.37  68.5   70.57  68.72
BUPA          65.14  65.39  63     71.54  64.31  64.13  65.57  77.72  66.43  61.43  59.85  62.32
TICTAC TOE    83.52  99.17  92.89  89.61  98.18  65.61  78.56  96.91  84.32  65.7   72.19  75.1
BALANCE       64.61  75.01  76.76  93.27  80.89  89.54  92.5   91.04  83.96  69.21  89.06  90.08
IRIS          91.6   91.58  91.25  95.45  91.92  92.55  93.89  90.34  91.94  92.1   85.64  96
ZOO           90.27  90     90.93  96.61  91.91  91.42  66.68  92.86  67.64  74.94  X      94.29
AVG           75.92  78.42  77.87  81.84  78.89  74.84  73.48  77.38  75.87  73.07  74.22  79.75

Table 2 shows a comparison of several classification techniques. The table presents the accuracies obtained on various problems from the UCI machine learning repository. Each accuracy is the average of 10 trials.

DATASET       C4.5   C4.5r  ITI    LMDT   CN2    LVQ    OC1    NEVP   K5     Q*     RBF    CBD
GLASS         7.23   6.28   7.96   11.25  8.34   10.24  9.1    6.29   7.81   6.98   7.35   2.08
IONOSPHERE    2.82   2.58   2.71   3.51   3.29   3.36   2.21   3.81   4.14   4.7    6.45   2.56
LUNG CANCER   14.2   18.92  13.52  32.2   13.79  12.48  17.53  14.83  11.96  18.6   16.27  12.6
WINE          5.84   5.09   6.24   5.22   6.11   4.84   8.45   2.22   6.86   6.64   5.16   1.96
PIMA INDIANS  2.1    3.92   2.16   4.3    2.36   4.46   22.4   3.19   3.67   8.19   2.39   3.02
BUPA          5.74   6.05   4.23   6.63   7.99   7.14   8.45   11.97  7.22   4.25   7.92   2.05
TICTAC TOE    2.44   1.05   2.38   8.79   0.95   2.99   5.88   1.32   2.7    3.16   3.35   9.43
BALANCE       3.35   3.98   3      2.95   3.38   4.39   2.07   7.12   7.53   19.09  2.38   3.03
IRIS          5.09   5.09   4.81   4.71   5.95   3.73   4.68   7.45   4.1    5.28   27.37  4.35
ZOO           7.59   7.24   6.11   1.56   5.95   6.26   30.36  4.62   20.03  23.8   X      2.13
AVG           5.64   6.02   5.312  8.112  5.811  5.989  11.11  6.282  7.602  10.07  8.738  4.321

Table 3 shows a comparison of several classification techniques. The table presents the standard deviations obtained in a set of 10 trials on various problems from the UCI machine learning repository.

Furthermore, one can also implement and try the predictors used in (Golub et al., 1999) and (Alizadeh et al., 2000) which were shown to work well in cancer diagnosis problems similar to applicant's. The selection of the final predictor was based on the validation results obtained in the last step of the data analysis.

Validating the Predictor

In order to validate the predictors, the classical method of cross-validation was used (Breiman et al., 1984). The idea behind cross-validation is that the predictor is tested, not based on its ability to simply memorize the data presented during the training, but based on its ability to generalize the knowledge acquired during the training to previously unseen cases. For this reason, the predictor must be checked on data that belongs to the same distribution but was not used during the training. This can be implemented in several ways depending on the number of examples available. If only a few examples (such as stage I patients, ˜40 total) are available, reducing the size of the training set even further by setting patterns aside for generalization testing could jeopardize the training. In such cases, the algorithm is trained with only n−1 of the n available patterns and tested on the remaining one. This is done n times, each time leaving out a different pattern. An average is calculated over the n experiments. This is known as the leave-one-out method. If more patterns are available, the pattern set can be divided into n different subsets of patterns. Then one subset can be left out of the training and used to test the generalization. Again, the value reported is an average of the n trials performed leaving out each of the n subsets. This method is known as n-fold cross validation. Finally, if the pattern set is very large (patients with stage III or IV cancer), it can simply be divided into a training set and a validation set. In this case, the generalization abilities of the technique can be characterized by its performance on the validation set.
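The leave-one-out and n-fold procedures can be sketched with a single routine, since leave-one-out is the special case in which the number of subsets equals the number of patterns. The nearest-centroid classifier below is only a stand-in chosen for brevity; it is not one of the predictors evaluated above, and all names and data values are hypothetical.

```python
def nearest_centroid_train(samples, labels):
    """Stand-in predictor: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def nearest_centroid_predict(model, x):
    """Return the class whose centroid is closest to x."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: dist2(model[y]))

def cross_validate(samples, labels, n_folds):
    """Predict every pattern from a model trained without its fold.

    n_folds == len(samples) gives the leave-one-out method.
    """
    n = len(samples)
    predictions = [None] * n
    for fold in range(n_folds):
        held_out = {i for i in range(n) if i % n_folds == fold}
        train_x = [samples[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = nearest_centroid_train(train_x, train_y)
        for i in held_out:
            predictions[i] = nearest_centroid_predict(model, samples[i])
    return predictions

# Hypothetical two-epitope reactivity vectors for six subjects
samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
labels = ["control", "control", "control", "cancer", "cancer", "cancer"]
loo = cross_validate(samples, labels, n_folds=len(samples))
accuracy = sum(p == t for p, t in zip(loo, labels)) / len(labels)
print(accuracy)
```

Because each prediction comes from a model that never saw the held-out pattern, the resulting accuracy estimates generalization rather than memorization.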

For each predictor, the specificity, sensitivity, positive predictive value and negative predictive value can be calculated using cross-validation data (i.e. values that have not been used in constructing the predictor itself). This ensures that the quality measures obtained in this study reflect the real world performance to be expected in the field.
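These four quality measures follow from the counts of true and false positives and negatives. A minimal sketch using their standard definitions is shown below; the function name, the label strings and the example predictions are hypothetical.

```python
def diagnostic_metrics(true_labels, predicted_labels, positive="cancer"):
    """Sensitivity, specificity, PPV and NPV from held-out predictions."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else None  # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),  # detected cancers / all cancers
        "specificity": ratio(tn, tn + fp),  # cleared controls / all controls
        "ppv": ratio(tp, tp + fp),          # true cancers / all positive calls
        "npv": ratio(tn, tn + fn),          # true controls / all negative calls
    }

# Hypothetical cross-validation output: 4 cancer patients, 6 controls
truth = ["cancer"] * 4 + ["control"] * 6
pred = ["cancer", "cancer", "cancer", "control",
        "control", "control", "control", "control", "control", "cancer"]
metrics = diagnostic_metrics(truth, pred)
print(metrics)
```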

Once informative phagotopes are found, the gene encoding each phagotope is identified.

1. Identification of genes encoding the phagotopes. Phage clones specifically reacting with patient sera, as determined by microarray immunoscreening, can be amplified by PCR using T7 capsid forward and reverse primers. PCR fragments are purified and 100 ng of fragment is analyzed to determine the nucleotide sequence of the cDNA inserts. Sequence alignments are performed using BLAST software and GenBank databases. The sequence information can be used in several ways. Initially, the DNA sequence information provides a database of the frequency of reactivity to a particular epitope.

Diagnostic Markers Derived from the Combined Processes Including Biopanning, Assay of Patients' Sera with Epitopes on Filters and Biochips, and Identifying the Best Predictor/marker of Disease.

DNA Sequence Analysis of Phagotope Clones

PCR amplified DNA sequences from 96 phagotopes that reacted with patient #1 and at least one other OVCA serum are shown in the table below. Some clones were isolated multiple times and one clone was represented 23 times out of the 96 clones analyzed. This was the human homologue of the oncogenic gene Bmi-1 (GenBank NM_005180.1), which inhibits the expression of p14ARF and cooperates with c-myc (Lindstrom et al., 2001). The insert sizes for the Bmi-1 phage clones varied in coding capacity between 67 and 94 amino acids, depending on the isolate. Eight other clones were represented twice and one was isolated three times. One of the genes isolated twice was heat shock protein 70, which has been shown to be overexpressed and antigenic in ovarian cancer tumors and has been identified in the SEREX database 5 times. The open reading frame in the HSP70 clone is 109 amino acids in length. Another clone isolated twice among the 96 sequenced is a known cancer antigen called RCAS1, which is overexpressed in 58% of ovarian cancers and in many other cancers as well (Sonoda et al., 1996). RCAS1 is an estrogen regulated gene which can inhibit the immune system from killing a tumor (Nakashima et al., 1999). This information indicates that this technology is capable of detecting cancer antigens that can be used for diagnostic and immunotherapy purposes. If overbiopanning had occurred, only a few different clones would have been found. However, because the remaining clones were isolated once each, 4-5 biopannings appears to be appropriate. In this first group of 480 clones, clones were isolated that reacted with approximately 60% of the OVCA patients using the macroarray filters and more efficiently using the microarray technology. Additional epitope clones provide additional sensitivity for this assay.

Clone Name                                GenBank ID

Clone found 23 times
Bmi-1 (oncogene)                          NM_005180.1

Clones found 2-3 times
HSP-70                                    XM_050984.1
RCAS1 (EBAG9)                             BC005249.1
A-kinase anchoring protein 220            XM_038666.1
G-protein gamma-12 subunit                NM_018841.1
Neuronal apoptosis inhibitory protein 6   AF242431.1
hypothetical protein DC42                 XM_028240.1
WD repeat domain 1 (WDR1)                 XM_034454.1
zinc finger protein 313                   XM_009507.1

54 other clones isolated once each.

Serum reactivity toward a cellular protein can occur for two possible reasons: 1) expression of a mutated form of the protein by the tumor cells and 2) overexpression of the protein in the tumor cells. Identification of proteins detected by the host immune system in this fashion therefore provides mechanistic information about protein(s) that can be mutated or overexpressed in ovarian cancer. Such information provides insight into the molecular targets and mechanisms giving rise to ovarian cancer. Lastly, the sequences identified using the epitope-biopanning/phage microarray approach can be useful for early detection of cancer occurrence and recurrence by screening patients' sera and peritoneal fluids, and for providing immunogens for immunotherapy vaccines.

Example 2

A strategy was developed for serological detection of large numbers of antigens indicative of the presence of cancer, thereby using the humoral immune system as a biosensor. The high-throughput selection strategy involved biopanning of an ovarian cancer phage display library using serum immunoglobulins from an ovarian cancer patient as bait. Protein macroarrays containing 480 of these selected antigen clones revealed 44 clones that interacted with immunoglobulins in sera from all (32/32) ovarian cancer patients, but not with sera from either healthy women (0/25) or patients having other benign or malignant gynecological diseases (0/14). An informative subset of 26 antigen clones was chosen based on the criterion that the serum from each of a group of 16 patients interacted with at least one of the clones. When another, independent group of 16 serum samples was used, all 16 samples interacted with one or more of the 26 clones, and none from 12 healthy women. The process of globally profiling disease relevant epitopes is known as “epitomics”.
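The selection criterion stated above (every patient serum in the group interacts with at least one chosen clone) can be illustrated with a greedy covering heuristic. This is an assumption made for illustration only: the procedure actually used to choose the 26 clones is not described here, and the clone names and patient identifiers below are hypothetical.

```python
def greedy_clone_subset(reactivity):
    """Pick clones until every patient reacts with at least one chosen clone.

    reactivity maps a clone name to the set of patient ids whose serum bound it.
    """
    uncovered = set().union(*reactivity.values())
    chosen = []
    while uncovered:
        # Take the clone that covers the most still-uncovered patients;
        # sorting the names makes ties resolve deterministically.
        best = max(sorted(reactivity), key=lambda c: len(reactivity[c] & uncovered))
        chosen.append(best)
        uncovered -= reactivity[best]
    return chosen

# Hypothetical reactivity data for four clones and six patients
reactivity = {
    "clone_a": {1, 2, 3},
    "clone_b": {3, 4},
    "clone_c": {4, 5, 6},
    "clone_d": {2, 6},
}
subset = greedy_clone_subset(reactivity)
print(subset)
```

A greedy pass of this kind tends toward a small panel; a larger panel, such as one that keeps redundant clones for robustness, satisfies the same criterion.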

In searching for a method for the early detection of ovarian cancer (OVCA), large numbers of potential diagnostic antibodies were identified and a high-throughput strategy was developed to clone antigen biomarkers. Because antibodies to any single antigen tend to detect only a small fraction of cancer patients, the necessity to screen a large panel of potential antigen markers was recognized. Therefore a differential biopanning technique was used to screen T7 phage display cDNA libraries to isolate cDNAs coding for epitopes binding with antibodies present specifically in the sera of patients with early or late stage ovarian cancer but not with antibodies in the sera of healthy women. Using a single OVCA patient's immunoglobulins (IgG) as bait, there were identified both established and novel antigen biomarkers. Large numbers of cancer-associated antigens can be found by this phage display technique more rapidly than using standard SEREX analysis. This is due to the power of repeated cycles of selective enrichment possible with viable phage display cDNA biopanning, especially when screening is performed with serum containing a complex mixture of low titer of IgGs, compared to the single step screening possible with SEREX, which is biased toward the identification of antigens that can be detected at a relatively high titer of IgGs.

The antigens that were identified through this process have diagnostic value with additional potential for development of therapeutic vaccines or imaging reagents. Since the host immune system can unravel molecular events (overexpression or mutation) critical to the genesis of ovarian cancer, this novel proteomics technology can identify genes with significant mechanistic involvement in the etiology of the disease. Our initial goal is to develop a serum-based test that can detect ovarian epithelial cancer at an early and curable stage.

Methods

Serum Samples. Blood samples from ovarian cancer patients (Stages I-IV) and healthy controls were obtained from the Barbara Ann Karmanos Cancer Institute. Processing of blood to extract serum was performed in the laboratory. Briefly, blood samples were centrifuged at 2500 rpm at 4° C. for 10-15 minutes and the supernatants were stored at −70° C. until use.

Construction of T7 phage display cDNA library from ovarian cancer cell line, SKOV3. Isolation of mRNA from total RNA. Ovarian cancer cells were grown in monolayer culture. Total RNA was prepared using trizol reagent according to manufacturer's instructions (Invitrogen, Carlsbad, Calif., USA). Total RNA, 0.5 mg, was used for the purification of Poly(A)+ mRNA following the method as suggested by the manufacturer (QIAGEN Inc, Valencia, Calif.). Poly(A)⁺ mRNA was quantitated by UV spectroscopy and the process of poly A selection was repeated once. Twice poly (A) selected mRNA was stored at −70° C. for use in library preparation.

Construction of T7 phage display cDNA library. Novagen's OrientExpress® cDNA Synthesis and Cloning systems were used in the construction of the ovarian cancer T7 phage cDNA libraries (Novagen, cDNA manual, TB247). The OrientExpress Random Primer System was used to achieve orientation-specific cloning between EcoRI and HindIII sites. First and second strand cDNA synthesis were sequentially carried out in the presence of 5-methyl dCTP. After second strand synthesis, the cDNA was treated with T4 DNA polymerase to blunt the ends. The addition of EcoRI/HindIII Directional Linker d(GCTTGAATTCAAGC) (SEQ ID NO: 3) at the d(A)n:d(T)n end created a HindIII site d(AAGCTT) (SEQ ID NO: 4) in which the two underlined bases were derived from cDNA. The two dT's were provided on the 5′ end of each first strand by the HindIII random primer d(TTNNNNNN) (SEQ ID NO: 5). Excess linkers and small cDNAs (<300 bp) were removed by a gel filtration step as described in Novagen's manual TB 247. The digestion of the cDNA with both HindIII and EcoRI thus yielded cDNA molecules ready for directional insertion into EcoRI/HindIII vector T7Select 10-3 arms. After vector ligation and packaging using T7 packaging extracts, the phage were plated to determine the library titer. About 50 phage clones were randomly picked up and PCR was performed with the T7 forward primer (TCTTCGCCCAGAAGCAG) (SEQ ID NO: 6) and T7 reverse primer (CCTCCTTTCAGCAAAAAACCCC) (SEQ ID NO: 7), in order to determine the insert sizes. The insert size range was found to be between 300 bp-1.5 kb.

Amplification of packaged libraries by liquid culture method. 10 ml of LB/carbenicillin medium was inoculated with a single colony of E. coli strain BLT5615 from a freshly streaked plate. The mixture was shaken at 37° C. overnight. Five ml of the overnight culture was added to 90 ml of LB/carbenicillin medium and was allowed to grow until the OD₆₀₀reached 0.4-0.5. After obtaining the appropriate OD, 1 mM Isopropyl-β-D-thiogalacto-pyranoside (IPTG), (1×) M-9 Minimal salts and 0.4% glucose were added and the cells were allowed to grow for 20 minutes. An appropriate volume of culture was infected with phage library at multiplicity of infection (MOI) of 0.001-0.01 (100-1000 cells for each pfu). The infected bacterial culture was incubated with shaking at 37° C. for 1-2 hours until lysis was observed. After lysis, 0.02% glycerol and 0.02M phenyl-methyl sulphonyl fluoride (PMSF) and protease inhibitor cocktail (PIC) were added to the cell lysate to block proteolysis of the capsid fusion proteins. The phage lysate was centrifuged at 8000×g for 10 minutes. The supernatant was collected and stored at 4° C. The lysate was titered by plaque assay under standard conditions. The libraries were stored at −80° C. after purification by polyethylene-glycol precipitation and ultracentrifugation through a cesium chloride step gradient.

Selection of T7 Phage Displayed cDNA Libraries with Human Sera.

Affinity selection with sera from normal individuals. Twenty-five μl of Protein G Plus-agarose beads were placed into a 0.6 ml microcentrifuge tube and washed twice with 1× phosphate buffered saline (PBS). The washed beads were blocked with 1% bovine serum albumin (BSA) at 4° C. for 1 hour and then incubated at 4° C. for 1 hour with 250 μl of pooled sera from 20 healthy women at a 1:20 dilution. After 3 hours of incubation, beads were washed three times with 1×PBS and then incubated with phage library (˜10¹⁰phage particles) made from an ovarian cancer cell line, SKOV3. The mixture was centrifuged at 3000 rpm for 2 minutes to remove phage nonspecifically bound to the beads and the supernatant (phage library) was collected for immunoselection.

Immunoselection of the phage mixture with serum from an ovarian cancer patient. Protein G Plus agarose beads were placed into a 0.6 ml microcentrifuge tube and washed two times with 1×PBS. The washed beads were blocked with 1% BSA at 4° C. for 1 hour and then incubated at 4° C. with 250 μl of a 1:20 dilution of serum from the ovarian cancer patient, MEC1. After 3 hours, the beads were washed three times with 1×PBS and incubated for immunoselection overnight at 4° C. with the phage library supernatant. After this incubation, the mixture was centrifuged at 3000 rpm for 2 minutes and the supernatant was discarded. The beads were washed three times with 1×PBS and the phage were eluted from the washed beads as per the manufacturer's instructions. The bound phage were removed from the beads by centrifugation at 8000 rpm for 8 minutes. Eluted phage (200 μl) were transferred to liquid culture for amplification (100 μl elution to 20 ml culture). Four rounds of affinity selection were carried out on the amplified phage obtained for each series of biopannings. The number of biopanning cycles generally determines the extent of the enrichment for phage that bind to the sera of patients with ovarian cancer. Four other serum samples from ovarian cancer patients were also used for immunoselection of clones. MEC1 gave the strongest binding with its clones and therefore those clones were selected for the remainder of this study.

Macroarray immunoscreening. The titer of the T7 phage library obtained after amplification of each biopanning (BP1-BP4) eluate was determined by plaque assay. E. coli BLT5615 was infected with the primary unamplified phage from biopanning (BP1-BP4) and plaqued to limiting dilution onto LB/carbenicillin plates (150 mm×15 mm petri dish) so that sufficient numbers of single plaques could be isolated to obtain 12×96 well plates for arraying. The plates were incubated at 37° C. for 3-4 hours until the plaques were visible, and the plaques were then picked for amplification in the 96-well plates. Lysis of the host bacteria generally occurred after 2 hours. After bacterial lysis, the plates were centrifuged at 3000 rpm for 20 minutes. The samples from the 96-well plates were arrayed onto a nitrocellulose membrane (Osmonics) using the Beckman Biomek 2000 liquid handling robot. This robot, equipped with a 96-pin printing head, spotted the samples contained in 96-well plates onto nitrocellulose membranes. The patterns were printed in a 4×4 configuration. Position A1 contained 16 spots, each representing a phage sample (FIG. 12A). Triplicates were printed from well A1 of each of five different 96-well plates (15 spots) and the 16th spot contained a positive control of diluted human serum used in the 4 corners of the plate only, as shown by black arrows (FIG. 12A). After each round of spotting, the pins were washed in 0.1% SDS, sterile water, and then ethanol. After the spotting was completed, nitrocellulose membranes were blocked with 5% non-fat dry milk for 1 hour at room temperature. The membranes were then incubated with a patient's serum (pretreated with 150 μg of bacterial extracts for 2 hours at 4° C.) at a dilution of 1:10000 or 1:3000 for 1 hour at room temperature. Bacterial extract was used because some patients and controls had antibody binding to bacterial protein(s).
The membranes were then washed three times with 0.24% Tris, 0.8% NaCl, 1% Tween-20 (TBST) for 15 minutes each and then incubated with secondary antibody, HRP-conjugated goat anti-human IgG (Pierce, Rockford, Ill., USA) at 1:5000 dilution for 1 hour at room temperature. The membranes were again washed three times with TBST for 15 minutes each, developed with Supersignal West Pico chemiluminescent substrate (Pierce, Rockford, Ill., USA) and the images captured on X-ray film.

Stability of Serum Specimens. One source of error in the immunodetection on macroarrays could be variability in serum sample preparation or storage. Therefore, a test was performed to determine whether some common handling conditions adversely affect the usefulness of the sera for the assays. For this test, several aliquots of the same serum sample from one ovarian cancer patient were subjected to various treatments: repeated freeze-thaw cycles (10 times), incubation of the blood sample at 37° C. for 72 hours before processing the serum, extended storage at 4° C., treatment at room temperature overnight, and heat treatment at 65° C. for 10 minutes. Freshly thawed serum, processed normally, served as a control. Robotically printed nitrocellulose membranes containing the set of 480 clones were then processed with each of the treated and untreated serum samples.

ELISA Macroarray analysis. Forty-four Stage I-IV clones, in triplicate, were arrayed onto a nitrocellulose membrane (Osmonics) using the Beckman Biomek 2000 liquid handling robot. Nitrocellulose membranes were blocked with 5% non-fat dry milk for 1 hour at room temperature and then incubated with patient or control serum (pretreated with 150 μg of bacterial extract for 2 hours at 4° C.) at dilutions of 1:1000, 1:3000, 1:10000 and 1:30000 for 1 hour at room temperature. Immunoreactivity was assessed with serum from patients or healthy controls. For one set, the immunoreactivity was also assessed with a monoclonal antibody to the N-terminus of the T7 gene 10 protein at a dilution of 1:10000. This was performed as described in the macroarray immunoscreening. The intensity of each spot was measured, with background subtraction, using ImaGene software from BioDiscovery Inc, and the Intensity Ratio was calculated using the following equation:

Intensity Ratio=(Mean of Clone)/(Mean of T7 for 12 replicates of that Clone)−(Mean of Blank Phage)/(Mean of T7 for 12 replicates of that Blank Phage). The Intensity Ratio vs Serum concentration was plotted for each antigen clone.
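
By way of illustration only, the Intensity Ratio equation above can be sketched in a few lines of Python. The function name, the argument names, and the example intensities are hypothetical and are not taken from the study; the sketch assumes that "Mean of Clone" is the mean serum signal of a clone's replicate spots and that "Mean of T7" is the mean signal of the same clone probed with the T7 gene 10 monoclonal antibody.

```python
from statistics import mean


def intensity_ratio(clone_serum, clone_t7, blank_serum, blank_t7):
    """Intensity Ratio as defined in the text.

    clone_serum : background-subtracted serum signals for the clone's spots
    clone_t7    : T7 antibody signals for the replicates of that clone
    blank_serum : serum signals for the blank (insert-less) phage spots
    blank_t7    : T7 antibody signals for the replicates of the blank phage
    """
    clone_term = mean(clone_serum) / mean(clone_t7)
    blank_term = mean(blank_serum) / mean(blank_t7)
    return clone_term - blank_term


# Hypothetical intensities in arbitrary units, for demonstration only.
ratio = intensity_ratio(
    clone_serum=[300, 310, 290],
    clone_t7=[100] * 12,
    blank_serum=[55, 45, 50],
    blank_t7=[100] * 12,
)
print(round(ratio, 3))
```

Dividing by the T7 signal normalizes for the amount of phage deposited in each spot, and subtracting the blank-phage term removes binding that is attributable to the phage particle rather than to the displayed peptide.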

Sequencing of phage cDNA clones. Individual phage clones were PCR amplified using forward PCR primer 5′ GTTCTATCCGCAACGTTATGG 3′ (SEQ ID NO: 8) and reverse PCR primer 5′ GGAGGAAAGTCGTTTTTTGGGG 3′ (SEQ ID NO: 9). PCR products were purified on 1% agarose gels. The bands were excised from gels under UV light and DNA was extracted/purified using a Qiagen gel extraction kit (Qiagen Inc, Valencia, Calif., USA). Fifty ng of each purified PCR product was analyzed using forward Sequencing primer 5′ TGCTAAGGACAACGTTATCG 3′ (SEQ ID NO: 10) by Wayne State University DNA Sequencing Core Facility.

Results

Differential Biopanning of T7 Phage cDNA Expression Libraries Employing Sera Obtained from Women with Ovarian Cancer and Healthy Controls

A method of differential biopanning to screen a T7 phage cDNA library prepared from an ovarian cancer cell line, SKOV3, was developed using a late stage ovarian cancer patient's serum (MEC1) as the bait to isolate tumor-specific antigens. First the library was pre-adsorbed with sera pooled from 20 healthy controls so as to remove the antigen clones binding with common antibodies unrelated to cancer. The resulting phage were then bound to antibodies present in the serum of a cancer patient and the unbound phage removed. This selection procedure was repeated four times, amplifying the phage between cycles of biopanning. Groups of 96 clones were picked from the patient's biopanning at cycles 1, 2, 3 and 4. Amplified phage clones were spotted on nitrocellulose membranes, and useful phage clones were identified by their binding with patient IgG antibodies at a dilution of 1:10000. There was a significant enrichment for phage-bearing epitopes that bound serum IgGs after the fourth round of biopanning. Because about 35% of the selected phage clones interacted with MEC1 serum IgGs after the fourth round of biopanning, further biopanning was not performed to avoid reducing the diversity of phage clones.

Serological Detection of Antigens Using Macroarrays

The utility of such phage display antigen clone sets for the serological detection of cancer is best demonstrated by their interaction with sera from patients other than those used in the selection step. A set of 480 clones from the fourth round of biopanning was robotically spotted on nitrocellulose membranes. The binding of the cloned antigens with the IgGs in patients' sera was analyzed at a dilution of 1:10000. The strong positive interactions observed with the MEC1 serum indicated a relatively high titer of the IgG molecules that bound with the MEC1 clones (FIG. 12A). Several dilutions of the MEC1 serum were previously used for antigen detection and a dilution of 1:10000 produced the cleanest pattern of strong binding. Although 480 clones were identified from the biopanning with MEC1 serum as the bait, not all 480 clones interacted with the MEC1 serum (FIG. 12A). This can be explained by a non-specific interaction between phage clones and the Protein-G+ beads bearing the serum antibodies. When serum IgG-binding with sera from other patients (non-self reaction) was analyzed using replicates of these robotically spotted macroarrays, cross-reactivity was observed in most patients at a dilution of 1:10000 (FIG. 12B-E). Sera from other patients required either a 1:3000 or 1:30000 dilution to detect positive clones. Binding was scored positive only when all 3 spots of a triplicate had similar intensity and when the intensity was significantly higher than the background intensity of other spots within the same patch. Sera from 71 individuals were tested: 10 were from women with early stage OVCA (Stage I and Stage I borderline), 22 from women with late stage OVCA, 14 from women with benign or other gynecological diseases, and 25 from healthy controls. Tumor histology and stage of all the patients used for the study are listed in Table 4. Late stage patients OVC015 and MEC23 bound more intensely than the Stage I patients 4679 and 4387 (FIG. 12B-E).
In the subtractive biopanning scheme, phage epitope clones that bound IgGs in control sera were nevertheless isolated; these control sera were not used in the initial subtractive biopanning steps. As expected, a fraction of the 480 phage clones on the macroarrays interacted with approximately 10% of the controls. All clones that interacted with the control sera were eliminated from further consideration. One hundred and forty-nine clones interacted with sera from Stage I-IV ovarian cancer patients but with none of the 25 control sera. Forty-four out of 149 clones interacted specifically with these Stage I-IV sera. The remaining 105 clones interacted with sera from women who had benign tumors, endometrial cancers or other gynecological diseases and may represent biomarkers of gynecological disease. These clones were excluded because these conditions are a common source of false positive results in CA-125 clinical testing. A matrix summarizing the binding of the 44 Stage I-IV selected antigen clones to sera from patients and controls is shown in Table 5A. The derivation of this matrix was based on an agreement between two observers who analyzed the data independently, with 87% concordance.
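
The two-step exclusion described above (first clones that bind any healthy control serum, then clones that bind sera from women with benign or other gynecological disease) amounts to simple set filtering. The following is a minimal sketch under stated assumptions: the data layout (a mapping from clone name to the set of serum identifiers scored positive), the function name, and the clone and serum identifiers are all hypothetical.

```python
def select_specific_clones(binding, cancer_sera, control_sera, other_gyn_sera):
    """Return (clones negative on all controls, clones also negative on other gyn sera).

    binding maps a clone name to the set of serum IDs scored positive for it.
    """
    # Step 1: keep clones that bind at least one cancer serum and no healthy control.
    ovca_vs_healthy = {
        clone
        for clone, sera in binding.items()
        if sera & cancer_sera and not sera & control_sera
    }
    # Step 2: of those, keep clones that bind no benign/other gynecological serum.
    specific = {c for c in ovca_vs_healthy if not binding[c] & other_gyn_sera}
    return ovca_vs_healthy, specific


# Hypothetical example data (not from the study).
binding = {
    "cloneA": {"P1", "P2"},  # cancer sera only
    "cloneB": {"P1", "H1"},  # also binds a healthy control: eliminated
    "cloneC": {"P3", "B1"},  # also binds a benign-disease serum: excluded in step 2
    "cloneD": set(),         # binds nothing
}
step1, step2 = select_specific_clones(
    binding,
    cancer_sera={"P1", "P2", "P3"},
    control_sera={"H1"},
    other_gyn_sera={"B1"},
)
print(sorted(step1), sorted(step2))
```

In the study as described, the first step corresponds to the 149 clones and the second to the 44 Stage I-IV clones; the sketch only shows the logic, not those data.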

Only 2/44 selected clones, 2G4 and 3B12, bound with MEC1 serum IgGs despite the fact that the T7 cDNA library was biopanned with MEC1 serum as the bait. A large number of clones interacting with the MEC1 serum were eliminated because they bound with sera from either healthy controls or patients having benign or other gynecological diseases. The best markers are those interacting with the most patients; these include such clones as 2H9 (13/32), 2G2 (13/32), 2B4 (12/32), and 2G4 (12/32) that had the highest frequency of IgG binding with sera from ovarian cancer patients. Three antigens, 2F7/2B4, 5C3/2G4 and 2E1/4A3, were found in multiple clones, resulting in a panel of 41 markers binding with IgGs in Stage I-IV ovarian cancer sera (Table 5A).

Although the 41 antigens together interacted with sera from all 32 patients, the number of clones in the set needed to detect all 32 ovarian cancer patients was then reduced. The serum set from 32 patients was randomly divided into two groups. The first group (Group 1) consisted of 16 patients and 25 healthy women; and the second group (Group 2) consisted of the other 16 patients and 12 different healthy women. Group 1 was used to select the minimum number of clones necessary to detect all patients. The strategy of clone selection involved ranking of clones in order of decreasing binding with sera from ovarian cancer patients (Table 7A). Next, a combination of clones was selected for binding with IgGs in sera from all of the ovarian cancer patients in the set. Twenty-six clones of Group 1 detected all of the ovarian cancer patients (16/16) (Table 7A); all but one patient's serum bound with more than one of the selected clones. These 26 clones were then tested on sera from Group 2 (16 patients and 12 healthy controls) for antibody binding (Table 7B). Sera from all of the patients in Group 2 (16/16) bound with at least one of these clones and none of the sera from the healthy women (0/12) bound to these clones.
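
One plausible reading of the clone selection strategy above is a ranked, greedy cover: clones are sorted by the number of Group 1 patients they detect, and a clone is added to the panel whenever it detects a patient not yet covered. The sketch below is an assumption made for illustration, not the procedure actually used to arrive at the 26 clones; all names and data are hypothetical.

```python
def select_panel(binding, patients):
    """Pick clones, in order of decreasing patient binding, until all patients are covered."""
    ranked = sorted(binding, key=lambda c: (-len(binding[c] & patients), c))
    covered, panel = set(), []
    for clone in ranked:
        newly_detected = (binding[clone] & patients) - covered
        if newly_detected:
            panel.append(clone)
            covered |= newly_detected
        if covered == patients:
            break
    return panel, covered


def detected(panel, binding, sera):
    """Sera bound by at least one clone of the panel (used on the held-out group)."""
    return {s for s in sera if any(s in binding[c] for c in panel)}


# Hypothetical Group 1 (selection) and Group 2 (validation) data.
binding = {
    "c1": {"p1", "p2", "p3", "q1"},
    "c2": {"p1", "p2"},
    "c3": {"p4", "q2"},
    "c4": {"p3"},
}
panel, covered = select_panel(binding, patients={"p1", "p2", "p3", "p4"})
print(panel)
print(sorted(detected(panel, binding, {"q1", "q2", "h1"})))
```

A strictly greedy cover of this kind tends to return a small panel; a larger panel, in which most sera are bound by more than one clone as reported in the text, would require a redundancy criterion that this sketch does not model.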

A second group of 21 clones was found to interact with sera from 18/22 late stage patients but not with sera from early stage patients, with sera from 25 healthy women or with sera from 14 patients with either benign tumors, endometrial cancers or other gynecological diseases (Table 5B). Although 4 late stage patients were not detected by these 21 clones (Table 5B), they were detected by the 44 Stage I-IV clones (Table 5A). Among these 21 clones, antigen 2B3 interacted with the greatest number of patients' sera (10/22), clone 5A2 with 8/22, and clones 2D7 and 2E7 with 5/22 sera. Although these clones did not detect women with early stage ovarian cancer, further analysis may show them to be useful as markers of recurrence.

Stability of Serum Specimen.

An important feature of a test for widespread clinical use is the stability of the analyte in the test sample. To identify any inaccuracy in detecting IgG molecules in this multianalyte assay due to serum sample preparation problems or serum storage, a test of the durability of the serum samples was carried out. Serum samples were subjected to repeated freeze-thaw cycles (10 times) or heated to 65° C. for 10 minutes, or the unprocessed blood was left at 37° C. for 72 hours. Only heat treatment of the serum affected the positive signals on the macroarrays, because heat treatment is sufficient to denature immunoglobulins (IgG). Therefore, the complex set of IgG molecules in serum samples is very stable and provides a reliable analyte for clinical studies of diagnostic arrays of cloned antigens.

ELISA Macroarray Analysis.

The set of 44 (Stage I-IV) phage display cDNA clones listed in Table 5A was printed robotically on nitrocellulose membrane and an enzyme-linked immunosorbent assay (ELISA)-like experiment was performed. For clones 4A11, 2H9, 2G4 and 2F7, the binding of antigens decreased with increasing dilution of serum (FIG. 13A-D). Although clones bound nonspecifically with control sera at high serum concentrations, their binding decreased to zero as the sera were diluted, whereas the interaction of the same clones with IgGs in patients' sera persisted even at 1:10000 serum dilution. This demonstrated that the interaction of antigen clones with patients' sera was indicative of a typical, titerable antigen-antibody interaction.

Phage-coded Antigen Sequence Analysis

To identify the selected gene products, phage DNAs were amplified by PCR and the cDNA products sequenced. The DNA sequences were checked for homology to the GenBank databases using BLAST. The predicted amino acids in-frame with the T7 gene 10 capsid protein were determined. Eleven sequences were homologous to known gene products while other clones had no homology to any annotated sequences in the public databases (Table 6A). Among the gene products, 11 represented known gene products in the correct orientation and in the correct reading frame with the T7 gene 10 capsid protein, indicating that the serum IgG binding region was localized to a portion of the natural open reading frame of the protein. Of the remaining 33 clones, 13 clones contained an open reading frame with the T7 10B gene with a frameshift within the natural reading frame of the gene; 7 clones contained portions of either 5′ or 3′ untranslated regions of known genes; and 13 clones contained segments of genomic sequences. This in turn resulted in the formation of recombinant fusion proteins in which the predicted amino acid sequence of the in-frame fusion with the T7 10B protein was not similar to the original protein coded by the gene. The size of the additional peptide sequences ranged from 5 to 48 amino acids. This result indicated that the recombinant gene products of these clones must be coding for proteins that mimic some other natural antigens, and hence can be termed mimotopes (Table 6A). A BLASTp search of the SWISSPROT database for homology to each in-frame mimotope confirmed this observation. For example, clone 2H5 contained a nucleotide sequence homologous to the ATP synthase, H+ transporter. Using BLASTp, a sequence homology of 8/10 amino acids with the leukocyte common antigen precursor was observed. Each mimotope had significant homology to a natural open reading frame (Table 6A).

Discussion

The early detection of cancer is a significant challenge in clinical oncology. Once accurate methods become available, early detection can result in a significant reduction in morbidity and mortality of these diseases. The detection of ovarian cancer at Stage I could result in a cure rate of 90%. To this end, an approach has been devised that combines high-throughput selection of antigen biomarkers using phage display libraries with marker selection using a highly parallel analysis on macroarrays. The process began with a representative sample of 480 cloned markers from biopanning an ovarian cancer T7 phage display cDNA library with one patient's serum. It was first demonstrated that these clones bound to IgG molecules found in the sera of patients other than the one used for antigen selection. One hundred and forty-nine markers that bound to IgGs in sera from OVCA patients showed no interaction with sera from cancer-free women. Forty-one of these antigen biomarkers had positive interactions with early (including cancers with borderline histology) and late stage ovarian cancer patients and there were no false positive interactions with IgGs in sera from either women having benign gynecological syndromes such as ovarian cysts and endometrial fibroids or sera from women with endometrial cancer. Because Stage I and Stage I borderline tumors can elicit a detectable immune response in this assay, this technology is sensitive to very small tumor burdens (Table 5A). Sera from women with other cancers can be used to distinguish markers that are specific to ovarian cancer from those that bind to antibodies in sera from individuals with other cancers. Based on this representative sample of 480 clones from a single selection experiment, the discovery of these markers was scaled up to larger numbers of epitope clones, cloning from additional libraries using sera from these and other women with ovarian cancer.
Although the epitope markers were cloned using serum from a patient having the most common histologic type of ovarian cancer, serous adenocarcinoma, it has been shown that these markers are capable of detecting other histologic types of ovarian cancer as well, including endometrioid and clear cell tumors (Table 5A, Table 4). When the top-ranking 26 clones (Table 7A) were applied to the dataset composed of 16 patients and 12 healthy women, these clones bound to IgGs in the sera from 16 out of 16 patients (Table 7B). As none of these 26 clones showed binding to IgGs in sera from 25 healthy women in Group 1 or 12 healthy women in Group 2, it is likely they represent a promising discriminator between the healthy and cancer sera. Larger studies with additional antigen biomarkers in other populations can be used to verify that the rate of diagnostic misclassification with this approach is small enough to justify its use in a clinical setting as a screening test for ovarian cancer.

Knowledge regarding the immunogenicity and expression pattern of serologically-defined tumor antigens is critical in assessing the therapeutic and diagnostic potential of those antigens. The present study demonstrates that the use of T7 phage display selected clones is an effective technique for molecular profiling of the humoral immune response in ovarian cancer. Within this initial panel of 41 biomarkers, 8/9 contained large portions of open reading frames of the parental proteins; 1F6 is the receptor-binding cancer antigen expressed on SiSo cells (Human uterine adenocarcinoma cell line) (RCAS1); 3A9 is the signal recognition protein (SRP-19); 5C11 is the AHNAK-related sequence; 2B4 is the nuclear autoantigenic sperm protein (NASP); 3C11 is the Ribosomal protein L4 (RPL4); 4H3 is the Nijmegen breakage syndrome 1 (nibrin) (NBS1); 2G4 is the eukaryotic initiation factor 5A (eIF-5A); and 5F8 is the Homo sapiens KIAA0419 gene product. With the exception of clone 4A11, which is the Homo sapiens chromodomain helicase DNA binding protein 1, CHD1, all of the aforementioned gene products have a known or suspected etiological association with cancer. One of these markers, RCAS1, is overexpressed in many cancers such as uterine, breast and pancreatic cancer. As indicated by the broad overexpression of RCAS1 in human cancers, some of the antigens identified may not be specific to ovarian cancer. However, this does demonstrate that the epitomics profiling of the humoral immune response in cancer patients can identify serum antibody markers that are relevant to the etiology of their cancer (e.g. overexpressed or mutated) with diagnostic and therapeutic value. Interestingly, these 9 antigens with parental open reading frames are predicted to be intracellular products.
This finding is in agreement with reports using the SEREX procedure, whereby the majority of those antigens are also intracellular, and their probable release by necrosis or cell lysis at the tumor site is an initiating factor in eliciting an immune response.

The remaining 32 clones are mimotopes, defined as peptides that are capable of binding to the paratope of an antibody but are unrelated in sequence to the natural protein that the antibody actually recognizes. Such peptides are usually identified by testing combinatorial peptide libraries obtained by chemical synthesis or phage display for their ability to bind monoclonal antibodies specific for discontinuous epitopes. This is analogous to previous studies that have selected randomized peptide libraries on serum from Hepatitis B patients. Peptide mimotopes can potentially be used as a novel form of immunotherapy to induce a beneficial antitumor response. A mimotope derived from a phage display library can induce specific inhibition of the binding between tumor-inhibitory antibody and the Erb-2 receptor. Such mimotopes may represent a superior form of immunotherapy that may not elicit side effects due to autoimmunity to a natural protein.

In conclusion, using a combination of high throughput selection and array-based serological profiling, together called Epitomics®, a panel of 41 antigens was isolated, including 8 antigens previously associated with cancer. Further work with larger panels of antigens analyzed on macroarrays or microarrays can provide a comprehensive set of markers that can be evaluated using sera from other cancers for the specificity of an ovarian cancer test. This epitomics approach to antigenic profiling has applications to cancer, autoimmune diseases, and infectious diseases for diagnostic, therapeutic, and epidemiologic studies.

Example 3

The 480 clones described in Example 2 were screened against new independent samples of ovarian cancer patient and control sera, using the methods of Example 2. This procedure revealed 166 new clones of interest that discriminated cancer from non-cancer with 93% accuracy. Upon DNA sequencing it was found that there were 77 additional new antigens cloned. These antigens, listed in Table 6A, are epitopes including SEQ ID NOs: 90, 106, 135, 136, 145, and 150, and mimotopes including SEQ ID NOs: 76-89, 91-105, 107-144, 146-149, 151, and 152.

Example 4

Biopanning to Isolate Additional Antigens Using 4 Libraries and 8 Different OVCA Sera:

Three additional T7 Phage Display OVCA cDNA libraries were prepared according to methods described in Examples 1 and 2. These three libraries, plus the library of Example 2, were biopanned against eight different patient sera. The properties of the sera are as follows:

Patients' Sera | Stage | Histology | Number of Clones Chosen for Set of 2800 Antigens
OVC063 | III | Malignant Serous | 384
OVC065 | IC | Malignant Serous | 384
OVC087 | IA | Malignant Endometrioid Clear Cell | 384
OVC0156 | IA | Malignant Serous | 384
OVC023 | IIIC | Malignant Serous | 96
Mec1 | III | Malignant Serous | 384+ (480 from Example 2)
OVC0155 | I | Malignant Mucinous | 384
OVC0111 | IV | Malignant Mucinous | 384

Positive clones from biopanning cycle 4 were selected on the basis of having strong reactions with sera from 30 patients and no reaction with more than 30 healthy controls. The best candidate markers were chosen on the basis of exhibiting a strong IgG binding signal in the self-binding chip and at least two other patients' sera.

Clone Subselection from 2800 to 1010 Antigens Using 30 Patients and 30 Controls:

The number of clones was reduced such that they could be spotted on a single microarray for the large validation sets. The methods used include:

1) Bootstrapping method combined with an ROC analysis.
2) A parametric test (moderated t-test).
3) A non-parametric test (U-test: analysis on ranks; less sensitive to outliers).

The union of the top 600 clones from each of the 3 methods above yielded 776 clones, indicating that among the 2800 antigens many were found to be good markers by all methods. From these 776 markers, 432 were highly ranked consistently by all 3 methods. A number of negative controls were also chosen.
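
The combination of the three rankings can be pictured as taking the top-k clones under each method and then forming their union and intersection. The sketch below is illustrative only and uses hypothetical names and scores; it also shows how a rank-based U statistic relates to the area under the ROC curve (AUC = U / (n1 × n2)), which is one simple way a per-clone score could be obtained. It does not reproduce the bootstrapping or the moderated t-test.

```python
def mann_whitney_auc(patient_signals, control_signals):
    """AUC computed from the Mann-Whitney U statistic; ties count as one half."""
    u = sum(
        (p > c) + 0.5 * (p == c)
        for p in patient_signals
        for c in control_signals
    )
    return u / (len(patient_signals) * len(control_signals))


def top_k(scores, k):
    """Names of the k highest-scoring clones (ties broken by name)."""
    return set(sorted(scores, key=lambda clone: (-scores[clone], clone))[:k])


def combine_rankings(score_tables, k):
    """Union and intersection of the top-k clones across several scoring methods."""
    tops = [top_k(scores, k) for scores in score_tables]
    return set.union(*tops), set.intersection(*tops)


# Hypothetical per-clone scores from three methods (higher means better marker).
method_a = {"c1": 0.90, "c2": 0.80, "c3": 0.10}
method_b = {"c1": 0.70, "c2": 0.20, "c3": 0.60}
method_c = {"c1": 0.95, "c2": 0.50, "c3": 0.40}
union, consensus = combine_rankings([method_a, method_b, method_c], k=2)
print(sorted(union), sorted(consensus))
print(mann_whitney_auc([3, 4, 5], [1, 2, 3]))
```

In the terms of the text, the union corresponds to the 776 clones obtained from the three top-600 lists, and the intersection corresponds to clones ranked highly by all three methods.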

Validation Serum Sets:

A set of protein microarrays was used to validate the above selected markers, and also included were the 63 antigens from Example 2 and 81 antigens from Example 3. In a set of 1000 microarray experiments, 337 clones were obtained that were significantly different between healthy and OVCA by t-test at the level p<0.01 after correction for multiple experiments. Using this large series, an accuracy of 90% was obtained with neural networks, with 66% of the sera samples in the training set and 34% in the test set. From this process, 34 new antigen clones were identified as markers. These antigens, listed in Table 6A, are epitopes including SEQ ID NOs: 159, 170, and 182, and mimotopes including SEQ ID NOs: 153-158, 160-169, 171-181, and 183-186.
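
The text does not state which multiple-testing correction or which splitting procedure was used. Purely as an illustration, the sketch below assumes a Bonferroni correction (each p-value multiplied by the number of clones tested) and a random 66%/34% split of sera into training and test sets. The function names, the fixed seed, and the example p-values are hypothetical, and the neural network itself is not shown.

```python
import random


def bonferroni_significant(p_values, alpha=0.01):
    """Clones whose Bonferroni-adjusted p-value (p * number of tests) is below alpha.

    Assumption: Bonferroni is used here only as an example of a correction.
    """
    n_tests = len(p_values)
    return {clone for clone, p in p_values.items() if p * n_tests < alpha}


def split_sera(serum_ids, train_fraction=0.66, seed=0):
    """Shuffle serum IDs reproducibly and split them into training and test sets."""
    ids = sorted(serum_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]


# Hypothetical unadjusted t-test p-values for four clones.
p_values = {"c1": 1e-6, "c2": 0.004, "c3": 0.2, "c4": 0.0024}
print(sorted(bonferroni_significant(p_values)))

train, test = split_sera([f"serum{i}" for i in range(100)])
print(len(train), len(test))
```

Keeping the test sera entirely out of marker selection and model fitting is what allows the reported accuracy to be read as an estimate of performance on unseen samples.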

Example 5

Discovery of Candidate Autoantigen Biomarkers from Proteins Commonly Overexpressed in Ovarian Cancer Via Literature Mining

We have found that at least some of the novel OVCA-induced autoantigens are overexpressed, as determined by immunohistochemistry, in tumor versus normal and benign ovarian tissues (Ali-Fehmi et al, 2010). Therefore, a rational approach to augmenting the panel of biomarkers for the detection and staging of ovarian cancer is to identify potential additional biomarkers through a literature search for proteins overexpressed in ovarian cancer tissue, as determined by immunohistochemistry.

A search was conducted as follows. The search was initiated with a list of potential genes involved in any cancer. The list was generated by plasma proteome (www.plasmaproteome.org/ppihome.htm) and contained a total of 1261 genes. We searched the literature using the search criteria “gene name and immunohistochemistry and OVCA”. The serous adenocarcinoma histotype was preferentially targeted for our list, though in the majority of the articles, immunohistochemical data were not stratified based on the histotypes of OVCA. Initially, potential markers were selected solely based on the information presented in the abstract.

After generating the first list, relevant articles were read and a second cut was made based on expression level and expression in benign/normal tissue. Measures were taken to avoid proteins that were expressed in either benign or normal tissue. Exceptions were made for proteins that were expressed in normal tissue but showed significantly higher expression in cancer tissue. Also, attempts were made to avoid proteins expressed in borderline tumors. Secreted proteins were avoided because proteins shed from cancer cells into circulation can serve as blocking agents against autoantibodies. Cytokines were eliminated since cytokine levels can also be elevated due to inflammatory conditions and could therefore confound our purpose of early detection. An added criterion was the commercial availability of the overexpressed proteins. Commercial availability facilitates testing of potential markers in high throughput antigen microarrays and other immunoassay technologies.

A total of 2522 abstracts and approximately 2000 articles were analyzed. The SEREX database was searched for evidence that the potential antigens elicit autoantibody reactions. The information obtained from the literature was then archived in a database. Two additional markers for OVCA were taken from the list of Pathwork (Monzon et al., 2009).

The result was a table of 30 markers for OVCA that can be tested as potential autoantigens (Table 8), using samples of protein or peptide in the same manner as display phage in Examples 1-4 above.

Throughout this application, various publications, including United States patents, are referenced by author and year and patents by number. Full citations for the publications are listed below. The disclosures of these publications and patents in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.

The invention has been described in an illustrative manner, and it is to be understood that the terminology that has been used is intended to be in the nature of words of description rather than of limitation.

Obviously, many modifications and variations of the present invention are possible in light of the above teachings. It is, therefore, to be understood that within the scope of the appended claims, the invention can be practiced otherwise than as specifically described.

TABLE 4

Tumor Histology and Stage of Patients' Sera Used for Screening of Ovarian Cancer

Blood Specimen ID # | Histology | Stage
MEC1* | serous adenocarcinoma | Unknown
MEC2 | serous adenocarcinoma | IIA
MEC16 | serous adenocarcinoma | IV
MEC20 | serous adenocarcinoma | Unknown
MEC23 | serous adenocarcinoma | IIIC
MEC35 | serous adenocarcinoma | IIIC
MEC37 | serous adenocarcinoma | IIIC
TB01-060 | serous adenocarcinoma | IIIC
TB01-108 | serous adenocarcinoma | IIIC
42501 | adenocarcinoma NOS | late
400162 | adenocarcinoma NOS | late
40036 | adenocarcinoma NOS | late
42780 | adenocarcinoma NOS | late
B755 | adenocarcinoma NOS | late
40015 | adenocarcinoma NOS | late
OVC075 | serous adenocarcinoma | IIC
OVC015 | serous adenocarcinoma | IIIC
OVC035 | serous adenocarcinoma | IIIC
OVC007 | mixed epithelial | IIIC
OVC005 | Malignant Mixed Mesodermal Tumor | IIIC
OVC063 | serous adenocarcinoma | III
OVC045 | serous adenocarcinoma | IIIC
NW0629 (4387) | endometrioid adenocarcinoma | IC
NW0453 (4679) | adenocarcinoma NOS | IC
NW0046 (4555) | borderline serous cystadenofibroma | IA
NW1181 (4283) | endometrioid adenocarcinoma | IA
OVC019 | mixed epithelial | IC
OVC087 | clear cell | IA
OVC078 | endometrioid | IC
OVC070 | borderline serous | IC
OVC049 | mixed epithelial | IA
OVC079 | borderline serous | I
33-38 | benign ovarian cyst | N/A
92-96 | uterine myoma | N/A
80-82 | endometrial adenocarcinoma | IIIA
79-62 | endometrial adenocarcinoma | IIIA
35-27 | benign ovarian cyst | N/A
30-141 | benign ovarian cyst | N/A
70-153 | endometrial adenocarcinoma | IB
81-80 | endometrial adenocarcinoma | IA
31-55 | benign ovarian cyst | N/A
39-55 | benign ovarian cyst | N/A
36-11 | endometrial polyp | N/A
32-43 | Benign, thickening of endometrium | N/A
OVC068-1B | papillary serous adenoma (benign) & endometriosis | N/A
OVC054 | benign serous cystadenoma | N/A

*Serum used for biopanning

TABLE 5A

Binding of 44 Clones with Late Stage and Stage I Ovarian Cancer Patient Sera

The binding of a panel of 44 clones with sera from 22 Late Stage and 10 Stage I ovarian cancer patients was determined. These 44 antigens listed below bound exclusively with serum IgGs derived from both late stage and stage I ovarian cancer patients (including borderline histology) but not with serum IgG from normal controls or patients with other gynecological diseases. The grey colored boxes represent positive binding of phage clones with patients' sera. TP: Total number of patients whose serum IgGs bound to each phage clone.

embedded image

for others 1:10000 serum dilution was used.

TABLE 5B

Binding of 21 Clones with Late Stage Ovarian Cancer Patient Sera

The binding of a panel of 21 clones with sera from 22 Late Stage patients was determined on macroarrays. These 21 antigens listed below bound exclusively with serum IgGs derived from late stage ovarian cancer patients but not with serum IgG from normal controls or patients with other gynecological diseases.

embedded image

all others were analyzed at a serum dilution of 1:10000;

TP: Total number of patients whose serum IgGs bound to each phage clone.

TABLE 6A

The mimotope sequences and the epitopes that are the real antigens against which the antibodies were produced, based on amino acid sequence homology (see Region of similarity of AA below).

Description of Stage I-IV clones. Size range of the mimotopes: ≥5 amino acids

Stage (I-IV) clones | Description of the genes that are in-frame with T7 10B gene | Peptide sequences of Epitopes, in-frame with T7 10B gene | Size of the peptide | Unigene # | Region of similarity of AA | Antigen expression in any type of cancer

1F6
gi|18490914|gb|BC022506.1|
AAWQAEEVLRQQKLADREKRAAE
49 AA
Hs.9222
165-213
Overexpressed in ovarian, nonsmall

Homo sapiens, estrogen receptor
QQRKKMEKEAQRLMKKEQNKIGV

cell lung carcinoma, pancreatic

binding site associated, antigen, 9
KLS

ductal carcinoma

RCAS
(SEQ ID NO: 11)

2B4
gi|22042983|ref|XM_032391.3|,
EKGGQEKQGEVIVSIEEKPKEVS
212 AA
Hs.446206
258-469
Expression levels are higher in

Homo sapiens similar to nuclear
EEQPVVTLEKQGTAVEVEAESLD

myelogenous leukemia and

autoantigenic sperm protein (histone-
PTVKPVDVGGDEPEEKVVTSENE

lymphoblastic leukemia cells.

binding)(NASP)
AGKAVLEQLVGQEVPPAEESPEV

TTEAAEASAVEAGSEVSEKPGQE

APVLPKDGAVNGPSVVGDQTPIE

PQTSIERLTETKDGSGLEEKVRA

KLVPSQEETKLSVEESEAAGDGV

DTKVAQGATEKSPEDKVQIAANE

ETQER

(SEQ ID NO: 12)

2F7
gi|22042983|ref|XM_032391.3|,
EKGGQEKQGEVIVSI
15 AA
Hs.446206
256-270
Expression levels are higher in

Homo sapiens similar to Nuclear
(SEQ ID NO: 13)

myelogenous leukemia and

autoantigenic sperm protein (NASP)

lymphoblastic leukemia cells.

2G4
gi|20987351|gb|BC030160.1|,
MADDLDFETGDAGASATFPMQCS
148 AA
Hs.310621
1-148
eIF-5A2 sharing 82% identity of

Homo sapiens, eukaryotic translation
ALRKNGFVVLKGRPCKIVEMSTS

amino acid sequence with eIF-5A,

initiation factor 5A
KTGKHGHAKVHLVGIDIFTGKKY

is a candidate oncogene related to

EDICPSTHNMDVPNIKRNDFQLI

development of ovarian cancer.

GIQDGYLSLLQDSGEVREDLRLP

EGDLGKEIEHKFDCGEQILITVL

SAMTEEAAVA

(SEQ ID NO: 14)

3A9
gi|4507212|ref|NM_003135.1|, Homo
QKTGGADQSLQQGEGSKKGKGKKKK
25 AA
Hs.2943
119-143
Transcript generated by alternative

sapiens signal recognition particle
(SEQ ID NO: 15)

splicing between exon 14 of the

19 kDa (SRP19)

Adenomatous polyposis coli gene and

SRP19 is observed and its expres-

sion is higher in Colorectal cancer

3C11
gi|16579884|ref|NM_000968.2|
ALQAKSDEKAAVAGKKPVVGKKGK
68 AA
Hs.186350
360-427
over-expression of L7a and L37

Homo sapiens ribosomal protein L4
KAAVGVKKQKKPLVGKKAAATKKP

mRNA is confirmed in prostate-

(RPL4)
SPEKKPAENKPTTEDNKPAA

cancer tissue samples.

(SEQ ID NO: 16)

4A11
gi|4557446|ref|NM_001270.1|, Homo
QQQQQQQHQASSNSGSEEDSSSSE
86 AA
Hs.311553
107-192
Not associated with cancer

sapiens chromodomain helicase DNA
DSDDSSSEVKRKKHKDEDWQMSGS

binding protein 1 (CHD1)
GSPSQSGSDSESEEEREKSSCDET

ESDYEPKNKVKSRK

(SEQ ID NO: 17)

4H3
gi|20543465|ref|XM_045343.5|,
PTKLPSINKSKDRASQQQQTNSIR
92 AA
Hs.25812
433-524
Three different mutations in NBS1

Homo sapiens Nijmegen breakage
NYFQPSTKKRERDEENQEMSSCKS

gene, generating truncated or

syndrome 1 (nibrin) (NBS1)
ARIETSCSLLEQTQPATPSLWKNK

aberrant NBS1 transcripts were

EQHLSENEPVDTNSDPNLFT

observed in different cancer cell

(SEQ ID NO: 18)

lines.

5C3
gi|20987351|gb|BC030160.1|,
MADDLDFETGDA
118 AA
Hs.310621
1-118
eIF-5A2 sharing 82% identity of

Homo sapiens, eukaryotic translation
GASATFPMQCSALRKNGFVVLKGR

amino acid sequence with eIF-5A,

initiation factor 5A
PCKIVEMSTSKTGKHGHAKVHLVG

is a candidate oncogene related to

IDIFTGKKYEDICPSTHNMDVPNI

development of ovarian cancer.

KRNDFQLIGIQDGYLSLLQDSGEV

REDLPLPEGD

(SEQ ID NO: 19)

5C11
gi|535176|emb|X74818.1|HSAHNAKRS,
PKFKMPDVHFKSPQISMSDIDLNL
121 AA
Hs.378738
393-512
Expression level of AHNAK is higher

H. sapiens mRNA of AHNAK-
KGPKIKGDMDISVPKLEGDLKGPK

in melanoma, promyelocytic leukemia

related sequence
VDVKGPKVGIDTPDIDIHGPEGKL

HL-60, osteosarcoma.

KGPKFKMPDLHLKAPKISMPEVDL

NLKGPKVKGDMDISLPKVEGDLKGP

(SEQ ID NO: 20)

5F8
gi|7662105|ref|NM_014711.1|,
GVCSSKVYVGKNTSEVKEDVVLGK
150 AA
Hs.279912
434-583
mRNA expression level of another

Homo sapiens KIAA0419 gene
SNQVCQSSGNHLENKVTHGLVTVE

antigen KIAA1416 is up-regulated in

product
GQLTSDERGAHIMNSTCAAMPKLH

colon cancer.

EPYASSQCIASPNFGTVSGLKPAS

MLEKNCSLQTELNKSYDVKNPSPL

LMQNQNXRQQMDTPMVSCGNEQFL

DNSFEK

(SEQ ID NO: 21)

1E12T
NM_006597.3, Homo sapiens heat
LESYAFNMKATVEDEKLQGKINDE
105 AA

shock 70 kDa protein 8 (HSPA8),
DKQKILDKCNEIINWLDKNQTAEK

transcript variant 1, mRNA
EEFEHQQKELEKVCNPIITKLYQS

AGGMPGGMPGGFPGGGAPPSGGAS

SGPTIEEVD

(SEQ ID NO: 90)

2A7
NM_003472.3, Homo sapiens DEK
EKKNKEESSDDEDKESEEEPPKKT
99 AA

oncogene (DNA binding) (DEK), mRNA
AKREKPKQKATSKSKKSVKSANVK

KADSSTTKKNQNSSKKESESEDSS

DDEPLIKKLKKPPTDEELKETIKK

LLA

(SEQ ID NO: 106)

3H3
NM_002967.2, Homo sapiens scaffold
DLRAELRKRNVDSSGNKSVLMERL
194 AA

attachment factor B (SAFB), mRNA
KKAIEDEGGNPDEIEITSEGNKKT

SKRSSKGRKPEEEGVEDNGLEENS

GDGQEDVETSLENLQDIDIMDISV

LDEAEIDNGSVADCVEDDDADNLQ

ESLSDSRELVEGEMKELPEQLQEH

AIEDKETINNLDTSSSDFTILQEIE

EPSLEPENEKILDILGESLRPHSSN

(SEQ ID NO: 135)

4A8
NM_003609.2, Homo sapiens HIRA
GIISSDGESN
10 AA

interacting protein 3 (HIRIP3), mRNA
(SEQ ID NO: 136)

4F2_1
NM_000122.1, Homo sapiens excision repair
LQDPVIRECRLRNSEGEATELITET
75 AA

cross-complementing rodent repair
FTSKSAISKTAESSGGPSTSRVTDP

deficiency, complementation group 3
QGKSDIPMDLFDFYEQMDKLAAALE

(xeroderma pigmentosum group B complementing)
(SEQ ID NO: 145)

(ERCC3), mRNA. Protein ID: NP_000113.1

5D4
sp|Q96JP5.1|ZFP91_HUMAN, Zinc
CGFTCRQKASLNWHMKKHDADSFY
74 AA

finger protein 91 homolog; Short = Zfp-91
QFSCNICGKKFEKKDSVVAHKAKSH

Length = 570
PEVLIAEALAANAGAQACGRTRVTS

(SEQ ID NO: 150)

65A6
NM_030920.2, Homo sapiens acidic
EEVGLSYLMKEEIQDEEDDDDYVE
55 AA

(leucine-rich) nuclear phosphoprotein 32
EGEEEEEEEEGGLRGEKRKRDAED

family, member E (ANP32E), mRNA
DGEEEDD

(SEQ ID NO: 159)

2H3
NM_006136.2, Homo sapiens capping
DWNKILSYKIGKEMQNA
17 AA

protein (actin filament) muscle Z-line,
(SEQ ID NO: 170)

alpha 2 (CAPZA2), mRNA

2C10
NM_001042483.1, Homo sapiens nuclear protein
ERKKRGARR
9 AA

1 (NUPR1), transcript variant 1, mRNA
(SEQ ID NO: 182)

The above sub-table shows antigens, not mimotopes; the sub-table below shows the mimotopes.
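As a consistency check on the sub-table above, the stated peptide size can be compared with the length of the joined sequence fragments and with the span given under "Region of similarity of AA". The following is a minimal Python sketch using clone 1F6 as printed above; the helper names (`join_fragments`, `stated_size`, `region_length`) are hypothetical and are not part of the described method.

```python
def join_fragments(fragments):
    """Concatenate wrapped sequence fragments, dropping any whitespace."""
    return "".join("".join(fragment.split()) for fragment in fragments)


def stated_size(label):
    """Parse a size label such as '49 AA' into an integer residue count."""
    return int(label.split()[0])


def region_length(region):
    """Return the inclusive length of a region written as 'start-end'."""
    start, end = (int(part) for part in region.split("-"))
    return end - start + 1


# Clone 1F6 as printed in the sub-table above (three wrapped fragments).
fragments_1f6 = [
    "AAWQAEEVLRQQKLADREKRAAE",
    "QQRKKMEKEAQRLMKKEQNKIGV",
    "KLS",
]
peptide_1f6 = join_fragments(fragments_1f6)

# The three values should agree if the row was transcribed consistently.
consistent_1f6 = (
    len(peptide_1f6) == stated_size("49 AA") == region_length("165-213")
)
```

For clone 1F6 the three fragments join to 49 residues, which agrees with the stated size (49 AA) and with the inclusive span of region 165-213.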

Stage (I-IV) clones | Description of the genes that are in Mimotope clones | Peptide sequences of Mimotopes, in-frame with T7 10B gene | Size of the peptide | Description of the sequences that Mimotopes mimic | Unigene # | Region of similarity of AA | Antigen expression in any type of cancer

2H9
gi|21619682|gb|
ELLRT
5
gi|20139301|sp|Q9Y446|
Hs.148074
407-411
Immunohistochemical

BC032762.1|, Homo
(SEQ ID NO: 22)
AA
PKP3_HUMAN,

Score = 18.9 bits (37),
localization of

sapiens optineurin,

Plakophilin 3

Expect = 827
plakophilins (PKP1,

mRNA

Identities = 5/5 (100%),
PKP2, PKP3, and

Positives = 5/5 (100%)
p0071) in primary

Query^b:
1
ELLRT
5
oropharyngeal

ELLRT

tumors

Sbjct^c:
407
ELLRT
411

3B12
gi|21735624|ref|
GQTSM
5
gi|729143|sp|P38936|CDN1_HUMAN,
Hs.370771
144-147
mda-6 (p21) may

NM_145690.1|, Homo
(SEQ ID NO: 23)
AA
Cyclin-dependent kinase

Score = 16.8 bits (32),
function as a

sapiens tyrosine 3-

inhibitor 1 (p21) (CDK-

Expect = 3595
negative regulator

monooxygenase/trypto-

interacting protein1)

Identities = 4/4 (100%),
of melanoma growth,

phan 5-monooxygenase

(Melanoma differentiation

Positives = 4/4 (100%)
progression and

activation protein,

associated protein 6)

Query:
2
QTSM
5
metastasis

zeta polypeptide

(MDA-6)

QTSM

(YWHAZ), transcript

Sbjct:
144
QTSM
147

variant 2, mRNA.

5D8
gi|22024583|gb|
KKGPI
5
gi|20177863|sp|Q9BXJ2|
Hs.153714
102-106
TNF-alpha regulates

AC087376.5|, Homo
(SEQ ID NO: 24)
AA
CQT7_HUMAN, Complement-c1q

Score = 18.5 bits (36),
expression of down-

sapiens chromosome

and tumor necrosis factor-

Expect = 1109
stream components

11, clone RP11-

related protein 7 precursor

Identities = 5/5 (100%),
of complement system

230O19, complete

Positives = 5/5 (100%)
and plays a role in

sequence

Query:
1
KKGPI
5
energy homeostasis

KKGPI

where it is impli-

Sbjct:
102
KKGPI
106
cated in cachexia,

obesity and insulin

resistance.

4A4
gi|17028354|gb|
AKVIMR
6
gi|5921908|sp|O43174|
Hs.150595
138-142
all-trans-Retinoic

BC017483.1|BC017483,
(SEQ ID NO: 25)
AA
CP26_HUMAN|, Cytochrome P450 26

Score = 20.6 bits (41),
acid-induced ex-

Homo sapiens, clone

(Retinoic acid-metabolizing

Expect = 255
pression and regu-

IMAGE: 3506553, mRNA.

cytochrome) (P450RAI)

Identities = 5/5 (100%),
lation of retinoic

(hP450RAI) (Retinoic

Positives = 5/5 (100%)
acid 4-hydroxylase

acid 4-hydroxylase)

Query:
2
KVIMR
6
(CYP26) in human

KVIMR

promyelocytic

Sbjct:
138
KVIMR
142
leukemia

5A3
gi|15011541|gb|
YACLKD
6
gi|1170473|sp|P42575|ICE2_HUMAN,
Hs.433103
351-355
CASP-3, CASP-4,

AF397158.1|AF397158,
(SEQ ID NO: 26)
AA
Caspase-2 precursor

Score = 20.2 bits (40),
CASP-2 heterogene-

Homo sapiens clone 11

(CASP-2) (ICH-1 protease)

Expect = 342
ously coexpress in

pur alpha-associated

Identities = 5/5 (100%),
leukemic cell lines

ribosomal RNA gene,

Positives = 5/5 (100%)

partial sequence

Query:
1
YACLK
5

YACLK

Sbjct:
351
YACLK
355

2A3
gi|23271193|gb|
QILFMDP
7
gi|729597|sp|P39086|GLK1_HUMAN,
Hs.222405
242-246
Ionotropic and me-

BC036014.1|, Homo
(SEQ ID NO: 27)
AA
Glutamate receptor, ionotropic

Score = 21.4 bits (43),
tabotropic gluta-

sapiens poly(A)

kainate 1 precursor

Expect = 142
mate receptor pro-

polymerase alpha, mRNA

Identities = 5/5 (100%),
tein expression in

Positives = 5/5 (100%)
glioneuronal

Query:
1
QILFM
5
tumours from

QILFM

patients with in-

Sbjct:
242
QILFM
246
tractable epilepsy

4C10
gi|24756892|gb|
LNTVNTLI
8
gi|13633936|sp|Q9NPR2|
Hs.416077
440-445
Not associated with

AC008507.10|, Homo
(SEQ ID NO: 28)
AA
SM4B_HUMAN, Semaphorin 4B

Score = 21.8 bits (44),
cancer

sapiens chromosome 19

Expect = 106

clone CTC-448F2,

Identities = 6/6 (100%),

complete sequence

Positives = 6/6 (100%)

Query:
1
NTVNTL
6

NTVNTL

Sbjct:
440
NTVNTL
445

4D9
gi|21629397|gb|
GNSILLIA
8
gi|2842764|sp|Q99735|
Hs.81874
3-10
GST-pi has signifi-

AC099571.2|, Homo
(SEQ ID NO: 29)
AA
GST2_HUMAN, Microsomal

Score = 21.4 bits (43),
cance in the diag-

sapiens chromosome 1

glutathione S-transferase 2

Expect = 140
nosis of cancers as

clone RP11-438H8,

(Microsomal GST-2)

Identities = 7/8 (87%),
it is expressed

complete sequence

Positives = 7/8 (87%)
abundantly in tumor

Query:
1
GNSILLIA
8
cells.

GNSILL A

Sbjct:
3
GNSILLAA
10

2E11
gi|22004067|dbj|
WDLKSEYS
8
gi|1710146|sp|P49798|
Hs.386726
80-85
RGS4 is highly

AP005356.2|, Homo
(SEQ ID NO: 30)
AA
RGS4_HUMAN, Regulator of

Score = 21.8 bits (44),
expressed

sapiens genomic DNA,

G-protein signaling 4 (RGS4)

Expect = 106
in brain regions

chromosome 8q23,

(RGP4)

Identities = 6/6 (100%),
implicated in

clone: KB1198A4,

Positives = 6/6 (100%)
pathophysiology

complete sequence.

Query:
3
LKSEYS
8
of schizophrenia

LKSEYS

Sbjct:
80
LKSEYS
85

5G9
gi|20072204|gb|
PGCSTTLS
8
gi|14423962|sp|O94966|
Hs.255596
940-947
Ubiquitin carboxyl-

BC026241.1|, Homo
(SEQ ID NO: 31)
AA
UBPJ_HUMAN, Ubiquitin carboxyl-

Score = 18.9 bits (37),
terminal-hydrolase

sapiens ubiquitin-

terminal hydrolase 19

Expect = 827
L1 genes cause

protein isopeptide

Identities = 6/8 (75%),
autosomal

ligase (E3), mRNA

Positives = 7/8 (87%)
dominant familial

Query:
1
PGCSTTLS
8
Parkinson disease.

PGC+T LS

Sbjct:
940
PGCTTLLS
947

4H4
gi|20072204|gb|
PRCSTTLS
8
gi|6225843|sp|O60760|
Hs.128433
156-160
Lipocalin-type

BC026241.1|, Homo
(SEQ ID NO: 32)
AA
PGD2_HUMAN, Glutathione-

Score = 18.9 bits (37),
prostaglandin D syn-

sapiens ubiquitin-

requiring prostaglandin D

Expect = 827
thase (L-PGDS) has

protein isopeptide

synthase

Identities = 5/5 (100%),
recently been shown

ligase (E3), mRNA.

Positives = 5/5 (100%)
to be expressed in

Query:
3
CSTTL
7
human brain tumors,

CSTTL

breast tumors and

Sbjct:
156
CSTTL
160
in ovarian cancer.

2C7
gi|3152628|gb|
GDRSQLWRK
9
gi|24211441|sp|Q13443|
Hs.2442
720-725
Expression of ADAM-

AC004744.1|
(SEQ ID NO: 33)
AA
AD09_HUMAN, ADAM 9 precursor

Score = 20.2 bits (40),
9 mRNA and protein

AC004744, Homo sapiens

(A disintegrin and

Expect = 342
in human breast

BAC clone GS1-465N13

metalloproteinase domain 9)

Identities = 5/6 (83%),
cancer

from 7, complete

Positives = 5/6 (83%)

sequence

Query:
3
RSQLWR
8

R QLWR

Sbjct:
720
RDQLWR
725

4A3,
gi|16160856|ref|
KKQSSWYQI
9
gi|2498310|sp|Q12882|
Hs.1602
497-502
Higher DPD activity

2E1
XM_007763.5|,
(SEQ ID NO: 35)
AA
DPYD_HUMAN Dihydropyrimidine

Score = 21.8 bits (44),
in gastric cancer

Homo sapiens

dehydrogenase [NADP+] precursor

Expect = 106
is observed than in

myosin VA

(DPD)

Identities = 5/6 (83%),
colorectal cancer

(heavy polypeptide 12,

(DHPDHase)(Dihydrouracil

Positives = 6/6 (100%)

myoxin) (MYO5A), mRNA

dehydrogenase)

Query:
2
KQSSWY
7

(Dihydrothymine dehydrogenase)

KQ+SWY

Sbjct:
497
KQASWY
502

4G8
gi|15778776|gb|
PEGGTDASR
9
gi|13634077|sp|Q9Y493|
Hs.307004
1912-1919
zonadhesin func-

AC012363.6|,
(SEQ ID NO: 36)
AA
ZAN_HUMAN, Zonadhesin

Score = 18.9 bits (37),
tions during ferti-

Homo sapiens BAC

Expect = 827
lization to anchor

clone RP11-438O12

Identities = 6/8 (75%),
the acrosomal

from 2, complete

Positives = 7/8 (87%)
shroud to the zona

sequence

Query:
2
EGGTDASR
9
pellucida

EGGT+A R

Sbjct:
1912
EGGTEAFR
1919

2E10
gi|20521965|dbj|
ASFTLKLQS
9
gi|6226869|sp|P34932|
Hs.90093
647-653
Expression of HSP70

AB051476.2|,
(SEQ ID NO: 37)
AA
HS74_HUMAN, HEAT SHOCK 70 KDA

Score = 21.8 bits (44),
is observed in

Homo sapiens mRNA for

PROTEIN 4 (HEAT SHOCK

Expect = 106
human hepato-

KIAA1689 protein,

70-RELATED PROTEIN APG-2)

Identities = 6/7 (85%),
cellular carcinoma

partial cds

Positives = 7/7 (100%)

Query:
2
SFTLKLQ
8

SFTLKL+

Sbjct:
647
SFTLKLE
653

2D1
gi|4504522|ref|
GGGSNGRTSV
10
gi|20137621|sp|O95071|
Hs.94262
140-148
EDD, the human

NM_002157.1|, Homo
(SEQ ID NO: 38)
AA
EDD_HUMAN,

Score = 21.8 bits (44),
orthologue of the

sapiens heat shock

Ubiquitin--protein ligase

Expect = 105
hyperplastic discs

10 kDa protein 1

EDD (Hyperplastic discs

Identities = 7/9 (77%),
tumour suppressor

(chaperonin 10)

protein homolog)(hHYD)

Positives = 9/9 (100%)
gene, is amplified

(HSPE1), mRNA

(Progestin induced

Query:
1
GGGSNGRTS
9
and overexpressed

protein)

GGGS+GR+S

in cancer

Sbjct:
140
GGGSSGRSS
148

5H6
gi|40849829|gb|
NSFLMTSSKPR
11
gi|12643618|sp|O60242|
Hs.334087
694-699
BAI1 expression in-

AAR95625|
(SEQ ID NO: 39)
AA
BAI3_HUMAN, Brain-

Score = 20.6 bits (41),
hibit stromal

NADH dehydrogenase

specific angiogenesis

Expect = 254
vascularization

subunit 4

inhibitor 3 precursor

Identities = 5/6 (83%),
in pulmonary

Positives = 6/6 (100%)
adenocarcinoma

Query:
1
NSFLMT
6

NS+LMT

Sbjct:
694
NSYLMT
699

2C1
gi|23958536|gb|
ACSSTVSFIWI
11
gi|33112422|sp|Q16827|
Hs.160871
623-629
Functional involve-

BC036216.1|,
(SEQ ID NO: 40)
AA
PTPO_HUMAN Receptor-type

Score = 21.8 bits (44),
ment of PTP-U2L in

Homo sapiens

protein-tyrosine phosphatase O

Expect = 128
apoptosis subse-

cullin 4B, mRNA

precursor (Glomerular

Identities = 6/7 (85%),
quent to terminal

epithelial protein 1)

Positives = 7/7 (100%)
differentiation of

(Protein tyrosine phosphatase

Query:
3
SSTVSFI
9
monoblastoid

U2)(PTPase U2) (PTP-U2)

SST+SFI

leukemia cells

Sbjct:
623
SSTISFI
629

2G2
gi|25988997|gb|
KKKKKKKRVGGPLQ
14
gi|20532388|sp|Q9NVP1|
Hs.363492
108-115
The expression of

AF541939.1|,
(SEQ ID NO: 41)
AA
DD18_HUMAN, ATP-dependent RNA

Score = 27.4 bits (57),
MrDb is induced upon

His-3 integration

helicase DDX18

Expect = 2.8
proliferative stimu-

vector pJHAM007,

(DEAD-box protein 18)

Identities = 8/8 (100%),
lation of primary

complete sequence

(Myc-regulated DEAD-box protein)

Positives = 8/8 (100%)
human fibroblasts as

(MrDb)

Query:
1
KKKKKKKR
8
well as B cells and

KKKKKKKR

down-regulated

Sbjct:
108
KKKKKKKR
115
during terminal

differentiation of

HL60 leukemia cells

4G9
gi|17136149|ref|
GPVFICSSNCFKIT
14
gi|115892|sp|P16870|CBPH_HUMAN,
Hs.75360
333-340
Expression of the

NM_014708.2|, Homo
(SEQ ID NO: 42)
AA
Carboxypeptidase H precursor

Score = 24.4 bits (50),
protein product of

sapiens kinetochore

(CPH) (Carboxypeptidase E)

Expect = 18
the PCPH proto-

associated 1 (KNTC1),

(CPE) (Enkephalin convertase)

Identities = 7/8 (87%),
oncogene in human

mRNA

(Prohormone processing

Positives = 8/8 (100%)
tumor cell lines

carboxypeptidase)

Query:
7
SSNCFKIT
14

SSNCF+IT

Sbjct:
333
SSNCFEIT
340

2E12
gi|22062543|ref|
APFTCWPTVATNTWE
15
gi|128062|sp|P08473|NEP_HUMAN,
Hs.307734
167-175
Loss or decrease in

XM_170670.1|, Homo
(SEQ ID NO: 43)
AA
Neprilysin (Neutral

Score = 23.5 bits (48),
expression of NEP

sapiens putative

endopeptidase) (NEP)

Expect = 32
has been reported

transmembrane protein;

(Enkephalinase) (Common acute

Identities = 7/10 (70%),
in brain cancer,

homolog of yeast

lymphocytic leukemia antigen)

Positives = 7/10 (70%),
renal cancer and

Golgi membrane protein

(CALLA)

Gaps = 1/10 (10%)
invasive bladder

Yif1p (Yip1p-

(Neutral endopeptidase 24.11)

Query:
6
WPTVATNTWE
15
cancer.

interacting factor)

(CD10)

WP VAT WE

(54TM), mRNA.

Sbjct:
167
WP-VATENWE
175

1B5
gi|12654862|gb|
TDQSSISPGNRKAPG
15
gi|6707734|sp|Q13077|
Hs.531251
64-70
Tumor necrosis fac-

BC001275.1|BC001275,
(SEQ ID NO: 44)
AA
TRA1_HUMAN, TNF

Score = 21.0 bits (42),
tor receptor-asso-

Homo sapiens annexin

receptor associated

Expect = 187
ciated factor 1

A1, mRNA

factor 1 (TRAF1)

Identities = 6/7 (85%),
gene overexpression

Positives = 7/7 (100%)
in B-cell chronic

Query:
5
SISPGNR
11
lymphocytic

SISPG+R

leukemia

Sbjct:
64
SISPGSR
70

4B2
gi|23272851|gb|
RIMGGGIQRETWISS
15
gi|20139133|sp|Q9BZF3|
Hs.318775
906-912
Oxysterols are

BC035645.1|,
(SEQ ID NO: 45)
AA
ORP6_HUMAN,

Score = 21.8 bits (44),
potent signalling

Homo sapiens, Similar

Oxysterol binding

Expect = 104
lipids that directly

to RIKEN cDNA

protein-related protein 6

Identities = 5/7 (71%),
bind liver X recep-

3830613O22 gene, clone

Positives = 6/7 (85%)
tors (LXRs).

Query:
8
QRETWIS
14
Oxysterol-regulated

QRE W+S

function of LXRs is

Sbjct:
906
QREAWVS
912
to control the

expression of genes

involved in reverse

cholesterol trans-

port, catabolism of

cholesterol, and

lipogenesis

5C6
gi|22797897|emb|
ICGSWGKYNLWQSSSSK
17
gi|12644310|sp|P53618|
Hs.3059
250-257
A major component

AL160171.27|, Human
(SEQ ID NO: 46)
AA
COPB_HUMAN, Coatomer beta

Score = 22.3 bits (45),
of the coat of non-

DNA sequence from

subunit (Beta-coat protein)

Expect = 93
clathrin-coated

clone RP11-256E16

(Beta-COP)

Identities = 7/8 (87%),
vesicles, beta-

on chromosome 1,

Positives = 7/8 (87%)
COP, mediate

complete sequence

Query:
8
YNLWQSSS
15
membrane traffic

YNL QSSS

through the Golgi

Sbjct:
250
YNLLQSSS
257
complex

3C8
gi|24234687|ref|
EILKPEGQHMKLRSEETS
18
gi|2493676|sp|Q12889|
Hs.1154
585-599
Oviduct specific

NM_004134.3|, Homo
(SEQ ID NO: 47)
AA
OGP_HUMAN,

Score = 24.4 bits (50),
glycoproteins are

sapiens heat shock

Oviduct-specific

Expect = 21
involved in variety

70 kDa protein 9B

glycoprotein precursor

Identities = 10/15 (66%),
of roles during

(mortalin-2) (HSPA9B),

(Oviductal glycoprotein)

Positives = 10/15 (66%),
fertilization and

nuclear gene encoding

Gaps = 1/15 (6%)
early embryonic

mitochondrial

Query:
5
PEGQHMKLRSEE-TS
18
development

protein, mRNA.

PEGQ M LR E  TS

Sbjct:
585
PEGQTMPLRGENLTS
599

1H1
gi|22024587|gb|
AKARALARRSEPCSTGKLQLR
21
gi|12230848|sp|O95049|
Hs.25527
853-862
Occludin expression

AC103702.3|, Homo
(SEQ ID NO: 48)
AA
ZO3_HUMAN, Tight

Score = 23.5 bits (48),
in microvessels of

sapiens chromosome 17,

junction protein ZO-3

Expect = 38
neoplastic and non-

clone RP11-357H14,

(Zonula occludens 3 protein)

Identities = 8/10 (80%),
neoplastic human

complete sequence

Positives = 8/10 (80%)
brain

Query:
3
ARALARRSEP
12

A ALAR SEP

Sbjct:
853
APALARSSEP
862

2F10
gi|21166212|gb|
VQRGIGTIPSETIPVNRKRVNPP
23
gi|118206|sp|P14416|D2DR_HUMAN,
Hs.73893
264-270
Expression of dopa-

AC109584.2|,
(SEQ ID NO: 49)
AA
D(2) dopamine receptor

Score = 22.7 bits (46),
mine receptors and

Homo sapiens

Expect = 56
transporter in

chromosome 3

Identities = 6/7 (85%),
neuroendocrine

clone RP11-674P14,

Positives = 7/7 (100%)
gastrointestinal

complete sequence.

Query:
14
PVNRKRV
20
tumor cells

PVNR+RV

Sbjct:
264
PVNRRRV
270

5C12
gi|24430032|emb|
VSWFPSWARSCGRQTPLGATYK
28
gi|3915660|sp|Q16850|
Hs.512872
283-292
CYP2E1 protein is

AL939123.1|SCO939123,
DTLLPV
AA
CP51_HUMAN, Cytochrome

Score = 24.4 bits (50),
expressed in both

Streptomyces

(SEQ ID NO: 50)

P450 51A1 (CYPLI) (P450LI)

Expect = 17
tumour and normal

coelicolor A3(2)

(Sterol 14-alpha demethylase)

Identities = 8/10 (80%),
breast tissue with

complete genome;

(Lanosterol 14-alpha

Positives = 8/10 (80%)
an increased

segment 20/29

demethylase) (LDM) (P450-14DM)

Query:
14
QTPLGATYKD
23
expression in

QT L ATYKD

breast tumours.

Sbjct:
283
QTLLDATYKD
292

2H5
gi|18606292|gb|
DLQPPGRRWLPQQCPGSPGRC
35
gi|116006|sp|P08575|CD45_HUMAN,
Hs.444324
40-49
Expression of

BC022865.1|
DASVPLWSDHLPSL
AA
Leukocyte common

Score = 23.5 bits (48),
leucocyte-common

Homo sapiens ATP
(SEQ ID NO: 51)

antigen precursor (L-CA)

Expect = 41
antigen and large

synthase, H+ trans-

Identities = 8/10 (80%),
sialoglycoprotein

porting, mitochondrial

Positives = 8/10 (80%)
on leukemic cells

F1 complex, O subunit

Query:
24
SVPLWSDHLP
33
in B-cell chronic

(oligomycin sensi-

SVPL SD LP

lymphocytic

tivity conferring

Sbjct:
40
SVPLSSDPLP
49
leukemia and non-

protein), mRNA

Hodgkin's

2F12
gi|10443350|emb|
RGLGPLAAACGRSGGGGGGG
36
gi|8928460|sp|O75962|
Hs.519209
2232-2244
Not associated with

AL133264.10|
AGGTGSSNVNKKTPPN
AA
TRIO_HUMAN, Triple functional

Score = 31.2 bits (66),
cancer

AL133264, Human DNA
(SEQ ID NO: 52)

domain protein (PTPRF

Expect = 0.22

sequence from clone

interacting protein)

Identities = 11/13 (84%),

RP3-369A17 on

Positives = 13/13 (100%)

chromosome 6p22.1-22.3

Query:
13
SGGGGGGGAGGTG
25

Contains ESTs, STSs,

GSSs and CpG islands

SGGGGGGG+GG+G

Sbjct:
2232
SGGGGGGGSGGSG
2244

5C9
gi|15072584|emb|
PMRCSCTMGEIQMQIHCGARRR
37
gi|34395516|sp|O15085|
Hs.371602
409-417
A novel gene at

AL442003.8|, Human
KAVPSSKDNVQSSAH
AA
ARHB_HUMAN, Rho guanine

Score = 23.1 bits (47),
11q23 named LARG

DNA sequence from
(SEQ ID NO: 53)

nucleotide exchange factor 11

Expect = 72
for leukemia-asso-

clone RP11-324H6 on

(PDZ-RhoGEF)

Identities = 7/9 (77%),
ciated Rho guanine

chromosome 10,

Positives = 7/9 (77%)
nucleotide exchange

complete sequence

Query:
8
MGEIQMQIH
16
factor (GEF) has

M EIQ QIH

strong sequence

Sbjct:
409
MPEIQEQIH
417
homology to several

members of the Rho

family of GEFs.

Further, LARG was

found to be fused

with MLL in a pa-

tient with primary

Rho GEF, Bcr, has

been implicated in

leukemia through a

recurrent chromo-

somal

translocation.

5F9
gi|18693518|gb|
WRTTYISILNLAQFYYSLITVL
48
gi|20139105|sp|Q99959|
Hs.25051
471-492
Immunohistochemical

AC015911.8|,
KTFNWPGTVVHACNPSTLGGQG
AA
PKP2_HUMAN,

Score = 47.4 bits (111),
localization of

Homo sapiens

RRIT

Plakophilin 2

Expect = 5e−06
plakophilins (PKP1,

chromosome 17,
(SEQ ID NO: 54)

Identities = 19/22 (86%),
PKP2, PKP3, and

clone RP11-1094M14,

Positives = 19/22 (86%)
p0071) in primary

complete sequence

Query:
27
WPGTVVHACNPSTLGGQGRRIT
48
oropharyngeal

tumors

WPG V HACNPSTLGGQG RIT

Sbjct:
471
WPGAVAHACNPSTLGGQGGRIT
492

1A3
QDSCQEN
7AA

(SEQ ID NO: 76)

1A4
PAYLGAHFSLPR
12

(SEQ ID NO: 77)
AA

1A9
LNLYRRHFSRD
11

(SEQ ID NO: 78)
AA

1B12
PHTKAKIFVNANNMQNTEL
19

(SEQ ID NO: 79)
AA

1B3
RSGRDNGDVGAGAPFRLSSTS
62

QPRRIKPIAPPPRAPSPECGA
AA

GGGGPAPAGWKGSKLAAALE

(SEQ ID NO: 80)

1B4b
ENVLVQTN
8 AA

(SEQ ID NO: 81)

2B2
SGRDNGDVGAGAPFRLSSTSQ
52

PRRIKPIAPPPRAPSPECGAG
AA

GGGGGRGGGG

(SEQ ID NO: 82)

1C4
TQSLTDFR
8 AA

(SEQ ID NO: 83)

1C8
VGKRKNGCCQSSRIYGKEPLPY
28

KLSHFP
AA

(SEQ ID NO: 84)

1D1
GGWRAGAGAGAGVRVGPRVG
31

EAGPEARMRGG
AA

(SEQ ID NO: 85)

1D10
LTNKSLHYGMIERENNSLYINNS
23

(SEQ ID NO: 86)
AA

1D4
RKRRERVGRQT
11

(SEQ ID NO: 87)
AA

1D8
RSGRPRVEGEQACGRTRVTS
20

(SEQ ID NO: 88)
AA

1E1
AKSWTN
6 AA

(SEQ ID NO: 89)

1E12B
LIQHQHLGQI
10

(SEQ ID NO: 91)
AA

1E2
RMSPH
5 AA

(SEQ ID NO: 92)

1E4T
VVTHSATLTSSPPAPSSFVCPQ
40

ASRWLLSISELGEASSGN
AA

(SEQ ID NO: 93)

1E4B
RSGRDNGDVGAGAPFRLSSTS
51

QPRRIKPIAPPPRAPSPECGA
AA

GGSLRPHSE

(SEQ ID NO: 94)

1F2
RSGRDNGDVGAGAPFRLSSTS
71

QPRRIKPIAAPSARCPPPSAG
AA

AGRRLAAGRGWKGIKLAVGFY

NYFTGLCL

(SEQ ID NO: 95)

1F11
LMRNLTMRLMTGMSTRSSLSP
44

RHHITCAGTQGGTAQATTPRV
AA

PR

(SEQ ID NO: 96)

1F12
RGSEIFLTAMNCSHVREET
19

(SEQ ID NO: 97)
AA

1F4
AAGRGRGK
8 AA

(SEQ ID NO: 98)

1F10
SGRDNGDVGAGAPFRLSSTSQ
77

PRRIKPIAPPPRAPSPECGAG
AA

GGGWRPRRRRRRPRRRRRWML

MLLLMMMMVDRGNL

(SEQ ID NO: 99)

1G4
SGRDNGDVGAGAPFRLSSTSQ
63

PRRIKPIAPPPRAPSPECGAG
AA

RRLAAAEEEEEDAPEEDVLEV

(SEQ ID NO: 100)

1H8
ERKSCS
6 AA

(SEQ ID NO: 101)

1H9
ILLKTIFAYSCSE
13

(SEQ ID NO: 102)
AA

2A2
GSFETSSLPSDASSLCR
17

(SEQ ID NO: 103)
AA

2A5m
VRLWSW
6 AA

(SEQ ID NO: 104)

2A6
QEHDCGAAADGLAHLSDCGA
20

(SEQ ID NO: 105)
AA

2C6
LGAGGEGRRIPPP
13

(SEQ ID NO: 107)
AA

2D10
KRASKCKWL
9 AA

(SEQ ID NO: 108)

2E2
RSGRDNGDVGAGGRGASLRPH
24

SSN
AA

(SEQ ID NO: 109)

2F4
CSETQAWRPLLRPAR
15

(SEQ ID NO: 110)
AA

2F8
SGRDNGDVGAGAPFRLSSTSQ
71

PRRIKPIAPPPRAPSPECGAG
AA

GGGGGRGGGGGGPGGGGVGGR

GGGGGGRG

(SEQ ID NO: 111)

2G9
QKQKKANEKKEEPK
14

(SEQ ID NO: 112)
AA

2H1
LGSDERRHRAP
11

(SEQ ID NO: 113)
AA

3A1
RRGRCKPSRRWHLNN
15

(SEQ ID NO: 114)
AA

3A10
LVCATSNF
8 AA

(SEQ ID NO: 115)

3A11
FGCKSLLL
8 AA

(SEQ ID NO: 116)

3A12
PPSPPP
6 AA

(SEQ ID NO: 117)

3A3b
LNYQMKG
7 AA

(SEQ ID NO: 118)

3A5
VEPKREK
7 AA

(SEQ ID NO: 119)

3A7
PKSGHAQTELTRPDRLPFQVS
21

(SEQ ID NO: 120)
AA

3B2b
LQDPVIRECRLRNSEGEATE
46

LITETFTSKSAISKTAESSG
AA

GPSTSR

(SEQ ID NO: 121)

3B6
GGRRWERGKQKTQAAE
16

(SEQ ID NO: 122)
AA

3D11
LSVGPACAVSSGNETVLSTTTP
31

ASTTLRCIS
AA

(SEQ ID NO: 123)

3D5T
VDEEDMMNQVLQRSIIDQ
18

(SEQ ID NO: 124)
AA

3D7
VQAQQRSAPARAARAGHPEAG
28

AGMEGAG
AA

(SEQ ID NO: 125)

3E1
GERVSSAGGTAHGGRAGLSTRR
22

(SEQ ID NO: 126)
AA

3E10b
EGRLQDHRRRP
11

(SEQ ID NO: 127)
AA

3E7
LLFLIN
6 AA

(SEQ ID NO: 128)

3F1
SKRNKPACSKWLSWYCNE
18

(SEQ ID NO: 129)
AA

3F9T
YKIIYVVYCQKWKKPHHEET
40

FRKPKLMNILKIYLSVKTKL
AA

(SEQ ID NO: 130)

3G1
GKIALSSVRTQNLLSFQALHKNV
23

(SEQ ID NO: 131)
AA

3G11T
GLCGPDPSTGRLPRRFRPAAS
26

GQPWP
AA

(SEQ ID NO: 132)

3G12T
KMQMNAYFLDKKSAKMVSV
19

(SEQ ID NO: 133)
AA

3G3
SQRPPQGSQLPLPASPETATAP
27

RKVSG
AA

(SEQ ID NO: 134)

4B5
NKKPLGSSVEVL
12

(SEQ ID NO: 137)
AA

4B6T
LPQCPNIGSL
10

(SEQ ID NO: 138)
AA

4B7
EVYAQREDLVDEIKL
24

PKGEPLFFC
AA

(SEQ ID NO: 139)

4C4
LNRNAI
6 AA

(SEQ ID NO: 140)

4C6b
PSNLINFFKVLTLLSRSR
18

(SEQ ID NO: 141)
AA

4E1
LHYHGRAAPRAATRPG
16

(SEQ ID NO: 142)
AA

4E8
PKTMTQNSFG
10

(SEQ ID NO: 143)
AA

4F10
DRQEEETSIKVLVLERSWNLHT
25

LGP
AA

(SEQ ID NO: 144)

4F2_3
PLPPSPKPIKIKNYNKP
17

(SEQ ID NO: 146)
AA

4F4
GTATELPHRRTNKRKRLG
18

(SEQ ID NO: 147)
AA

4F8
EVDVRREDLVEEIKR
24

RTGQPLCIC
AA

(SEQ ID NO: 148)

5A1
QQPGAGLPNEP
11

(SEQ ID NO: 149)
AA

5H10
ENLEI
6 AA

(SEQ ID NO: 151)

5H2
GRGDIPEIHTEVQQDCH
17

(SEQ ID NO: 152)
AA

4C9
KKRRNMLKTL
10

(SEQ ID NO: 153)
AA

4D12
PARPAREEEARRAV
28

SHAGVVAAAETAGP
AA

(SEQ ID NO: 154)

2D4
GGSSRQRDGGGAGAGGGGRA
33

GGSGPQLPRQPAG
AA

(SEQ ID NO: 155)

4A1
APAWVTEQDSDPKKKK......

.............*

The cDNA insert sequence comprises a stretch of nucleotides followed by a poly A tail; therefore, the translated peptide has an indefinite number of lysine residues. Western blot can determine the molecular weight of these peptides.

(SEQ ID NO: 156)

4D7
MKRIQKKESHYLN
13

(SEQ ID NO: 157)
AA

4D10
AWWLMPTVPATWE
28

AEAGGSLEPRSQRLQ
AA

(SEQ ID NO: 158)

3G6
APRRTSEDGRAAQPRGAKTKA
33

TGAQAGGRAQAP
AA

(SEQ ID NO: 160)

2C5
RKTRYFI
7 AA

(SEQ ID NO: 161)

2G3
INKRRSFYNLSNWQ
14

(SEQ ID NO: 162)
AA

3G5
RWLEITKYIDQ
11

(SEQ ID NO: 163)
AA

4H4
KKKGGGGEGGGAGI
14

(SEQ ID NO: 164)
AA

4E8
GRNGKGEKGK
10

(SEQ ID NO: 165)
AA

1H3
RKDIKAFYYLH
11

(SEQ ID NO: 166)
AA

2B10
LWSEINIKGRGEKEQQGRDTYI
26

GLKR
AA

(SEQ ID NO: 167)

2C120
NWQKMTAY
8 AA

(SEQ ID NO: 168)

2F6
RRMAFFRL
8 AA

(SEQ ID NO: 169)

3A50
DWGYIRGSRLSN
12

(SEQ ID NO: 171)
AA

3B4
AWWRMPVIPATWEAGAGEPLE
28

PRKRSLQ
AA

(SEQ ID NO: 172)

3C7
RWSRVRSWQRPQALETEETHR
24

GRG
AA

(SEQ ID NO: 173)

3G4
LWHRIRNSEESKPGCNEVSLQ
31

QHALLGSRME
AA

(SEQ ID NO: 174)

4E5
PKGRRMGFFF
10

(SEQ ID NO: 175)
AA

1F120
IQQKSGNGLPKTDRPG
16

(SEQ ID NO: 176)
AA

1H10
LGCSTGEVPGRPCSRHSTSSIA
37

AVAGPGAAGGGGAGG
AA

(SEQ ID NO: 177)

2G6
ASQDIRKRISQGGKG
27

VNSRPTTYGCSG
AA

(SEQ ID NO: 178)

4C60
NRIRYPGSPRRKR
13

(SEQ ID NO: 179)
AA

4F7
LPKCWDYRREPPYPADNS
18

(SEQ ID NO: 180)
AA

1F5
IPWVVVHGRS
10

(SEQ ID NO: 181)
AA

2E110
EIYNYQVTP
9 AA

(SEQ ID NO: 183)

2F120
GDVGEMLLVMRNPANRLPAAR
38

RLMGFSRVGFSFGIFFR
AA

(SEQ ID NO: 184)

2F9
RKSESDSS
8 AA

(SEQ ID NO: 185)

4H5
NSSTDSCHRKSYT
13

(SEQ ID NO: 186)
AA
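The "Identities" and "Gaps" figures reported under "Region of similarity of AA" can be recomputed from the Query and Sbjct lines. The following is a minimal Python sketch, assuming the two aligned strings have equal length and use "-" to mark a gap. The function names are hypothetical, the percentage is truncated to an integer to match the values printed in the table, and "Positives" is not computed because it depends on a substitution scoring matrix that the table does not specify.

```python
def alignment_stats(query, sbjct):
    """Count identical columns and gap columns in a pairwise alignment.

    Both strings must already be aligned (equal length, '-' marks a gap).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    length = len(query)
    identities = sum(
        1 for q, s in zip(query, sbjct) if q == s and q != "-"
    )
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return {"length": length, "identities": identities, "gaps": gaps}


def truncated_percent(count, total):
    """Integer percentage, truncated rather than rounded."""
    return count * 100 // total


# Clone 4D9: printed as Identities = 7/8 (87%).
stats_4d9 = alignment_stats("GNSILLIA", "GNSILLAA")

# Clone 2E12: printed as Identities = 7/10 (70%), Gaps = 1/10 (10%).
stats_2e12 = alignment_stats("WPTVATNTWE", "WP-VATENWE")
```

For these two rows the recomputed counts agree with the printed values: 7 of 8 identities for clone 4D9, and 7 of 10 identities with 1 gap for clone 2E12.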

TABLE 6B

Description of Stage II-IV clones.

Size range of the Mimotopes ≧5 amino acids

Stage (II-IV) clones | Description of the genes that are in-frame with T7 10B gene | Peptide sequences of Epitopes in-frame with T7 10B gene | Size of the peptide | Unigene # | Serex Y/N mRNA | Region of similarity of AA | Antigen expression in any type of cancer

3H1
gi|12654010|gb|BC000805.1|
DDDSDYGSSK
103
Hs.510265
N
140-243
The expression of casein kinase II

Homo sapiens nuclear ubiquitous
KKNXKMVKKS
AA

(CK2) is higher in neoplastic ovarian

casein kinase and cyclin-dependent
KPERKEKKMP

surface epithelium.

kinase substrate, mRNA
KPRLKATVTPS

Casein kinase II (CK II) is expressed at a

PVKGKGKVGR

higher level in lung tumours.

PTASKASKEKT

PSPKEEDEEPE

SPPEKKTSISP

PPEKSGDEGS

EDEAPSGED

(SEQ ID NO: 55)

2B3
gi|7023439|dbj|AK001891.1|,
LSTSSFDEQN
10 AA
Hs.528654
N
350-360
Not associated with cancer

Homo sapiens cDNA FLJ11029 fis,
(SEQ ID NO: 56)

clone PLACE1004156

The above sub-table shows antigens, not mimotopes; the sub-table below shows the mimotopes.

Stage (II-IV) clones | Description of the genes that are in Mimotope clones | Peptide sequences of the Mimotopes that are in-frame with T7 10B gene | Size of the peptide | Description of the sequences that Mimotopes mimic | Unigene # | Region of similarity of AA | Antigen expression in any type of cancer

2B9
gi|28837315|gb|BC047588.1|
VIVVLIAVISF
18 AA
gi|20141211|sp|P18825|
Hs.123022
172-185
Stimulation of

Homo

PQNYTWL

A2AC_HUMAN,

Score = 24.8 bits (51), Expect =
alpha2-adrenergic

sapiens KIAA1363
(SEQ ID NO:

Alpha-2C-

16
receptor inhibits

protein, mRNA
57)

adrenergic receptor

Identities = 10/14 (71%),
cholangiocarcinoma

(Alpha-2C

Positives = 10/14 (71%),
growth through

adrenoceptor)

Gaps = 3/14 (21%)
modulation of

(Subtype C4)

Query:
2
IVV----LI-AVISFP
12
Raf-1 and B-Raf

IV LI AVISFP

activities. Beta

Sbjct:
172
IVAVWLISAVISFP
185
adrenergic

receptor is

overexpressed

in pulmonary

adenocarcinoma

2C12
gi|15072584|emb|AL442003.8|,
PMRCSCTMG
37 AA
gi|2851534|sp|Q13724|
Hs.516120
4-12
Not associated

Human
EIQMQIHCGA

GCS1_HUMAN

Score = 24.0 bits (49), Expect =
with cancer

DNA sequence from
RRRKAVPSS

Mannosyl-

40

clone RP11-324H6 on
KDNVQSSAH

oligosaccharide

Identities = 7/9 (77%),

chromosome
(SEQ ID NO:

glucosidase

Positives = 8/9 (88%)

10, complete
58)

(Processing A-

Query:
18
GARRRKAVP
26

sequence

glucosidase I)

G RRR+AVP

Sbjct:
4
GERRRRAVP
12

2D7
gi|34882281|ref|XM_236768.2|,
LRGTSGVQP
14 AA
gi|32172435|sp|P46934|
Hs.1565
514-520
RING protein

Rattus

PEIEQ (SEQ

NED4_HUMAN,

Score = 22.7 bits (46), Expect =
Trim32

norvegicus

ID NO: 59)

Ubiquitin-protein

70
associated with

hypothetical

ligase Nedd-4

Identities = 6/7 (85%),
skin

LOC316116

Positives = 6/7 (85%)
carcinogenesis

(LOC316116), mRNA

Query:
8
QPPEIEQ
14
has E3-ubiquitin

QP EIEQ

ligase properties

Sbjct:
514
QPSEIEQ
520

Clone: 2D12
Gene in mimotope clone: gi|34783327|gb|BC022049.2|, Homo sapiens cDNA clone IMAGE: 4291567, partial cds
Mimotope peptide: ILHLH (SEQ ID NO: 60)
Peptide size: 5 AA
Sequence mimicked: gi|128616|sp|P23975|S6A2_HUMAN, Sodium-dependent noradrenaline transporter (Norepinephrine transporter) (NET)
Unigene #: Hs.78036
Region of similarity (AA): 218-222
Score = 17.6 bits (34), Expect = 2425; Identities = 4/5 (80%), Positives = 5/5 (100%)
Query: 1 ILHLH 5
+LHLH
Sbjct: 218 VLHLH 222
Antigen expression in cancer: NET is involved in neurotransmitter removal from neuronal synapses

Clone: 2E7
Gene in mimotope clone: gi|6330364|dbj|AB033020.1|, Homo sapiens mRNA for KIAA1194 protein
Mimotope peptide: VLSALPEKNCNTVPFQPPEDLRYQHCSSRFLE (SEQ ID NO: 61)
Peptide size: 32 AA
Sequence mimicked: gi|34395825|sp|Q9H106|PTL2_HUMAN Protein tyrosine phosphatase non-receptor type substrate 1-like 2 precursor
Unigene #: Hs.339789
Region of similarity (AA): 167-174
Score = 24.4 bits (50), Expect = 21; Identities = 7/8 (87%), Positives = 8/8 (100%)
Query: 2 LSALPEKN 9
LSALPE+N
Sbjct: 167 LSALPERN 174
Antigen expression in cancer: Protein-tyrosine phosphatase (SAP-1) is overexpressed in gastrointestinal cancer

Clone: 2G10
Gene in mimotope clone: gi|16307467|gb|BC010282.1|, Homo sapiens leucine-rich PPR-motif containing, mRNA
Mimotope peptide: WGFNERDRLSSILQQRCVTL (SEQ ID NO: 62)
Peptide size: 20 AA
Sequence mimicked: gi|13638201|sp|P41214|LIGA_HUMAN, Ligatin (Hepatocellular carcinoma-associated antigen 56)
Unigene #: Hs.274151
Region of similarity (AA): 523-528
Score = 24.0 bits (49), Expect = 29; Identities = 6/6 (100%), Positives = 6/6 (100%)
Query: 12 ILQQRC 17
ILQQRC
Sbjct: 523 ILQQRC 528
Antigen expression in cancer: CD15 and CD50 antigens are both overexpressed in hepatocarcinoma.

Clone: 2G11
Gene in mimotope clone: gi|7329921|emb|AL117379.14|HSJ563E14, Human DNA sequence from clone RP4-563E14 on chromosome 20. Contains the 5′ of the DATF1 gene encoding the death associated transcription factor 1, the 5′ end of a novel gene, ESTs, STSs, GSSs and four CpG islands, complete sequence.
Mimotope peptide: VVSGFFSTFSL (SEQ ID NO: 63)
Peptide size: 11 AA
Sequence mimicked: gi|1705762|sp|P13569|CFTR_HUMAN, Cystic fibrosis transmembrane conductance regulator (CFTR)
Unigene #: Hs.521149
Region of similarity (AA): 429-435
Score = 21.8 bits (44), Expect = 128; Identities = 6/7 (85%), Positives = 6/7 (85%)
Query: 5 FFSTFSL 11
FFS FSL
Sbjct: 429 FFSNFSL 435
Antigen expression in cancer: Mutation of CFTR is observed in Cystic Fibrosis

Clone: 2H8
Gene in mimotope clone: gi|5714635|gb|AF159295.1|AF159295, Homo sapiens serine/threonine protein kinase Kp78 splice variant CTAK75a mRNA
Mimotope peptide: LTRPGHGQD (SEQ ID NO: 64)
Peptide size: 9 AA
Sequence mimicked: gi|2499758|sp|Q92729|PTPU_HUMAN, Receptor-type protein-tyrosine phosphatase U precursor (R-PTP-U) (Protein-tyrosine phosphatase J) (PTP-J) (Pancreatic carcinoma phosphatase 2) PCP-2
Unigene #: Hs.19718
Region of similarity (AA): 355-361
Score = 19.3 bits (38), Expect = 748; Identities = 6/7 (85%), Positives = 6/7 (85%)
Query: 1 LTRPGHG 7
LTRPG G
Sbjct: 355 LTRPGDG 361
Antigen expression in cancer: A potential role of PCP-2 in cell-cell recognition and adhesion is supported by its co-localization with cell adhesion molecules, such as catenin and E-cadherin, at sites of cell-cell contact.

Clone: 4C5
Gene in mimotope clone: gi|19683998|gb|BC025957.1| Homo sapiens coated vesicle membrane protein, mRNA
Mimotope peptide: LYINEMKSKKL (SEQ ID NO: 65)
Peptide size: 11 AA
Sequence mimicked: gi|417216|sp|P33176|KINH_HUMAN, Kinesin heavy chain (Ubiquitous kinesin heavy chain) (UKHC)
Unigene #: Hs.512922
Region of similarity (AA): 592-599
Score = 22.3 bits (45), Expect = 95; Identities = 6/8 (75%), Positives = 8/8 (100%)
Query: 1 LYINEMKS 8
LYI++MKS
Sbjct: 592 LYISKMKS 599
Antigen expression in cancer: Kinesin-1 links neurofibromin and merlin in a common cellular pathway of neurofibromatosis

Clone: 4H6
Gene in mimotope clone: gi|22773353|gb|AC007998.10|, Homo sapiens chromosome 18, clone RP11-322E11, complete sequence
Mimotope peptide: LPQCPSRGSL (SEQ ID NO: 66)
Peptide size: 10 AA
Sequence mimicked: gi|1352515|sp|P48745|NOV_HUMAN, NOV protein homolog precursor (NovH) (Nephroblastoma overexpressed gene protein homolog)
Unigene #: Hs.235935
Region of similarity (AA): 37-42
Score = 20.2 bits (40), Expect = 414; Identities = 5/6 (83%), Positives = 5/6 (83%)
Query: 2 PQCPSR 7
PQCP R
Sbjct: 37 PQCPGR 42
Antigen expression in cancer: Altered expression of novH is associated with human adrenocortical tumorigenesis

Clone: 5A2
Gene in mimotope clone: gi|40788180|emb|AJ583821.2|, Homo sapiens mRNA for ubiquitin specific proteinase 40 (USP40 gene)
Mimotope peptide: PGWDCRLPEAESCRFLLSSRGED (SEQ ID NO: 67)
Peptide size: 23 AA
Sequence mimicked: gi|21759008|sp|Q96CA5|BIR7_HUMAN, Baculoviral IAP repeat-containing protein 7 (Kidney inhibitor of apoptosis protein) (KIAP) (Melanoma inhibitor of apoptosis protein) (ML-IAP) (Livin)
Unigene #: Hs.256126
Region of similarity (AA): 150-159
Score = 22.7 bits (46), Expect = 68; Identities = 7/10 (70%), Positives = 9/10 (90%)
Query: 12 SCRFLLSSRG 21
SC+FLL S+G
Sbjct: 150 SCQFLLRSKG 159
Antigen expression in cancer: ML-IAP, a novel inhibitor of apoptosis, is preferentially expressed in human melanomas

Clone: 5A7
Gene in mimotope clone: gi|16508181|emb|AL138765.18|, Human DNA sequence from clone RP11-34E5 on chromosome 10, complete sequence
Mimotope peptide: KKMRTKM (SEQ ID NO: 68)
Peptide size: 7 AA
Sequence mimicked: gi|30580423|sp|Q8IX29|FX16_HUMAN, F-box only protein 16
Unigene #: Hs.511876
Region of similarity (AA): 14-19
Score = 20.6 bits (41), Expect = 310; Identities = 5/6 (83%), Positives = 6/6 (100%)
Query: 2 KMRTKM 7
KM+TKM
Sbjct: 14 KMQTKM 19
Antigen expression in cancer: A high expression level of F-box protein, Skp2 is observed in diffuse large cell B lymphoma.

Clone: 5B9
Gene in mimotope clone: gi|27469381|gb|BC042411.1|, Mus musculus, clone IMAGE: 4014861, mRNA
Mimotope peptide: QIDSSFSIPWVVVHGRS (SEQ ID NO: 69)
Peptide size: 17 AA
Sequence mimicked: gi|126885|sp|P08235|MCR_HUMAN, Mineralocorticoid receptor (MR)
Unigene #: Hs.331409
Region of similarity (AA): 420-426
Score = 22.3 bits (45), Expect = 93; Identities = 6/7 (85%), Positives = 7/7 (100%)
Query: 3 DSSFSIP 9
DSSFS+P
Sbjct: 420 DSSFSVP 426
Antigen expression in cancer: Glucocorticoid and mineralocorticoid cross-talk with progesterone receptor to induce focal adhesion and growth inhibition in breast cancer cells

Clone: 5B12
Gene in mimotope clone: gi|34996477|tpg|BK001418.1|, TPA: Homo sapiens metastasis associated in lung adenocarcinoma transcript 1 long isoform, transcribed non-coding RNA, complete sequence
Mimotope peptide: GGRRSLRKPQISFFLFER (SEQ ID NO: 70)
Peptide size: 18 AA
Sequence mimicked: gi|34223735|sp|Q08462|CYA2_HUMAN, Adenylate cyclase, type II (ATP pyrophosphate-lyase) (Adenylyl cyclase)
Unigene #: Hs.414591
Region of similarity (AA): 136-142
Score = 24.4 bits (50), Expect = 21; Identities = 6/7 (85%), Positives = 7/7 (100%)
Query: 10 QISFFLF 16
Q+SFFLF
Sbjct: 136 QVSFFLF 142
Antigen expression in cancer: In human Y-79 retinoblastoma cells, corticotropin-releasing hormone (CRH) stimulates adenylyl cyclase activity and increases cyclic AMP accumulation.

Clone: 5D6
Gene in mimotope clone: gi|16741726|gb|BC016660.1|, Homo sapiens heat shock 70 kDa protein 8, transcript variant 1, mRNA
Mimotope peptide: GIRVEPPTRTIS (SEQ ID NO: 71)
Peptide size: 12 AA
Sequence mimicked: gi|6226869|sp|P34932|HS74_HUMAN, HEAT SHOCK 70 KDA PROTEIN 4 (HEAT SHOCK 70-RELATED PROTEIN APG-2) (HSP70RY)
Unigene #: Hs.90093
Region of similarity (AA): 316-328
Score = 21.0 bits (42), Expect = 229; Identities = 6/9 (66%), Positives = 8/9 (88%)
Query: 3 RVEPPTRTI 11
RVEPP R++
Sbjct: 316 RVEPPLRSV 324
Antigen expression in cancer: Expression of HSP70 is observed in human hepatocellular carcinoma

Clone: 5E3
Gene in mimotope clone: gi|40849693|gb|AY495321.1|, Homo sapiens isolate V1-16 mitochondrion, complete genome
Mimotope peptide: RNRYSTARER (SEQ ID NO: 72)
Peptide size: 10 AA
Sequence mimicked: gi|2501463|sp|Q93008|FAFX_HUMAN, Probable ubiquitin carboxyl-terminal hydrolase FAF-X
Unigene #: Hs.77578
Region of similarity (AA): 1356-1361
Score = 21.4 bits (43), Expect = 172; Identities = 6/6 (100%), Positives = 6/6 (100%)
Query: 5 STARER 10
STARER
Sbjct: 1356 STARER 1361
Antigen expression in cancer: Oxidative Modifications and Down-regulation of Ubiquitin Carboxyl-terminal Hydrolase L1 Associated with Idiopathic Parkinson's and Alzheimer's Diseases.

Clone: 5H8
Gene in mimotope clone: gi|13273214|gb|AAK17820|, cytochrome c oxidase subunit I [Homo sapiens]
Mimotope peptide: GKRHIGGTDY (SEQ ID NO: 73)
Peptide size: 10 AA
Sequence mimicked: gi|41713338|sp|Q8N690|D119_HUMAN Beta-defensin 119 precursor (Beta-defensin 19) (DEFB-19)
Unigene #: Corresponding Unigene number is not found
Region of similarity (AA): 21-25
Score = 19.3 bits (38), Expect = 550; Identities = 5/5 (100%), Positives = 5/5 (100%)
Query: 1 GKRHI 5
GKRHI
Sbjct: 21 GKRHI 25
Antigen expression in cancer: The expression of human beta-defensin genes in oral squamous cell carcinomas (SCCs) was demonstrated by in situ hybridization.

Clone: 5A4
Gene in mimotope clone: gi|17149463|gb|AC068228.8|, Homo sapiens chromosome 8, clone RP11-539E17, complete sequence
Mimotope peptide: VVSQLTAEMRLE (SEQ ID NO: 74)
Peptide size: 12 AA
Sequence mimicked: gi|129825|sp|P05164|PERM_HUMAN, Myeloperoxidase precursor (MPO)
Unigene #: Hs.458272
Region of similarity (AA): 23-29
Score = 22.7 bits (46), Expect = 71; Identities = 6/7 (85%), Positives = 7/7 (100%)
Query: 5 LTAEMRL 11
LTAEM+L
Sbjct: 23 LTAEMKL 29
Antigen expression in cancer: Myeloperoxidase immunoreactivity is observed in adult acute lymphoblastic leukemia

Clone: 5E7
Gene in mimotope clone: gi|4885510|ref|NM_005381.1|, Homo sapiens nucleolin (NCL), mRNA
Mimotope peptide: RACQRSTWKTKEGNGQTESSS (SEQ ID NO: 75)
Peptide size: 21 AA
Sequence mimicked: gi|25453064|sp|Q9UPT6|JIP3_HUMAN, C-jun-amino-terminal kinase interacting protein 3 (JNK-interacting protein 3) (JIP-3)
Unigene #: Hs.514335
Region of similarity (AA): 192-199
Score = 23.1 bits (47), Expect = 51; Identities = 7/8 (87%), Positives = 7/8 (87%)
Query: 13 GNGQTESS 20
GN QTESS
Sbjct: 192 GNSQTESS 199
Antigen expression in cancer: JNK interacting protein (JIP) can inhibit JNK signaling pathway in NPC cell (nasopharyngeal carcinoma)


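As a hedged illustration of how the Identities figures in the alignment rows of the mimotope sub-table can be checked, the following Python sketch counts matching positions in a gap-free Query/Sbjct pair. The function name `identity_stats` is hypothetical and is not part of the disclosed method. Integer truncation of the percentage is an assumption inferred from rows in which 6/7 is listed as 85%. Positives are not recomputed, because they depend on the substitution matrix used by the search, which is not stated here.

```python
from typing import Tuple


def identity_stats(query: str, sbjct: str) -> Tuple[int, int, int]:
    """Return (identical positions, alignment length, truncated percent).

    Both strings must already be aligned, i.e. have equal length.
    A gap character '-' never counts as an identity.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    if not query:
        raise ValueError("alignment must not be empty")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    length = len(query)
    # Assumption: the tabulated percentage is truncated, not rounded.
    return identical, length, (100 * identical) // length


# Clone 2C12 row: Query 18-26 against Sbjct 4-12.
print(identity_stats("GARRRKAVP", "GERRRRAVP"))  # (7, 9, 77)
# Clone 2D7 row: Query 8-14 against Sbjct 514-520.
print(identity_stats("QPPEIEQ", "QPSEIEQ"))  # (6, 7, 85)
```

Both results agree with the Identities values listed for those clones (7/9 (77%) and 6/7 (85%)).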
TABLE 7A

Selection of most significant clones from Group 1 dataset

26 clones ordered according to binding with the 16 patients in Group 1. None of the sera from the 25 healthy women (belonging to Group 1) contained IgGs that bound any of these clones. Clones are shown in rows. Patient numbers are shown in the columns. The last column, TP, gives the total number of patients whose serum IgGs bound to each phage clone.

embedded image

all others were analyzed at a serum dilution of 1:10000.

TABLE 7B

Binding of 26 clones with 16 Patients on a new dataset (Group 2)

The rows represent the 26 clones and the columns represent the 16 patients. As shown in this table, sera from all 16 patients in Group 2 contained IgGs that bound at least one clone. None of the IgGs in sera from 12 healthy women interacted with any of these 26 clones.

embedded image

all others were analyzed at a serum dilution of 1:10000.
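The summaries described for Tables 7A and 7B (the TP column and the count of patients whose serum bound at least one clone) can be sketched as follows. This is a minimal illustration only: the clone names, the 0/1 binding matrix, and the function names are hypothetical placeholders and do not reproduce the data in the embedded table images.

```python
from typing import Dict, List


def total_patients_per_clone(binding: Dict[str, List[int]]) -> Dict[str, int]:
    """TP per clone: number of patients whose serum IgGs bound that clone."""
    return {clone: sum(row) for clone, row in binding.items()}


def patients_bound_by_any_clone(binding: Dict[str, List[int]]) -> int:
    """Number of patients whose serum bound at least one clone."""
    rows = list(binding.values())
    if not rows:
        return 0
    n_patients = len(rows[0])
    return sum(1 for j in range(n_patients) if any(row[j] for row in rows))


# Hypothetical binding matrix: rows are clones, columns are patients,
# 1 means serum IgG from that patient bound the clone.
example = {
    "clone_A": [1, 0, 1, 0],
    "clone_B": [0, 0, 1, 1],
    "clone_C": [0, 1, 0, 0],
}

print(total_patients_per_clone(example))
print(patients_bound_by_any_clone(example))
```

In this made-up example every patient is bound by at least one clone, which mirrors the kind of statement made for Group 2 (16 of 16 patients), without using the actual values.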

TABLE 8

Proteins Identified as Overexpressed by IHC in OVCA Through Literature Mining

No.	Gene Symbol	Function	Histotype Studied	Source	PMID
1.	PARP	Chromatin Modification	rsc	PP	17413981
2.	CTSB	Protease	nhs	PP	14984956
3.	CCNE	Cell cycle regulator	se, mu, en, cc	PP	11585414
4.	CLDN4	Receptor	se, metastatic se	PP	15277215
5.	CLU	Associated with apoptosis	se, non serous	PP	15578711
6.	CYP1B1	Mixed function monooxygenase	se, mu, en, cc, MMMT	PP	11461084
7.	EIF5A2	Protein biosynthesis	nhs	PP	16424057; 15205331
8.	FSCN	Actin binding protein	se, mu, en, cc, other	PP	18498068
9.	FGF8	Growth and development	se, mu, en, cc	PP	11072239
10.	HE4; WFDC2	Protease inhibitor	se, mu, en, cc	PP	16607372
11.	IGFBP5	Prolong the half life of IGFs	se, mu, en, cc	PP	16729015
12.	MAGEA4	Tumor antigen, Development	se	PP (SX)	14695148
13.	NRG1	Signaling Protein	se, en, cc	PP	12473609
14.	PPARG	Receptor	se, mu, en, cc, mixed	PP	15583697
15.	TAG-72	Pancarcinoma antigen	se, mu, en, cc	PP	17210225
16.	TGFB1	Growth & differentiation	se, mu	PP	16835828
17.	VEGF-A	Angiogenesis	se, mu, en	PP	18343598; 16835828
18.	VEGF-C	Angiogenesis	se, mu, en	PP	18343598; 16835828
19.	MEIS1	Transcription factor	se, mu, en, cc, other	PW	17949970
20.	PAX8	Transcription factor	se, mu, en, cc	PW	18724243
21.	CKS1B	Cell cycle regulator	se	Lit	16572426
22.	CLDN3	Receptor	se, mu, en, cc	Lit	15161682
23.	EDD	Binds ubiquitin	nhs	Lit (SX)	18349819
24.	FLT1	Binds to VEGF	se, mu	Lit	16835828
25.	MUC1	Signaling	se, mu, en, cc	Lit	16061277; 15161682
26.	MUC16	Ovarian cancer antigen CA125	Mouse model	Lit	18637025
27.	PRAME	Repressor of retinoic acid receptor	se	Lit	18709641
28.	RalBP1	Multidrug resistance	nhs	Lit	17954908
29.	S100A1	Interact with hsp's and CYP40	se, metastatic se, others	Lit	15277215
30.	VEGF-D	Angiogenesis	se, mu, en	Lit	18343598; 16835828

se: Serous; mu: Mucinous; en: Endometrioid; cc: Clear cell

PP: Plasma Proteome: http://www.plasmaproteome.org/ppihome.htm

Lit: identified solely by literature mining; PW: Pathwork (Monzon FA et al., JCO, 2009, v27:1)

(SX): also identified as a tumor antigen in the SEREX database; http://ludwig-sun5.unil.ch/CancerImmunomeDB/

nhs: No histotype specified; rsc: Randomly selected cases

REFERENCES

1. Ali-Fehmi et al. Analysis of the expression of human tumor antigens in ovarian cancer tissues. Cancer Biomarkers 6:33-48 (2010).

2. Alizadeh A A, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511, (2000).

3. An, A, et al. A learning system for more accurate classifications. Lecture Notes in Artificial Intelligence, Vancouver. 1418:426-441, (1998).

4. Aunoble B, et al. Major oncogenes and tumor suppressor genes involved in epithelial ovarian cancer. Int J Oncol 16:567-76, (2000).

5. Baron A T, et al. Serum sErbB1 and Epidermal Growth Factor Levels As Tumor Biomarkers in Women with Stage III or IV Epithelial Ovarian Cancer. Cancer Epidemiology, Biomarkers & Prevention 8:129-137 (1999).

6. Bauer R, et al. Cloning and characterization of the Drosophila homologue of the AP-2 transcription factor. Oncogene 17:1911-1922 (1998).

7. Bast R C, et al. Reactivity of a monoclonal antibody with human ovarian carcinoma. J. Clin Invest 68:1331-1337 (1981).

8. Bast R C et al. A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer. N Engl J Med 309: 883-887 (1983).

9. Berek J S, et al. Serum interleukin-6 levels correlate with disease status in patients with epithelial ovarian cancer. Am J Obstet Gynecol 164:1038-1043 (1991).

10. Bittner, M et al. Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling, Nature 406:536-540 (2000).

11. Blake C, et al. UCI repository of machine learning databases (1998).

12. Boyd J, et al. Molecular genetic and clinical implications [Review]. Gynecol Oncol 64:196-206 (1997).

13. Breiman L, et al. Classification and regression trees, Wadsworth and Brooks (1984).

14. Buettner R, et al. An alternatively spliced form of AP-2 encodes a negative regulator of transcriptional activation by AP-2. Mol. Cell Biol 13:4174-4185 (1993).

15. Chiao P J, et al. Elevated expression of the human ribosomal S2 gene in human tumors. Molecular Carcinogenesis 5:219-231 (1992).

16. Clark P, et al. The CN2 induction algorithm. Machine Learning 3:261-283 (1989).

17. Coleman M P, et al. Trends in cancer incidence and mortality. Lyon, France: IARC Scientific Publications 121:477-498 (1993).

18. Deyo J, et al. A novel protein expressed at high cell density but not during growth arrest. DNA and Cell Biol 17:437-447 (1998).

19. Draghici S. The Constraint Based Decomposition, accepted for publication in Neural Networks, to appear (2001).

20. Einhorn, N. et al. Prospective evaluation of serum CA 125 levels for early detection of ovarian cancer. Obstet Gynecol 80:14-18 (1992).

21. Golub T R, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-537 (1999).

22. Gotlieb W H, et al. Presence of interleukins in the ascites of patients with ovarian and other intra-abdominal cancers. Cytokine 4:385-390 (1992).

23. Greenlee R T, et al. Cancer Statistics. CA Cancer J Clin 50:7-33 (2000).

24. Heath, S. et al. Induction of oblique decision tree. In IJCAI-93. Washington, D.C. (1993).

25. Hogdall E V, et al. Predictive values of serum tumour markers tetranectin, OVX1, CASA and CA125 in patients with a pelvic mass. Int J Cancer 89:519-523 (2000).

26. Holschneider C H, et al. Ovarian cancer: epidemiology, biology, and prognostic factors. Semin Surg Oncol 1:3-10 (2000).

27. Houts T M: Improved 2-Color Normalization For Microarray Analyses Employing Cyanine Dyes, CAMDA (2000). Critical Assessment of Techniques for Microarray Data Mining. Duke University Medical Center, Dec 18-19 (2000).

28. Jacobs I J, et al. Potential screening tests for ovarian cancer, in Sharp F, Mason W P, Leake R E (eds). Ovarian Cancer. London, Chapman and Hall Medical, 197-205 (1997).

29. Jacobs I, et al. Multimodal approach to screening for ovarian cancer. Lancet 1:268-271 (1988).

30. Jacobs I, et al. The CA 125 tumor-associated antigen: a review of the literature. Hum Reprod 4:1-12 (1989).

31. Kacinski B M et al. Macrophage colony-stimulating factor is produced by human ovarian and endometrial adenocarcinoma-derived cell lines and is present at abnormally high levels in the plasma of ovarian carcinoma patients with active disease. Cancer Cells 7:333-337 (1989).

32. Kerr, Martin, Churchill. Analysis of variance for gene expression microarray data. Journal of Computational Biology (2000).

33. Kim, S Y et al. Coordinate Control of Growth and Cytokeratin 13 Expression by Retinoic Acid. Molecular Carcinogenesis 16:6-11 (1996).

34. Kohonen T. Learning vector quantization. Neural Networks, 1 (suppl.1):303 (1988).

35. Kohonen T. Learning vector quantization. In the handbook of brain theory and neural networks pp. 537-540. Cambridge Mass.: MIT press (1995).

36. MacBeath G. et al. Printing proteins as microarrays for high-throughput function determination. Science 289:1760-3 (2000).

37. Monzon F A, et al. Multicenter validation of a 1,550-gene expression profile for identification of tumor tissue of origin. J Clin Oncol 27:2503-8 (2009).

38. Murthy K. On growing better decision trees from data. Unpublished doctoral dissertation. Johns Hopkins University (1995).

39. Musavi M. et al. On the training of radial basis functions classifiers. Neural Networks 5:595-603 (1992).

40. Patsner B. et al. Comparison of serum CA 125 and lipid associated sialic acid (LASA-P) in monitoring patients with invasive ovarian adenocarcinoma. Gynecol Oncol 30(1): 98-103 (1988).

41. Peng Y S, et al. ARHI is the center of allelic deletion on chromosome 1p31 in ovarian and breast cancers. Int J Cancer 86:690-4 (2000).

42. Precup D, et al. Classification using $\Phi$-machines and constructive function approximation. In Proc. 15th International Conf. on Machine Learning, pages 439-444. Morgan Kaufmann, San Francisco, Calif. (1998).

43. Poggio T, et al. Networks for approximation and learning. Proceedings of the IEEE 78(9):1481-1497 (1990).

44. Quinlan J R: C4.5: Programs for machine learning, Morgan-Kaufmann (1993).

45. Rumelhart D E, et al. Learning internal representations by error backpropagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press/Bradford Books (1986).

46. Schwartz P E, et al. Circulating tumor markers in the monitoring of gynecologic malignancies. Cancer 60:353-361 (1987).

47. Schmittgen T D et al. Quantitative reverse transcription-polymerase chain reaction to study mRNA decay: comparison of endpoint and real-time methods. Anal Biochem, 285:194-204 (2000).

48. Sonoda K, Nakashima M, Kaku T, Kamura T, Nakano H, Watanabe T. A novel tumor-associated antigen expressed in human uterine and ovarian carcinomas. Cancer 77:1501-9 (1996).

49. Nakashima M, Sonoda K, Watanabe T. Inhibition of cell growth and induction of apoptotic cell death by the human tumor-associated antigen RCAS1. Nat Med 5:938-42 (1999).

50. Lindstrom M S, Klangby U, Wiman K G. p14ARF homozygous deletion or MDM2 overexpression in Burkitt lymphoma lines carrying wild type p53. Oncogene 20(17):2171-7 (2001).

Number	Name	Date	Kind
20050239146	Tainsky et al.	Oct 2005	A1
20070083334	Mintz et al.	Apr 2007	A1

Number	Date	Country
WO 9964576	Dec 1999	WO
WO 0036107	Jun 2000	WO

	Number	Date	Country
Parent	11060867	Feb 2005	US
Child	12824305		US
Parent	10004587	Dec 2001	US
Child	11060867		US

Non-Patent Literature Citations (8)

Entry
Gura (Science, 1997, 278:1041-1042).
Kaiser (Science, 2006, 313: 1370).
Taber's Cyclopedic Medical Dictionary (1985, F.A. Davis Company, Philadelphia, p. 274).
Busken, C et al. (Digestive Disease Week Abstracts and Itinerary Planner, 2003, abstract No. 850).
Yang et al. (Genome Res. 2001, 11: 1888-1898).
Rast et al. (Developmental Biol. 2000: 228:270-286).
ABI Prism® 7700 Sequence Detection System User's Manual (Applied Biosystems, Jan. 2001).
Fernandez et al. (Cancer Detection and Prevention, Jan. 2005, 29: 59-65).