Gene Expression Profiling for Identification, Monitoring and Treatment of Breast Cancer

Information

  • Patent Application
  • 20100255470
  • Publication Number
    20100255470
  • Date Filed
    November 06, 2007
  • Date Published
    October 07, 2010
Abstract
A method is provided in various embodiments for determining a profile data set for a subject with breast cancer or conditions related to breast cancer based on a sample from the subject, wherein the sample provides a source of RNAs. The method includes using amplification for measuring the amount of RNA corresponding to at least 1 constituent from Tables 1-5. The profile data set comprises the measure of each constituent, and amplification is performed under measurement conditions that are substantially repeatable.
Description
FIELD OF THE INVENTION

The present invention relates generally to the identification of biological markers associated with the identification of breast cancer. More specifically, the present invention relates to the use of gene expression data in the identification, monitoring and treatment of breast cancer and in the characterization and evaluation of conditions induced by or related to breast cancer.


BACKGROUND OF THE INVENTION

Breast cancer is cancer that forms in tissues of the breast, usually the ducts and lobules (glands that make milk). It occurs in both men and women, although male breast cancer is rare. Worldwide, it is the most common form of cancer in females, and is the second most fatal cancer in women, affecting, at some time in their lives, approximately one out of thirty-nine to one out of three women who reach age ninety in the Western world.


There are many different types of breast cancer, including ductal carcinoma, lobular carcinoma, inflammatory breast cancer, medullary carcinoma, colloid carcinoma, papillary carcinoma, and metaplastic carcinoma. Ductal carcinoma is a very common type of breast cancer in women. Ductal carcinoma refers to the development of cancer cells within the milk ducts of the breast. It comes in two forms: infiltrating ductal carcinoma (IDC), an invasive cell type; and ductal carcinoma in situ (DCIS), a noninvasive cancer. DCIS is the most common type of noninvasive breast cancer in women. IDC, formed in the ducts of breast in the earliest stage, is the most common, most heterogeneous invasive breast cancer cell type. It accounts for 80% of all types of breast cancer.


Early breast cancer can in some cases be painful. A lump under the arm or above the collarbone that does not go away may be present. Other possible symptoms include breast discharge, nipple inversion and changes in the skin overlying the breast. Breast cancer is often discovered before any symptoms are even present. Due to the high incidence of breast cancer among older women, screening is highly recommended and often routine in physical examinations of women, with mammograms for women over the age of 50. Current screening methods include breast self-examination, mammography, ultrasound, and MRI.


Mammography is the modality of choice for screening of early breast cancer, and breast cancers detected by mammography are usually smaller than those detected clinically. While mammography has been shown to reduce breast cancer-related mortality by 20-30%, the test is not very accurate. Only a small fraction (5-10%) of abnormalities on mammograms turn out to be breast cancer. However, each suspicious mammogram requires a follow-up medical visit which typically includes a second mammogram, and other follow-up test procedures including sonograms, needle biopsies, or surgical biopsies. Most women who undergo these procedures find out that no breast cancer is present. Additionally, the number of unnecessary medical procedures involved in following up on false positive mammography results creates an unnecessary economic burden.


Additionally, mammograms can give false negative results. A false negative result occurs when cancer is present and not diagnosed. Breast density and the experience, skill, and training of the doctor reading a mammogram are contributing factors which can lead to false negative results. Unless a patient were to receive a second opinion, a false negative mammography eventually results in advanced stage breast cancer which may be untreatable and/or fatal by the time it is detected. Thus, there is a need for tests which can aid in the diagnosis of breast cancer.


Furthermore, there is currently no test capable of reliably identifying patients who are likely to respond to specific therapies, especially for cancer that has spread beyond the breast tissue. Information on any condition of a particular patient and a patient's response to types and dosages of therapeutic or nutritional agents has become an important issue in clinical medicine today not only from the aspect of efficiency of medical practice for the health care industry but for improved outcomes and benefits for the patients. Thus, there is also the need for tests which can aid in monitoring the progression and treatment of breast cancer.


SUMMARY OF THE INVENTION

The invention is based in part upon the identification of gene expression profiles (Precision Profiles™) associated with breast cancer. These genes are referred to herein as breast cancer associated genes or breast cancer associated constituents. More specifically, the invention is based upon the surprising discovery that detection of as few as one breast cancer associated gene in a subject-derived sample is capable of identifying individuals with or without breast cancer with at least 75% accuracy. More particularly, the invention is based upon the surprising discovery that the methods provided by the invention are capable of detecting breast cancer by assaying blood samples.


In various aspects the invention provides methods of evaluating the presence or absence (e.g., diagnosing or prognosing) of breast cancer, based on a sample from the subject, the sample providing a source of RNAs, and determining a quantitative measure of the amount of at least one constituent of any constituent (e.g., breast cancer associated gene) of any of Tables 1, 2, 3, 4, and 5 and arriving at a measure of each constituent.


Also provided are methods of assessing or monitoring the response to therapy in a subject having breast cancer, based on a sample from the subject, the sample providing a source of RNAs, determining a quantitative measure of the amount of at least one constituent of any constituent of Tables 1, 2, 3, 4, 5 or 6 and arriving at a measure of each constituent. The therapy, for example, is immunotherapy. Preferably, one or more of the constituents listed in Table 6 is measured. For example, the response of a subject to immunotherapy is monitored by measuring the expression of TNFRSF10A, TMPRSS2, SPARC, ALOX5, PTPRC, PDGFA, PDGFB, BCL2, BAD, BAK1, BAG2, KIT, MUC1, ADAM17, CD19, CD4, CD40LG, CD86, CCR5, CTLA4, HSPA1A, IFNG, IL23A, PTGS2, TLR2, TGFB1, TNF, TNFRSF13B, TNFRSF10B, VEGF, MYC, AURKA, BAX, CDH1, CASP2, CD22, IGF1R, ITGA5, ITGAV, ITGB1, ITGB3, IL6R, JAK1, JAK2, JAK3, MAP3K1, PDGFRA, COX2, PSCA, THBS1, THBS2, TYMS, TLR1, TLR3, TLR6, TLR7, TLR9, TNFSF10, TNFSF13B, TNFRSF17, TP53, ABL1, ABL2, AKT1, KRAS, BRAF, RAF1, ERBB4, ERBB2, ERBB3, AKT2, EGFR, IL12 or IL15. The subject has received an immunotherapeutic drug such as anti CD19 Mab, rituximab, epratuzumab, lumiliximab, visilizumab (Nuvion), HuMax-CD38, zanolimumab, anti CD40 Mab, anti-CD40L Mab, galiximab, anti-CTLA-4 MAb, ipilimumab, ticilimumab, anti-SDF-1 MAb, panitumumab, nimotuzumab, pertuzumab, trastuzumab, catumaxomab, ertumaxomab, MDX-070, anti ICOS, anti IFNAR, AMG-479, anti-IGF-1R Ab, R1507, IMC-A12, antiangiogenesis MAb, CNTO-95, natalizumab (Tysabri), SM3, IPB-01, hPAM-4, PAM4, Imuteran, huBrE-3 tiuxetan, BrevaRex MAb, PDGFR MAb, IMC-3G3, GC-1008, CNTO-148 (Golimumab), CS-1008, belimumab, anti-BAFF MAb, or bevacizumab. Alternatively, the subject has received a placebo.


In a further aspect the invention provides methods of monitoring the progression of breast cancer in a subject, based on a sample from the subject, the sample providing a source of RNAs, by determining a quantitative measure of the amount of at least one constituent of any constituent of Tables 1, 2, 3, 4, and 5 as a distinct RNA constituent in a sample obtained at a first period of time to produce a first subject data set and determining a quantitative measure of the amount of at least one constituent of any constituent of Tables 1, 2, 3, 4, and 5 as a distinct RNA constituent in a sample obtained at a second period of time to produce a second subject data set. Optionally, the constituents measured in the first sample are the same constituents measured in the second sample. The first subject data set and the second subject data set are compared, allowing the progression of breast cancer in a subject to be determined. The second subject sample is taken, e.g., one day, one week, one month, two months, three months, 1 year, 2 years, or more after the first subject sample. Optionally the first subject sample is taken prior to the subject receiving treatment, e.g. chemotherapy, radiation therapy, or surgery, and the second subject sample is taken after treatment.


In various aspects the invention provides a method for determining a profile data set, i.e., a breast cancer profile, for characterizing a subject with breast cancer or conditions related to breast cancer based on a sample from the subject, the sample providing a source of RNAs, by using amplification for measuring the amount of RNA in a panel of constituents including at least 1 constituent from any of Tables 1-5, and arriving at a measure of each constituent. The profile data set contains the measure of each constituent of the panel.


The methods of the invention further include comparing the quantitative measure of the constituent in the subject-derived sample to a reference value or a baseline value, e.g. baseline data set. The reference value is for example an index value. Comparison of the subject measurements to a reference value allows for the presence or absence of breast cancer to be determined, response to therapy to be monitored or the progression of breast cancer to be determined. For example, a similarity of the subject data set to a baseline data set derived from a subject having breast cancer indicates the presence of breast cancer or a response to therapy that is not efficacious, whereas a similarity of the subject data set to a baseline data set derived from a subject not having breast cancer indicates the absence of breast cancer or a response to therapy that is efficacious. In various embodiments, the baseline data set is derived from one or more other samples from the same subject, taken when the subject is in a biological condition different from that in which the subject was at the time the first sample was taken, with respect to at least one of age, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure, and the baseline profile data set may be derived from one or more other samples from one or more different subjects.


The baseline data set or reference values may be derived from one or more other samples from the same subject taken under circumstances different from those of the first sample, and the circumstances may be selected from the group consisting of (i) the time at which the first sample is taken (e.g., before, after, or during cancer treatment), (ii) the site from which the first sample is taken, and (iii) the biological condition of the subject when the first sample is taken.


The measure of the constituent is increased or decreased in the subject compared to the expression of the constituent in the reference, e.g., normal reference sample or baseline value. The measure is increased or decreased 10%, 25%, 50% compared to the reference level. Alternately, the measure is increased or decreased 1, 2, 5 or more fold compared to the reference level.
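The comparison to a reference level described above can be illustrated with a minimal sketch. It assumes that the measure and the reference are linear-scale expression levels (not raw cycle-threshold values); the function names are hypothetical and are not part of the disclosed method.

```python
def percent_change(measure: float, reference: float) -> float:
    """Signed percent change of a measure relative to a reference level."""
    if reference == 0:
        raise ValueError("reference level must be non-zero")
    return (measure - reference) / reference * 100.0


def fold_change(measure: float, reference: float) -> float:
    """Ratio of the measure to the reference level (both must be positive)."""
    if measure <= 0 or reference <= 0:
        raise ValueError("levels must be positive")
    return measure / reference


def changed_by_fold(measure: float, reference: float, fold: float) -> bool:
    """True if the measure is increased or decreased at least `fold`-fold."""
    ratio = fold_change(measure, reference)
    return ratio >= fold or ratio <= 1.0 / fold
```

Under these assumptions, a measure of 150 against a reference of 100 is a 50% increase, and a measure of 10 against a reference of 100 is a decrease of more than 5-fold.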


In various aspects of the invention the methods are carried out wherein the measurement conditions are substantially repeatable, particularly within a degree of repeatability of better than ten percent, five percent or more particularly within a degree of repeatability of better than three percent, and/or wherein efficiencies of amplification for all constituents are substantially similar, more particularly wherein the efficiency of amplification is within ten percent, more particularly wherein the efficiency of amplification for all constituents is within five percent, and still more particularly wherein the efficiency of amplification for all constituents is within three percent or less.
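One way to check such conditions is sketched below. The specification does not state how repeatability or similarity of efficiency is computed, so the sketch makes two assumptions: repeatability is expressed as the coefficient of variation of replicate measurements, and efficiencies are "within" a tolerance when each lies within that percentage of their mean. All names are hypothetical.

```python
import statistics


def coefficient_of_variation(replicates):
    """Sample standard deviation as a percentage of the mean."""
    values = list(replicates)
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def is_repeatable(replicates, max_cv_percent=10.0):
    """True if replicate measurements vary by less than max_cv_percent."""
    return coefficient_of_variation(replicates) < max_cv_percent


def efficiencies_similar(efficiencies, tolerance_percent=10.0):
    """True if every constituent's amplification efficiency lies within
    tolerance_percent of the mean efficiency (an assumed criterion)."""
    values = list(efficiencies.values())
    mean = statistics.mean(values)
    return all(abs(v - mean) / mean * 100.0 <= tolerance_percent for v in values)
```

The thresholds of ten, five and three percent recited above would be passed as `max_cv_percent` or `tolerance_percent`.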


In addition, the one or more different subjects may have in common with the subject at least one of age group, gender, ethnicity, geographic location, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure. A clinical indicator may be used to assess breast cancer or a condition related to breast cancer of the one or more different subjects, and may also include interpreting the calibrated profile data set in the context of at least one other clinical indicator, wherein the at least one other clinical indicator includes blood chemistry, X-ray or other radiological or metabolic imaging technique, molecular markers in the blood, other chemical assays, and physical findings.


At least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50 or more constituents are measured. Preferably, EGR1, IL18BP or SOCS1 is measured. In one aspect, two constituents from Table 1 are measured. The first constituent is ABCB1, ATM, BAX, BCL2, BRCA1, BRCA2, CASP8, CCND1, CDH1, CDK4, CDKN1B, CRABP2, CTNNB1, CTSD, EGR1, HPGD, ITGA6, MTA1, TGFB1, or TP53 and the second constituent is any other constituent from Table 1.


In another aspect two constituents from Table 2 are measured. The first constituent is ADAM17, C1QA, CCR3, CCR5, CD19, CD86, CXCL1, DPP4, EGR1, HSPA1A, IL10, IL18BP, IL1R1, IL5, IRF1, or TLR2 and the second constituent is any other constituent from Table 2.


In a further aspect two constituents from Table 3 are measured. The first constituent is ABL1, ABL2, AKT1, ATM, BAD, BAX, BCL2, BRAF, CASP8, CCNE1, CDK2, CDK5, CDKN1A, CDKN2A, EGR1, ERBB2, FOS, GZMA, NOTCH2, NRAS, PLAUR, SKIL, SMAD4, or TGFB1, and the second constituent is any other constituent from Table 3.


In yet another aspect two constituents from Table 4 are measured. The first constituent is CDKN2D, CREBBP, EGR1, EP300, MAPK1, NR4A2, S100A6, or TGFB1 and the second constituent is TGFB1 or TOPBP1.


In a further aspect two constituents from Table 5 are measured. The first constituent is ACPP, ADAM17, ANLN, APC, AXIN2, BAX, BCAM, C1QA, C1QB, CASP3, CASP9, CCL3, CCL5, CD97, CDH1, CEACAM1, CNKSR2, CTNNA1, DLC1, EGR1, ELA2, ESR1, G6PD, GNB1, GSK3B, HMOX1, HSPA1A, IKBKE, ING2, IRF1, MAPK14, MME, MNDA, MSH6, NCOA1, NUDT4, PLEK2, PTEN, SERPINA1, SP1, SRF, TEGT, TGFB1, TLR2, or TNF and the second constituent is any other constituent from Table 5.


Optionally, three constituents are measured from Table 1. The first constituent is ABCB1, ATBF1, ATM, BAX, BCL2, BRCA1, BRCA2, C3, CASP8, CASP9, CCND1, CCNE1, CDK4, CDKN1A, CDKN1B, CRABP2, CTNNB1, CTSB, CTSD, DLC1, EGR1, EIF4E, ERBB2, FOS, GADD45A, GNB2L1, HPGD, ICAM1, IFITM3, ILF2, ING1, ITGA6, ITGB3, MCM7, MDM2, MGMT, MTA1, MUC1, MYC, MYCBP, NFKB1, PI3, PTGS2, RB1, RP51077B9.4, RPS3, TGFB1, or TNF, and the second constituent is BAX, C3, CASP9, CCND1, CDK4, CDKN1B, CRABP2, CTSB, CTSD, DLC1, EGR1, EIF4E, ERBB2, FOS, GADD45A, GNB2L1, HPGD, ICAM1, IFITM3, IGF2, IL8, ILF2, ING1, ITGA6, LAMB2, MCM7, MDM2, MGMT, MMP9, MTA1, MUC1, MYBL2, MYC, MYCBP, NCOA1, NFKB1, NME1, PCNA, PI3, PITRM1, PSMB5, PSMD1, PTGS2, RB1, RP51077B9.4, RPL13A, RPS3, SLPI, TGFB1, TGFBR1, THBS1, TIMP1, TNF, TP53, USP10, or VEZF1. The third constituent is any other constituent selected from Table 1.


The constituents are selected so as to distinguish between a normal reference subject and a breast cancer-diagnosed subject. The breast cancer-diagnosed subject is diagnosed with different stages of cancer, estrogen-positive breast cancer, or estrogen-negative breast cancer. Alternatively, the panel of constituents is selected so as to permit characterizing the severity of breast cancer in relation to a normal subject over time so as to track movement toward normal as a result of successful therapy and away from normal in response to cancer recurrence. Thus in some embodiments, the methods of the invention are used to determine efficacy of treatment of a particular subject.


Preferably, the constituents are selected so as to distinguish, e.g., classify between a normal and a breast cancer-diagnosed subject with at least 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or greater accuracy. By “accuracy” is meant that the method has the ability to distinguish, e.g., classify, between subjects having breast cancer or conditions associated with breast cancer, and those that do not. Accuracy is determined for example by comparing the results of the Gene Precision Profiling™ to standard accepted clinical methods of diagnosing breast cancer, e.g., mammography, sonograms, and biopsy procedures. For example, the combination of constituents is selected according to any of the models enumerated in Tables 1A, 2A, 3A, 4A, or 5A.


In some embodiments, the methods of the present invention are used in conjunction with standard accepted clinical methods to diagnose breast cancer, e.g. mammography, sonograms, and biopsy procedures.


By breast cancer or conditions related to breast cancer is meant a cancer of the breast tissue which can occur in both women and men. Types of breast cancer include ductal carcinoma (infiltrating ductal carcinoma (IDC) and ductal carcinoma in situ (DCIS)), lobular carcinoma, inflammatory breast cancer, medullary carcinoma, colloid carcinoma, papillary carcinoma, metaplastic carcinoma, Stage 1-Stage 4 breast cancer, estrogen-positive breast cancer, and estrogen-negative breast cancer.


The sample is any sample derived from a subject which contains RNA. For example, the sample is blood, a blood fraction, body fluid, a population of cells or tissue from the subject, a breast cell, or a rare circulating tumor cell or circulating endothelial cell found in the blood.


Optionally one or more other samples can be taken over an interval of time that is at least one month between the first sample and the one or more other samples, or taken over an interval of time that is at least twelve months between the first sample and the one or more samples, or they may be taken pre-therapy intervention or post-therapy intervention. In such embodiments, the first sample may be derived from blood and the baseline profile data set may be derived from tissue or body fluid of the subject other than blood. Alternatively, the first sample is derived from tissue or bodily fluid of the subject and the baseline profile data set is derived from blood.


Also included in the invention are kits for the detection of breast cancer in a subject, containing at least one reagent for the detection or quantification of any constituent measured according to the methods of the invention and instructions for using the kit.


Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.


Other features and advantages of the invention will be apparent from the following detailed description and claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a graphical representation of a 2-gene model for cancer based on disease-specific genes, capable of distinguishing between subjects afflicted with cancer and normal subjects with a discrimination line overlaid onto the graph as an example of the Index Function evaluated at a particular logit value. Values above and to the left of the line represent subjects predicted to be in the normal population. Values below and to the right of the line represent subjects predicted to be in the cancer population. ALOX5 values are plotted along the Y-axis, S100A6 values are plotted along the X-axis.



FIG. 2 is a graphical representation of a 3-gene model, CTSD, EGR1, and NCOA1, based on the Precision Profile™ for Breast Cancer (Table 1), capable of distinguishing between subjects afflicted with breast cancer and normal subjects, with a discrimination line overlaid onto the graph as an example of the Index Function evaluated at a particular logit value. Values above and to the left of the line represent subjects predicted to be in the normal population. Values below and to the right of the line represent subjects predicted to be in the breast cancer population. CTSD and EGR1 values are plotted along the Y-axis. NCOA1 values are plotted along the X-axis.



FIG. 3 is a graphical representation of the Z-statistic values for each gene shown in Table 1B. A negative Z statistic means up-regulation of gene expression in breast cancer vs. normal patients; a positive Z statistic means down-regulation of gene expression in breast cancer vs. normal patients.



FIG. 4 is a graphical representation of a breast cancer index based on the 3-gene logistic regression model, CTSD, EGR1, and NCOA1, capable of distinguishing between normal, healthy subjects and subjects suffering from breast cancer.



FIG. 5 is a graphical representation of a 2-gene model, CCR5 and EGR1, based on the Precision Profile™ for Inflammatory Response (Table 2), capable of distinguishing between subjects afflicted with breast cancer and normal subjects, with a discrimination line overlaid onto the graph as an example of the Index Function evaluated at a particular logit value. Values to the right of the line represent subjects predicted to be in the normal population. Values to the left of the line represent subjects predicted to be in the breast cancer population. CCR5 values are plotted along the Y-axis, EGR1 values are plotted along the X-axis.



FIG. 6 is a graphical representation of a 2-gene model, EGR1 and NME1, based on the Human Cancer General Precision Profile™ (Table 3), capable of distinguishing between subjects afflicted with breast cancer and normal subjects, with a discrimination line overlaid onto the graph as an example of the Index Function evaluated at a particular logit value. Values above the line represent subjects predicted to be in the normal population. Values below the line represent subjects predicted to be in the breast cancer population. EGR1 values are plotted along the Y-axis, NME1 values are plotted along the X-axis.



FIG. 7 is a graphical representation of a 2-gene model, EGR1 and PLEK2, based on the Cross-Cancer Precision Profile™ (Table 5), capable of distinguishing between subjects afflicted with breast cancer and normal subjects, with a discrimination line overlaid onto the graph as an example of the Index Function evaluated at a particular logit value. Values above the line represent subjects predicted to be in the normal population. Values below the line represent subjects predicted to be in the breast cancer population. EGR1 values are plotted along the Y-axis, PLEK2 values are plotted along the X-axis.





DETAILED DESCRIPTION
DEFINITIONS

The following terms shall have the meanings indicated unless the context otherwise requires:


“Accuracy” refers to the degree of conformity of a measured or calculated quantity (a test reported value) to its actual (or true) value. Clinical accuracy relates to the proportion of true outcomes (true positives (TP) or true negatives (TN)) versus misclassified outcomes (false positives (FP) or false negatives (FN)), and may be stated as a sensitivity, specificity, positive predictive values (PPV) or negative predictive values (NPV), or as a likelihood, odds ratio, among other measures.
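The clinical accuracy measures named in this definition follow directly from the four outcome counts. The sketch below uses the standard formulas; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeCounts:
    tp: int  # true positives: disease subjects classified as disease
    tn: int  # true negatives: normal subjects classified as normal
    fp: int  # false positives: normal subjects classified as disease
    fn: int  # false negatives: disease subjects classified as normal


def sensitivity(c: OutcomeCounts) -> float:
    """Fraction of disease subjects correctly classified."""
    return c.tp / (c.tp + c.fn)


def specificity(c: OutcomeCounts) -> float:
    """Fraction of normal subjects correctly classified."""
    return c.tn / (c.tn + c.fp)


def ppv(c: OutcomeCounts) -> float:
    """Positive predictive value: fraction of positive calls that are correct."""
    return c.tp / (c.tp + c.fp)


def npv(c: OutcomeCounts) -> float:
    """Negative predictive value: fraction of negative calls that are correct."""
    return c.tn / (c.tn + c.fn)


def overall_accuracy(c: OutcomeCounts) -> float:
    """Proportion of true outcomes among all classified subjects."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
```

For hypothetical counts of 40 TP, 45 TN, 5 FP and 10 FN, the sensitivity is 0.8, the specificity is 0.9 and the overall accuracy is 0.85.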


“Algorithm” is a set of rules for describing a biological condition. The rule set may be defined exclusively algebraically but may also include alternative or multiple decision points requiring domain-specific knowledge, expert interpretation or other clinical indicators.


An “agent” is a “composition” or a “stimulus”, as those terms are defined herein, or a combination of a composition and a stimulus.


“Amplification” in the context of a quantitative RT-PCR assay is a function of the number of DNA replications that are required to provide a quantitative determination of its concentration. “Amplification” here refers to a degree of sensitivity and specificity of a quantitative assay technique. Accordingly, amplification provides a measurement of concentrations of constituents that is evaluated under conditions wherein the efficiency of amplification and therefore the degree of sensitivity and reproducibility for measuring all constituents is substantially similar.


A “baseline profile data set” is a set of values associated with constituents of a Gene Expression Panel (Precision Profile™) resulting from evaluation of a biological sample (or population or set of samples) under a desired biological condition that is used for mathematically normative purposes. The desired biological condition may be, for example, the condition of a subject (or population or set of subjects) before exposure to an agent or in the presence of an untreated disease or in the absence of a disease. Alternatively, or in addition, the desired biological condition may be health of a subject or a population or set of subjects. Alternatively, or in addition, the desired biological condition may be that associated with a population or set of subjects selected on the basis of at least one of age group, gender, ethnicity, geographic location, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure.


“Breast Cancer” is a cancer of the breast tissue which can occur in both women and men. Types of breast cancer include ductal carcinoma (infiltrating ductal carcinoma (IDC) and ductal carcinoma in situ (DCIS)), lobular carcinoma, inflammatory breast cancer, medullary carcinoma, colloid carcinoma, papillary carcinoma, and metaplastic carcinoma. As defined herein the term “breast cancer” also includes stage 1, stage 2, stage 3, and stage 4 breast cancer, estrogen-positive breast cancer, estrogen-negative breast cancer, Her2+ breast cancer, and Her2− breast cancer.


A “biological condition” of a subject is the condition of the subject in a pertinent realm that is under observation, and such realm may include any aspect of the subject capable of being monitored for change in condition, such as health; disease including cancer; trauma; aging; infection; tissue degeneration; developmental steps; physical fitness; obesity, and mood. As can be seen, a condition in this context may be chronic or acute or simply transient. Moreover, a targeted biological condition may be manifest throughout the organism or population of cells or may be restricted to a specific organ (such as skin, heart, eye or blood), but in either case, the condition may be monitored directly by a sample of the affected population of cells or indirectly by a sample derived elsewhere from the subject. The term “biological condition” includes a “physiological condition”.


“Body fluid” of a subject includes blood, urine, spinal fluid, lymph, mucosal secretions, prostatic fluid, semen, haemolymph or any other body fluid known in the art for a subject.


“Calibrated profile data set” is a function of a member of a first profile data set and a corresponding member of a baseline profile data set for a given constituent in a panel.
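As a minimal sketch of this definition, the function below combines each member of a profile data set with the corresponding member of a baseline profile data set. The definition leaves the combining function open; a per-constituent difference is assumed here as the default purely for illustration, and the numeric values in the usage note are hypothetical.

```python
def calibrate(profile, baseline, combine=lambda subject, base: subject - base):
    """Return a calibrated profile data set keyed by constituent name.

    `profile` and `baseline` map constituent names to measured values;
    `combine` is the assumed per-constituent function (difference by default).
    """
    missing = sorted(set(profile) - set(baseline))
    if missing:
        raise KeyError(f"no baseline value for constituents: {missing}")
    return {name: combine(value, baseline[name]) for name, value in profile.items()}
```

For example, `calibrate({"EGR1": 19.5, "CTSD": 16.0}, {"EGR1": 18.0, "CTSD": 16.5})` yields `{"EGR1": 1.5, "CTSD": -0.5}`; a ratio could be used instead by passing `combine=lambda s, b: s / b`.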


A “circulating endothelial cell” (“CEC”) is an endothelial cell from the inner wall of blood vessels which sheds into the bloodstream under certain circumstances, including inflammation, and contributes to the formation of new vasculature associated with cancer pathogenesis. CECs may be useful as a marker of tumor progression and/or response to antiangiogenic therapy.


A “circulating tumor cell” (“CTC”) is a tumor cell of epithelial origin which is shed from the primary tumor upon metastasis, and enters the circulation. The number of circulating tumor cells in peripheral blood is associated with prognosis in patients with metastatic cancer. These cells can be separated and quantified using immunologic methods that detect epithelial cells.


A “clinical indicator” is any physiological datum used alone or in conjunction with other data in evaluating the physiological condition of a collection of cells or of an organism. This term includes pre-clinical indicators.


“Clinical parameters” encompasses all non-sample or non-Precision Profiles™ of a subject's health status or other characteristics, such as, without limitation, age (AGE), ethnicity (RACE), gender (SEX), and family history of cancer.


A “composition” includes a chemical compound, a nutraceutical, a pharmaceutical, a homeopathic formulation, an allopathic formulation, a naturopathic formulation, a combination of compounds, a toxin, a food, a food supplement, a mineral, and a complex mixture of substances, in any physical state or in a combination of physical states.


To “derive” a profile data set from a sample includes determining a set of values associated with constituents of a Gene Expression Panel (Precision Profile™) either (i) by direct measurement of such constituents in a biological sample.


“Distinct RNA or protein constituent” in a panel of constituents is a distinct expressed product of a gene, whether RNA or protein. An “expression” product of a gene includes the gene product whether RNA or protein resulting from translation of the messenger RNA.


“FN” is false negative, which for a disease state test means classifying a disease subject incorrectly as non-disease or normal.


“FP” is false positive, which for a disease state test means classifying a normal subject incorrectly as having disease.


A “formula,” “algorithm,” or “model” is any mathematical equation, algorithmic, analytical or programmed process, statistical technique, or comparison, that takes one or more continuous or categorical inputs (herein called “parameters”) and calculates an output value, sometimes referred to as an “index” or “index value.” Non-limiting examples of “formulas” include comparisons to reference values or profiles, sums, ratios, and regression operators, such as coefficients or exponents, value transformations and normalizations (including, without limitation, those normalization schemes based on clinical parameters, such as gender, age, or ethnicity), rules and guidelines, statistical classification models, and neural networks trained on historical populations. Of particular use in combining constituents of a Gene Expression Panel (Precision Profile™) are linear and non-linear equations and statistical significance and classification analyses to determine the relationship between levels of constituents of a Gene Expression Panel (Precision Profile™) detected in a subject sample and the subject's risk of breast cancer. 
In panel and combination construction, of particular interest are structural and syntactic statistical classification algorithms, and methods of risk index construction, utilizing pattern recognition features, including, without limitation, such established techniques as cross-correlation, Principal Components Analysis (PCA), factor rotation, Logistic Regression Analysis (LogReg), Kolmogorov-Smirnov tests (KS), Linear Discriminant Analysis (LDA), Eigengene Linear Discriminant Analysis (ELDA), Support Vector Machines (SVM), Random Forest (RF), Recursive Partitioning Tree (RPART), as well as other related decision tree classification techniques (CART, LART, LARTree, FlexTree, amongst others), Shrunken Centroids (SC), StepAIC, K-means, Kth-Nearest Neighbor, Boosting, Decision Trees, Neural Networks, Bayesian Networks, and Hidden Markov Models, among others. Other techniques may be used in survival and time to event hazard analysis, including Cox, Weibull, Kaplan-Meier and Greenwood models well known to those of skill in the art. Many of these techniques are useful either combined with a technique for selecting constituents of a Gene Expression Panel (Precision Profile™), such as forward selection, backwards selection, or stepwise selection, complete enumeration of all potential panels of a given size, genetic algorithms, voting and committee methods, or they may themselves include biomarker selection methodologies in their own technique. These may be coupled with information criteria, such as Akaike's Information Criterion (AIC) or Bayes Information Criterion (BIC), in order to quantify the tradeoff between additional biomarkers and model improvement, and to aid in minimizing overfit. The resulting predictive models may be validated in other clinical studies, or cross-validated within the study they were originally trained in, using such techniques as Bootstrap, Leave-One-Out (LOO) and 10-Fold cross-validation (10-Fold CV). 
At various steps, false discovery rates (FDR) may be estimated by value permutation according to techniques known in the art.
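
By way of illustration only, the tradeoff between additional biomarkers and model improvement that the information criteria quantify may be sketched as follows. The sketch uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L; the two candidate panels, their log-likelihoods, and the sample size are hypothetical values chosen for the example and are not taken from any study described herein.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayes Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fitted models: a 3-parameter panel and a 6-parameter panel.
N_SAMPLES = 100
small_panel = {"log_likelihood": -40.0, "n_params": 3}
large_panel = {"log_likelihood": -38.5, "n_params": 6}

for name, model in (("small", small_panel), ("large", large_panel)):
    print(
        name,
        "AIC =", aic(model["log_likelihood"], model["n_params"]),
        "BIC =", round(bic(model["log_likelihood"], model["n_params"], N_SAMPLES), 3),
    )
```

In this hypothetical case the larger panel fits slightly better, but both criteria favor the smaller panel because the gain in likelihood does not offset the penalty for the three additional parameters.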


A “Gene Expression Panel” (Precision Profile™) is an experimentally verified set of constituents, each constituent being a distinct expressed product of a gene, whether RNA or protein, wherein constituents of the set are selected so that their measurement provides a measurement of a targeted biological condition.


A “Gene Expression Profile” is a set of values associated with constituents of a Gene Expression Panel (Precision Profile™) resulting from evaluation of a biological sample (or population or set of samples).


A “Gene Expression Profile Inflammation Index” is the value of an index function that provides a mapping from an instance of a Gene Expression Profile into a single-valued measure of inflammatory condition.


A “Gene Expression Profile Cancer Index” is the value of an index function that provides a mapping from an instance of a Gene Expression Profile into a single-valued measure of a cancerous condition.


The “health” of a subject includes mental, emotional, physical, spiritual, allopathic, naturopathic and homeopathic condition of the subject.


“Index” is an arithmetically or mathematically derived numerical characteristic developed for aid in simplifying or disclosing or informing the analysis of more complex quantitative information. A disease or population index may be determined by the application of a specific algorithm to a plurality of subjects or samples with a common biological condition.
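
As a minimal sketch of how an index function may map a Gene Expression Profile into a single value, the following example assumes a simple linear combination of constituent values. The linear form, the constituent names GENE_A and GENE_B, the weights, and the intercept are all hypothetical and are used only to illustrate the mapping; other functional forms may equally serve as an index function.

```python
def index_value(profile: dict, weights: dict, intercept: float = 0.0) -> float:
    """Map a profile (constituent -> measured value) to a single index value.

    Assumes a linear index: intercept + sum(weight * value). Every weighted
    constituent must be present in the profile.
    """
    missing = set(weights) - set(profile)
    if missing:
        raise KeyError(f"profile lacks constituents: {sorted(missing)}")
    return intercept + sum(weights[name] * profile[name] for name in weights)


# Hypothetical two-constituent profile and weights.
profile = {"GENE_A": 2.0, "GENE_B": 4.0}
weights = {"GENE_A": 0.5, "GENE_B": -0.25}
print(index_value(profile, weights, intercept=1.0))
```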


“Inflammation” is used herein in the general medical sense of the word and may be an acute or chronic; simple or suppurative; localized or disseminated; cellular and tissue response initiated or sustained by any number of chemical, physical or biological agents or combination of agents.


“Inflammatory state” is used to indicate the relative biological condition of a subject resulting from inflammation, or characterizing the degree of inflammation.


A “large number” of data sets based on a common panel of genes is a number of data sets sufficiently large to permit a statistically significant conclusion to be drawn with respect to an instance of a data set based on the same panel.


“Negative predictive value” or “NPV” is calculated by TN/(TN+FN) or the true negative fraction of all negative test results. It also is inherently impacted by the prevalence of the disease and pre-test probability of the population intended to be tested.


See, e.g., O'Marcaigh A S, Jacobson R M, “Estimating the Predictive Value of a Diagnostic Test, How to Prevent Misleading or Confusing Results,” Clin. Ped. 1993, 32(8): 485-491, which discusses specificity, sensitivity, and positive and negative predictive values of a test, e.g., a clinical diagnostic test. Often, for binary disease state classification approaches using a continuous diagnostic test measurement, the sensitivity and specificity are summarized by Receiver Operating Characteristics (ROC) curves according to Pepe et al., “Limitations of the Odds Ratio in Gauging the Performance of a Diagnostic, Prognostic, or Screening Marker,” Am. J. Epidemiol 2004, 159 (9): 882-890, and summarized by the Area Under the Curve (AUC) or c-statistic, an indicator that allows representation of the sensitivity and specificity of a test, assay, or method over the entire range of test (or assay) cut points with just a single value. See also, e.g., Shultz, “Clinical Interpretation Of Laboratory Procedures,” chapter 14 in Tietz, Fundamentals of Clinical Chemistry, Burtis and Ashwood (eds.), 4th edition 1996, W.B. Saunders Company, pages 192-199; and Zweig et al., “ROC Curve Analysis: An Example Showing the Relationships Among Serum Lipid and Apolipoprotein Concentrations in Identifying Subjects with Coronary Artery Disease,” Clin. Chem., 1992, 38(8): 1425-1428. An alternative approach using likelihood functions, BIC, odds ratios, information theory, predictive values, calibration (including goodness-of-fit), and reclassification measurements is summarized according to Cook, “Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction,” Circulation 2007, 115: 928-935.
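
For illustration, the AUC or c-statistic may be computed without tracing the ROC curve, using its equivalence to the probability that a randomly chosen disease subject scores higher than a randomly chosen non-disease subject (ties counted as one half). The following sketch assumes that higher scores indicate disease; the score values are hypothetical.

```python
def roc_auc(scores_disease, scores_normal):
    """Area under the ROC curve via pairwise comparison (c-statistic).

    Counts the fraction of (disease, normal) pairs in which the disease
    subject has the higher score; ties contribute one half.
    """
    if not scores_disease or not scores_normal:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for d in scores_disease:
        for n in scores_normal:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(scores_disease) * len(scores_normal))


# Hypothetical index values for three disease and three normal subjects.
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

An AUC of 0.5 corresponds to a test with no discriminating ability, and an AUC of 1.0 to perfect separation of the two groups.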


A “normal” subject is a subject who is generally in good health, has not been diagnosed with breast cancer, is asymptomatic for breast cancer, and lacks the traditional laboratory risk factors for breast cancer.


A “normative” condition of a subject to whom a composition is to be administered means the condition of a subject before administration, even if the subject happens to be suffering from a disease.


A “panel” of genes is a set of genes including at least two constituents.


A “population of cells” refers to any group of cells wherein there is an underlying commonality or relationship between the members in the population of cells, including a group of cells taken from an organism or from a culture of cells or from a biopsy, for example.


“Positive predictive value” or “PPV” is calculated by TP/(TP+FP) or the true positive fraction of all positive test results. It is inherently impacted by the prevalence of the disease and pre-test probability of the population intended to be tested.


“Risk” in the context of the present invention, relates to the probability that an event will occur over a specific time period, and can mean a subject's “absolute” risk or “relative” risk. Absolute risk can be measured with reference to either actual observation post-measurement for the relevant time cohort, or with reference to index values developed from statistically valid historical cohorts that have been followed for the relevant time period. Relative risk refers to the ratio of absolute risks of a subject compared either to the absolute risks of lower risk cohorts, across population divisions (such as tertiles, quartiles, quintiles, or deciles, etc.) or an average population risk, which can vary by how clinical risk factors are assessed. Odds ratios, the proportion of positive events to negative events for a given test result, are also commonly used (odds are according to the formula p/(1−p) where p is the probability of event and (1−p) is the probability of no event).
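
The relationship among absolute risk, relative risk, and odds may be sketched as follows. The odds formula p/(1−p) is as given above; the probabilities used in the example are hypothetical.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def relative_risk(risk_subject: float, risk_reference: float) -> float:
    """Ratio of a subject's absolute risk to a reference absolute risk."""
    return risk_subject / risk_reference


def odds_ratio(p_subject: float, p_reference: float) -> float:
    """Ratio of the odds of the event in two groups."""
    return odds(p_subject) / odds(p_reference)


# Hypothetical absolute risks: 20% for the subject, 10% for the reference cohort.
print(odds(0.2), relative_risk(0.2, 0.1), odds_ratio(0.2, 0.1))
```

With these values the relative risk is 2 while the odds ratio is 2.25, which illustrates that the two measures diverge as the event becomes more common.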


“Risk evaluation,” or “evaluation of risk” in the context of the present invention encompasses making a prediction of the probability, odds, or likelihood that an event or disease state may occur, and/or the rate of occurrence of the event or conversion from one disease state to another, i.e., from a normal condition to cancer or from cancer remission to cancer, or from primary cancer occurrence to occurrence of a cancer metastasis. Risk evaluation can also comprise prediction of future clinical parameters, traditional laboratory risk factor values, or other indices of cancer results, either in absolute or relative terms in reference to a previously measured population. Such differing use may require different combinations of constituents of a Gene Expression Panel (Precision Profile™) and individualized panels, mathematical algorithms, and/or cut-off points, but is subject to the same aforementioned measurements of accuracy and performance for the respective intended use.


A “sample” from a subject may include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, taken from the subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision or intervention or other means known in the art. The sample is blood, urine, spinal fluid, lymph, mucosal secretions, prostatic fluid, semen, haemolymph or any other body fluid known in the art for a subject. The sample is also a tissue sample. The sample is or contains a circulating endothelial cell or a circulating tumor cell.


“Sensitivity” is calculated by TP/(TP+FN) or the true positive fraction of disease subjects.


“Specificity” is calculated by TN/(TN+FP) or the true negative fraction of non-disease or normal subjects.
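
Sensitivity and specificity, together with the PPV and NPV defined above, may be computed from the four classification counts as sketched below. The second function applies Bayes' rule to show how the PPV depends on disease prevalence; the counts and prevalence values are hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from classification counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive fraction of disease subjects
        "specificity": tn / (tn + fp),  # true negative fraction of normal subjects
        "ppv": tp / (tp + fp),          # true positive fraction of positive results
        "npv": tn / (tn + fn),          # true negative fraction of negative results
    }


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV for a population with the given disease prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical study: 100 disease subjects and 100 normal subjects.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
# The same test applied to a population with 1% prevalence.
print(ppv_at_prevalence(metrics["sensitivity"], metrics["specificity"], 0.01))
```

The same test yields a much lower PPV in the low-prevalence population, which is the dependence on prevalence and pre-test probability noted in the definitions of PPV and NPV.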


By “statistically significant”, it is meant that the alteration is greater than what might be expected to happen by chance alone (which could be a “false positive”). Statistical significance can be determined by any method known in the art. Commonly used measures of significance include the p-value, which presents the probability of obtaining a result at least as extreme as a given data point, assuming the data point was the result of chance alone. A result is often considered highly significant at a p-value of 0.05 or less and statistically significant at a p-value of 0.10 or less. Such p-values depend significantly on the power of the study performed.
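
As one non-limiting example of a method for determining statistical significance, a p-value may be obtained by an exact permutation test, which enumerates every reassignment of the pooled measurements to the two groups and counts how often the difference in group means is at least as extreme as the one observed. The sketch below is practical only for small groups, and the measurement values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b) -> float:
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    total = 0
    extreme = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding in the means.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical expression measurements for two groups of three samples each.
print(permutation_p_value([1, 2, 3], [7, 8, 9]))
```

With only three samples per group there are twenty possible assignments, so the smallest attainable two-sided p-value is 0.1, which illustrates why such p-values depend on the power of the study performed.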


A “set” or “population” of samples or subjects refers to a defined or selected group of samples or subjects wherein there is an underlying commonality or relationship between the members included in the set or population of samples or subjects.


A “Signature Profile” is an experimentally verified subset of a Gene Expression Profile selected to discriminate a biological condition, agent or physiological mechanism of action.


A “Signature Panel” is a subset of a Gene Expression Panel (Precision Profile™), the constituents of which are selected to permit discrimination of a biological condition, agent or physiological mechanism of action.


A “subject” is a cell, tissue, or organism, human or non-human, whether in vivo, ex vivo or in vitro, under observation. As used herein, reference to evaluating the biological condition of a subject based on a sample from the subject, includes using blood or other tissue sample from a human subject to evaluate the human subject's condition; it also includes, for example, using a blood sample itself as the subject to evaluate, for example, the effect of therapy or an agent upon the sample.


A “stimulus” includes (i) a monitored physical interaction with a subject, for example ultraviolet A or B, or light therapy for seasonal affective disorder, or treatment of psoriasis with psoralen or treatment of cancer with embedded radioactive seeds, other radiation exposure, hormone therapy, chemotherapy, surgery (e.g., lumpectomy, mastectomy) and (ii) any monitored physical, mental, emotional, or spiritual activity or inactivity of a subject.


“Therapy” includes all interventions whether biological, chemical, physical, metaphysical, or combination of the foregoing, intended to sustain or alter the monitored biological condition of a subject.


“TN” is true negative, which for a disease state test means classifying a non-disease or normal subject correctly.


“TP” is true positive, which for a disease state test means correctly classifying a disease subject.


The PCT patent application publication number WO 01/25473, published Apr. 12, 2001, entitled “Systems and Methods for Characterizing a Biological Condition or Agent Using Calibrated Gene Expression Profiles,” filed for an invention by inventors herein, and which is herein incorporated by reference, discloses the use of Gene Expression Panels (Precision Profiles™) for the evaluation of (i) biological condition (including with respect to health and disease) and (ii) the effect of one or more agents on biological condition (including with respect to health, toxicity, therapeutic treatment and drug interaction).


In particular, the Gene Expression Panels (Precision Profiles™) described herein may be used, without limitation, for measurement of the following: therapeutic efficacy of natural or synthetic compositions or stimuli that may be formulated individually or in combinations or mixtures for a range of targeted biological conditions; prediction of toxicological effects and dose effectiveness of a composition or mixture of compositions for an individual or for a population or set of individuals or for a population of cells; determination of how two or more different agents administered in a single treatment might interact so as to detect any of synergistic, additive, negative, neutral or toxic activity; performing pre-clinical and clinical trials by providing new criteria for pre-selecting subjects according to informative profile data sets for revealing disease status; and conducting preliminary dosage studies for these patients prior to conducting phase 1 or 2 trials. These Gene Expression Panels (Precision Profiles™) may be employed with respect to samples derived from subjects in order to evaluate their biological condition.


The present invention provides Gene Expression Panels (Precision Profiles™) for the evaluation or characterization of breast cancer and conditions related to breast cancer in a subject. In addition, the Gene Expression Panels described herein also provide for the evaluation of the effect of one or more agents for the treatment of breast cancer and conditions related to breast cancer.


The Gene Expression Panels (Precision Profiles™) are referred to herein as the Precision Profile™ for Breast Cancer, the Precision Profile™ for Inflammatory Response, the Human Cancer General Precision Profile™, the Precision Profile™ for EGR1, and the Cross-Cancer Precision Profile™. The Precision Profile™ for Breast Cancer includes one or more genes, e.g., constituents, listed in Table 1, whose expression is associated with breast cancer or conditions related to breast cancer. The Precision Profile™ for Inflammatory Response includes one or more genes, e.g., constituents, listed in Table 2, whose expression is associated with inflammatory response and cancer. The Human Cancer General Precision Profile™ includes one or more genes, e.g., constituents, listed in Table 3, whose expression is associated generally with human cancer (including without limitation prostate, breast, ovarian, cervical, lung, colon, and skin cancer).


The Precision Profile™ for EGR1 includes one or more genes, e.g., constituents listed in Table 4, whose expression is associated with the role the early growth response (EGR) gene family plays in human cancer. The Precision Profile™ for EGR1 is composed of members of the early growth response (EGR) family of zinc finger transcriptional regulators (EGR1, 2, 3 & 4) and their binding proteins (NAB1 & NAB2), which function to repress transcription induced by some members of the EGR family of transactivators. In addition to the early growth response genes, the Precision Profile™ for EGR1 includes genes involved in the regulation of immediate early gene expression, genes that are themselves regulated by members of the immediate early gene family (and EGR1 in particular) and genes whose products interact with EGR1, serving as co-activators of transcriptional regulation.


The Cross-Cancer Precision Profile™ includes one or more genes, e.g., constituents listed in Table 5, whose expression has been shown, by latent class modeling, to play a significant role across various types of cancer, including without limitation, prostate, breast, ovarian, cervical, lung, colon, and skin cancer. Each gene of the Precision Profile™ for Breast Cancer, the Precision Profile™ for Inflammatory Response, the Human Cancer General Precision Profile™, the Precision Profile™ for EGR1, and the Cross-Cancer Precision Profile™ is referred to herein as a breast cancer associated gene or a breast cancer associated constituent. In addition to the genes listed in the Precision Profiles™ herein, cancer associated genes or cancer associated constituents include oncogenes, tumor suppression genes, tumor progression genes, angiogenesis genes, and lymphogenesis genes.


The present invention also provides a method for monitoring and determining the efficacy of immunotherapy, using the Gene Expression Panels (Precision Profiles™) described herein. Immunotherapy target genes include, without limitation, TNFRSF10A, TMPRSS2, SPARC, ALOX5, PTPRC, PDGFA, PDGFB, BCL2, BAD, BAK1, BAG2, KIT, MUC1, ADAM17, CD19, CD4, CD40LG, CD86, CCR5, CTLA4, HSPA1A, IFNG, IL23A, PTGS2, TLR2, TGFB1, TNF, TNFRSF13B, TNFRSF10B, VEGF, MYC, AURKA, BAX, CDH1, CASP2, CD22, IGF1R, ITGA5, ITGAV, ITGB1, ITGB3, IL6R, JAK1, JAK2, JAK3, MAP3K1, PDGFRA, COX2, PSCA, THBS1, THBS2, TYMS, TLR1, TLR3, TLR6, TLR7, TLR9, TNFSF10, TNFSF13B, TNFRSF17, TP53, ABL1, ABL2, AKT1, KRAS, BRAF, RAF1, ERBB4, ERBB2, ERBB3, AKT2, EGFR, IL12 and IL15. For example, the present invention provides a method for monitoring and determining the efficacy of immunotherapy by monitoring the immunotherapy associated genes, i.e., constituents, listed in Table 6.


It has been discovered that valuable and unexpected results may be achieved when the quantitative measurement of constituents is performed under repeatable conditions (within a degree of repeatability of measurement of better than twenty percent, preferably ten percent or better, more preferably five percent or better, and more preferably three percent or better). For the purposes of this description and the following claims, a degree of repeatability of measurement of better than twenty percent may be used as providing measurement conditions that are “substantially repeatable”. In particular, it is desirable that each time a measurement is obtained corresponding to the level of expression of a constituent in a particular sample, substantially the same measurement should result for substantially the same level of expression. In this manner, expression levels for a constituent in a Gene Expression Panel (Precision Profile™) may be meaningfully compared from sample to sample. Even if the expression level measurements for a particular constituent are inaccurate (for example, say, 30% too low), the criterion of repeatability means that all measurements for this constituent, if skewed, will nevertheless be skewed systematically, and therefore measurements of expression level of the constituent may be compared meaningfully. In this fashion valuable information may be obtained and compared concerning expression of the constituent under varied circumstances.
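
A minimal sketch of a repeatability check follows. It assumes, for illustration only, that the degree of repeatability is expressed as the coefficient of variation (sample standard deviation divided by mean, in percent) of replicate measurements of the same sample; the replicate values are hypothetical. The example also shows that a systematic bias, such as every measurement being 30% too low, leaves the coefficient of variation unchanged.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates) -> float:
    """Coefficient of variation of replicate measurements, in percent."""
    return 100.0 * stdev(replicates) / mean(replicates)


def is_substantially_repeatable(replicates, threshold_percent: float = 20.0) -> bool:
    """True when the replicate CV is better (lower) than the threshold."""
    return coefficient_of_variation(replicates) < threshold_percent


# Hypothetical replicate measurements of one constituent in one sample.
replicates = [100.0, 102.0, 98.0]
biased = [0.7 * value for value in replicates]  # every reading 30% too low

print(coefficient_of_variation(replicates))  # about 2 percent
print(coefficient_of_variation(biased))      # unchanged by the systematic bias
print(is_substantially_repeatable(replicates))
```

The default threshold of twenty percent corresponds to the criterion stated above; the preferred ten, five, or three percent levels may be passed as `threshold_percent`.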


In addition to the criterion of repeatability, it is desirable that a second criterion also be satisfied, namely that quantitative measurement of constituents is performed under conditions wherein efficiencies of amplification for all constituents are substantially similar as defined herein. When both of these criteria are satisfied, then measurement of the expression level of one constituent may be meaningfully compared with measurement of the expression level of another constituent in a given sample and from sample to sample.


The evaluation or characterization of breast cancer is defined to be diagnosing breast cancer, assessing the presence or absence of breast cancer, assessing the risk of developing breast cancer or assessing the prognosis of a subject with breast cancer, assessing the recurrence of breast cancer or assessing the presence or absence of a metastasis. Similarly, the evaluation or characterization of an agent for treatment of breast cancer includes identifying agents suitable for the treatment of breast cancer. The agents can be compounds known to treat breast cancer or compounds that have not been shown to treat breast cancer.


The agent to be evaluated or characterized for the treatment of breast cancer may be an alkylating agent (e.g., Cisplatin, Carboplatin, Oxaliplatin, BBR3464, Chlorambucil, Chlormethine, Cyclophosphamides, Ifosfamide, Melphalan, Carmustine, Fotemustine, Lomustine, Streptozocin, Busulfan, Dacarbazine, Mechlorethamine, Procarbazine, Temozolomide, ThioTEPA, and Uramustine); an anti-metabolite (e.g., purine (azathioprine, mercaptopurine), pyrimidine (Capecitabine, Cytarabine, Fluorouracil, Gemcitabine), and folic acid (Methotrexate, Pemetrexed, Raltitrexed)); a vinca alkaloid (e.g., Vincristine, Vinblastine, Vinorelbine, Vindesine); a taxane (e.g., paclitaxel, docetaxel, BMS-247550); an anthracycline (e.g., Daunorubicin, Doxorubicin, Epirubicin, Idarubicin, Mitoxantrone, Valrubicin, Bleomycin, Hydroxyurea, and Mitomycin); a topoisomerase inhibitor (e.g., Topotecan, Irinotecan, Etoposide, and Teniposide); a monoclonal antibody (e.g., Alemtuzumab, Bevacizumab, Cetuximab, Gemtuzumab, Panitumumab, Rituximab, and Trastuzumab); a photosensitizer (e.g., Aminolevulinic acid, Methyl aminolevulinate, Porfimer sodium, and Verteporfin); a tyrosine kinase inhibitor (e.g., Gleevec™); an epidermal growth factor receptor inhibitor (e.g., Iressa™, erlotinib (Tarceva™), gefitinib); an FPTase inhibitor (e.g., FTIs (R115777, SCH66336, L-778,123)); a KDR inhibitor (e.g., SU6668, PTK787); a proteasome inhibitor (e.g., PS341); a TS/DNA synthesis inhibitor (e.g., ZD9331, Raltitrexed (ZD1694, Tomudex), 5-FU); an S-adenosyl-methionine decarboxylase inhibitor (e.g., SAM468A); a DNA methylating agent (e.g., TMZ); a DNA binding agent (e.g., PZA); an agent which binds and inactivates O6-alkylguanine AGT (e.g., BG); a c-raf-1 antisense oligo-deoxynucleotide (e.g., ISIS-5132 (CGP-69846A)); tumor immunotherapy (see Table 6); a steroidal and/or non-steroidal anti-inflammatory agent (e.g., corticosteroids, COX-2 inhibitors); or other agents such as Alitretinoin, Altretamine, Amsacrine, Anagrelide, Arsenic trioxide, Asparaginase, Bexarotene, Bortezomib, Celecoxib, Dasatinib, Denileukin Diftitox, Estramustine, Hydroxycarbamide, Imatinib, Pentostatin, Masoprocol, Mitotane, Pegaspargase, and Tretinoin.


Breast cancer and conditions related to breast cancer are evaluated by determining the level of expression (e.g., a quantitative measure) of an effective number (e.g., one or more) of constituents of a Gene Expression Panel (Precision Profile™) disclosed herein (i.e., Tables 1-5). By an effective number is meant the number of constituents that need to be measured in order to discriminate between a normal subject and a subject having breast cancer. Preferably the constituents are selected so as to discriminate between a normal subject and a subject having breast cancer with at least 75% accuracy, more preferably 80%, 85%, 90%, 95%, 97%, 98%, 99% or greater accuracy.


The level of expression is determined by any means known in the art, such as for example quantitative PCR. The measurement is obtained under conditions that are substantially repeatable. Optionally, the quantitative measure of the constituent is compared to a reference or baseline level or value (e.g., a baseline profile set). In one embodiment, the reference or baseline level is a level of expression of one or more constituents in one or more subjects known not to be suffering from breast cancer (e.g., normal, healthy individual(s)). Alternatively, the reference or baseline level is derived from the level of expression of one or more constituents in one or more subjects known to be suffering from breast cancer. Optionally, the baseline level is derived from the same subject from which the first measure is derived. For example, the baseline is taken from a subject prior to receiving treatment or surgery for breast cancer, or at different time periods during a course of treatment. Such methods allow for the evaluation of a particular treatment for a selected individual. Comparison can be performed on test (e.g., patient) and reference samples (e.g., baseline) measured concurrently or at temporally distinct times. An example of the latter is the use of compiled expression information, e.g., a gene expression database, which assembles information about expression levels of cancer associated genes.


A reference or baseline level or value as used herein can be used interchangeably and is meant to be relative to a number or value derived from population studies, including without limitation, such subjects having similar age range, subjects in the same or similar ethnic group, sex, or, in female subjects, pre-menopausal or post-menopausal subjects, or relative to the starting sample of a subject undergoing treatment for breast cancer. Such reference values can be derived from statistical analyses and/or risk prediction data of populations obtained from mathematical algorithms and computed indices of breast cancer. Reference indices can also be constructed and used using algorithms and other methods of statistical and structural classification.


In one embodiment of the present invention, the reference or baseline value is the amount of expression of a cancer associated gene in a control sample derived from one or more subjects who are both asymptomatic and lack traditional laboratory risk factors for breast cancer.


In another embodiment of the present invention, the reference or baseline value is the level of cancer associated genes in a control sample derived from one or more subjects who are not at risk or at low risk for developing breast cancer.


In a further embodiment, such subjects are monitored and/or periodically retested for a diagnostically relevant period of time (“longitudinal studies”) following such test to verify continued absence from breast cancer (disease or event free survival). Such period of time may be one year, two years, two to five years, five years, five to ten years, ten years, or ten or more years from the initial testing date for determination of the reference or baseline value. Furthermore, retrospective measurement of cancer associated genes in properly banked historical subject samples may be used in establishing these reference or baseline values, thus shortening the study time required, presuming the subjects have been appropriately followed during the intervening period through the intended horizon of the product claim.


A reference or baseline value can also comprise the amounts of cancer associated genes derived from subjects who show an improvement in cancer status as a result of treatments and/or therapies for the cancer being treated and/or evaluated.


In another embodiment, the reference or baseline value is an index value or a baseline value. An index value or baseline value is a composite sample of an effective amount of cancer associated genes from one or more subjects who do not have cancer.


For example, where the reference or baseline level is comprised of the amounts of cancer associated genes derived from one or more subjects who have not been diagnosed with breast cancer, or are not known to be suffering from breast cancer, a change (e.g., increase or decrease) in the expression level of a cancer associated gene in the patient-derived sample as compared to the expression level of such gene in the reference or baseline level indicates that the subject is suffering from or is at risk of developing breast cancer. In contrast, when the methods are applied prophylactically, a similar level of expression in the patient-derived sample of a breast cancer associated gene compared to such gene in the baseline level indicates that the subject is not suffering from, and is not at risk of developing, breast cancer.


Where the reference or baseline level is comprised of the amounts of cancer associated genes derived from one or more subjects who have been diagnosed with breast cancer, or are known to be suffering from breast cancer, a similarity in the expression pattern in the patient-derived sample of a breast cancer gene compared to the breast cancer baseline level indicates that the subject is suffering from or is at risk of developing breast cancer.


Expression of a breast cancer gene also allows for the course of treatment of breast cancer to be monitored. In this method, a biological sample is provided from a subject undergoing treatment, e.g., if desired, biological samples are obtained from the subject at various time points before, during, or after treatment. Expression of a breast cancer gene is then determined and compared to a reference or baseline profile. The baseline profile may be taken or derived from one or more individuals who have been exposed to the treatment. Alternatively, the baseline level may be taken or derived from one or more individuals who have not been exposed to the treatment. For example, samples may be collected from subjects who have received initial treatment for breast cancer and subsequent treatment for breast cancer to monitor the progress of the treatment.


Differences in the genetic makeup of individuals can result in differences in their relative abilities to metabolize various drugs. Accordingly, the Precision Profile™ for Breast Cancer (Table 1), the Precision Profile™ for Inflammatory Response (Table 2), the Human Cancer General Precision Profile™ (Table 3), the Precision Profile™ for EGR1 (Table 4), and the Cross-Cancer Precision Profile™ (Table 5), disclosed herein, allow for a putative therapeutic or prophylactic to be tested on a sample from a selected subject in order to determine if the agent is suitable for treating or preventing breast cancer in the subject. Additionally, other genes known to be associated with toxicity may be used. By suitable for treatment is meant determining whether the agent will be efficacious, not efficacious, or toxic for a particular individual. By toxic is meant the manifestation of one or more adverse effects of a drug when administered therapeutically. For example, a drug is toxic when it disrupts one or more normal physiological pathways.


To identify a therapeutic that is appropriate for a specific subject, a test sample from the subject is exposed to a candidate therapeutic agent, and the expression of one or more of breast cancer genes is determined. A subject sample is incubated in the presence of a candidate agent and the pattern of breast cancer gene expression in the test sample is measured and compared to a baseline profile, e.g., a breast cancer baseline profile or a non-breast cancer baseline profile or an index value. The test agent can be any compound or composition. For example, the test agent is a compound known to be useful in the treatment of breast cancer. Alternatively, the test agent is a compound that has not previously been used to treat breast cancer.


If the reference sample, e.g., baseline, is from a subject that does not have breast cancer, a similarity in the pattern of expression of breast cancer genes in the test sample compared to the reference sample indicates that the treatment is efficacious, whereas a change in the pattern of expression of breast cancer genes in the test sample compared to the reference sample indicates a less favorable clinical outcome or prognosis. By “efficacious” is meant that the treatment leads to a decrease of a sign or symptom of breast cancer in the subject or a change in the pattern of expression of a breast cancer gene such that the gene expression pattern has an increase in similarity to that of a reference or baseline pattern. Assessment of breast cancer is made using standard clinical protocols. Efficacy is determined in association with any known method for diagnosing or treating breast cancer.


A Gene Expression Panel (Precision Profile™) is selected in a manner so that quantitative measurement of RNA or protein constituents in the Panel constitutes a measurement of a biological condition of a subject. In one kind of arrangement, a calibrated profile data set is employed. Each member of the calibrated profile data set is a function of (i) a measure of a distinct constituent of a Gene Expression Panel (Precision Profile™) and (ii) a baseline quantity.
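
The construction of a calibrated profile data set may be sketched as follows. The description above requires only that each member be a function of a measured constituent value and a baseline quantity; the sketch assumes, purely for illustration, that the function is a simple difference, and the constituent names and values are hypothetical. A different function (for example, a ratio) may be supplied through the `combine` argument.

```python
def calibrated_profile(measures: dict, baseline: dict, combine=lambda m, b: m - b) -> dict:
    """Build a calibrated profile data set.

    Each member is combine(measure, baseline quantity) for one constituent.
    The default difference is an illustrative assumption, not a requirement.
    """
    missing = set(measures) - set(baseline)
    if missing:
        raise KeyError(f"baseline lacks constituents: {sorted(missing)}")
    return {name: combine(measures[name], baseline[name]) for name in measures}


# Hypothetical measured values and baseline quantities for two constituents.
measures = {"GENE_A": 18.5, "GENE_B": 21.0}
baseline = {"GENE_A": 17.0, "GENE_B": 21.5}

print(calibrated_profile(measures, baseline))
print(calibrated_profile(measures, baseline, combine=lambda m, b: m / b))
```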


Additional embodiments relate to the use of an index or algorithm resulting from quantitative measurement of constituents, and optionally in addition, derived from either expert analysis or computational biology (a) in the analysis of complex data sets; (b) to control or normalize the influence of uninformative or otherwise minor variances in gene expression values between samples or subjects; (c) to simplify the characterization of a complex data set for comparison to other complex data sets, databases or indices or algorithms derived from complex data sets; (d) to monitor a biological condition of a subject; (e) for measurement of therapeutic efficacy of natural or synthetic compositions or stimuli that may be formulated individually or in combinations or mixtures for a range of targeted biological conditions; (f) for predictions of toxicological effects and dose effectiveness of a composition or mixture of compositions for an individual or for a population or set of individuals or for a population of cells; (g) for determination of how two or more different agents administered in a single treatment might interact so as to detect any of synergistic, additive, negative, neutral or toxic activity; and (h) for performing pre-clinical and clinical trials by providing new criteria for pre-selecting subjects according to informative profile data sets for revealing disease status and conducting preliminary dosage studies for these patients prior to conducting Phase 1 or 2 trials.


Gene expression profiling and the use of index characterization for a particular condition or agent or both may be used to reduce the cost of Phase 3 clinical trials and may be used beyond Phase 3 trials; labeling for approved drugs; selection of suitable medication in a class of medications for a particular patient that is directed to their unique physiology; diagnosing or determining a prognosis of a medical condition or an infection which may precede onset of symptoms or alternatively diagnosing adverse side effects associated with administration of a therapeutic agent; managing the health care of a patient; and quality control for different batches of an agent or a mixture of agents.


The Subject

The methods disclosed herein may be applied to cells of humans, mammals or other organisms without the need for undue experimentation by one of ordinary skill in the art because all cells transcribe RNA and it is known in the art how to extract RNA from all types of cells.


A subject can include those who have not been previously diagnosed as having breast cancer or a condition related to breast cancer. Alternatively, a subject can also include those who have already been diagnosed as having breast cancer or a condition related to breast cancer. Diagnosis of breast cancer is made, for example, from any one or combination of the following procedures: a medical history, physical examination, breast examination, mammography, chest x-ray, bone scan, CT, MRI, PET scanning, blood tests (e.g., CA-15.3 levels (carbohydrate antigen 15.3, an epithelial mucin)) and biopsy (including fine-needle aspiration, nipple aspirates, ductal lavage, core needle biopsy, and local surgical biopsy).


Optionally, the subject has been previously treated with a surgical procedure for removing breast cancer or a condition related to breast cancer, including but not limited to any one or combination of the following treatments: a lumpectomy, mastectomy, and removal of the lymph nodes in the axilla. Optionally, the subject has previously been treated with chemotherapy (including but not limited to tamoxifen and aromatase inhibitors) and/or radiation therapy (e.g., gamma ray and brachytherapy), alone, in combination with, or in succession to a surgical procedure, as previously described. Optionally, the subject may be treated with any of the agents previously described; alone, or in combination with a surgical procedure for removing breast cancer, as previously described.


A subject can also include those who are suffering from, or at risk of developing, breast cancer or a condition related to breast cancer, such as those who exhibit known risk factors for breast cancer or conditions related to breast cancer. Known risk factors for breast cancer include, but are not limited to: gender (higher susceptibility in women than in men), age (increased risk with age, especially age 50 and over), inherited genetic predisposition (mutations in the BRCA1 and BRCA2 genes), alcohol consumption, and exposure to environmental factors (e.g., chemicals used in pesticides, cosmetics, and cleaning products).


Selecting Constituents of a Gene Expression Panel (Precision Profile™)

The general approach to selecting constituents of a Gene Expression Panel (Precision Profile™) has been described in PCT application publication number WO 01/25473, incorporated herein in its entirety. A wide range of Gene Expression Panels (Precision Profiles™) have been designed and experimentally validated, each panel providing a quantitative measure of biological condition that is derived from a sample of blood or other tissue. For each panel, experiments have verified that a Gene Expression Profile using the panel's constituents is informative of a biological condition. (It has also been demonstrated that in being informative of biological condition, the Gene Expression Profile is used, among other things, to measure the effectiveness of therapy, as well as to provide a target for therapeutic intervention).


In addition to the Precision Profile™ for Breast Cancer (Table 1), the Precision Profile™ for Inflammatory Response (Table 2), the Human Cancer General Precision Profile™ (Table 3), the Precision Profile™ for EGR1 (Table 4), and the Cross-Cancer Precision Profile™ (Table 5) include relevant genes which may be selected for a given Precision Profile™, such as the Precision Profiles™ demonstrated herein to be useful in the evaluation of breast cancer and conditions related to breast cancer.


Inflammation and Cancer

Evidence has shown that cancer in adults arises frequently in the setting of chronic inflammation. Epidemiological and experimental studies provide strong support for the concept that inflammation facilitates malignant growth. Inflammatory components have been shown to 1) induce DNA damage, which contributes to genetic instability (e.g., cell mutation) and transformed cell proliferation (Balkwill and Mantovani, Lancet 357:539-545 (2001)); 2) promote angiogenesis, thereby enhancing tumor growth and invasiveness (Coussens L. M. and Z. Werb, Nature 420:860-867 (2002)); and 3) impair myelopoiesis and hemopoiesis, which cause immune dysfunction and inhibit immune surveillance (Kusmartsev and Gabrilovich, Cancer Immunol. Immunother. 51:293-298 (2002); Serafini et al., Cancer Immunol. Immunother. 53:64-72 (2004)).


Studies suggest that inflammation promotes malignancy via proinflammatory cytokines, including but not limited to IL-1β, which enhance immune suppression through the induction of myeloid suppressor cells, and that these cells down regulate immune surveillance and allow the outgrowth and proliferation of malignant cells by inhibiting the activation and/or function of tumor-specific lymphocytes (Bunt et al., J. Immunol. 176: 284-290 (2006)). Such studies are consistent with findings that myeloid suppressor cells are found in many cancer patients, including lung and breast cancer patients, and that chronic inflammation in some of these malignancies may enhance malignant growth (Coussens L. M. and Z. Werb, 2002).


Additionally, many cancers express an extensive repertoire of chemokines and chemokine receptors, and may be characterized by dysregulated production of chemokines and abnormal chemokine receptor signaling and expression. Tumor-associated chemokines are thought to play several roles in the biology of primary and metastatic cancer such as: control of leukocyte infiltration into the tumor, manipulation of the tumor immune response, regulation of angiogenesis, action as autocrine or paracrine growth and survival factors, and control of the movement of the cancer cells. Thus, these activities likely contribute to growth within/outside the tumor microenvironment and stimulate anti-tumor host responses.


As tumors progress, it is common to observe immune deficits not only within cells in the tumor microenvironment but also frequently in the systemic circulation. Whole blood contains representative populations of all the mature cells of the immune system as well as secretory proteins associated with cellular communications. The earliest observable changes of cellular immune activity are altered levels of gene expression within the various immune cell types. Immune responses are now understood to be a rich, highly complex tapestry of cell-cell signaling events driven by associated pathways and cascades—all involving modified activities of gene transcription. This highly interrelated system of cell response is immediately activated upon any immune challenge, including the events surrounding host response to breast cancer and treatment. Modified gene expression precedes the release of cytokines and other immunologically important signaling elements.


As such, inflammation genes, such as the genes listed in the Precision Profile™ for Inflammatory Response (Table 2) are useful for distinguishing between subjects suffering from breast cancer and normal subjects, in addition to the other gene panels, i.e., Precision Profiles™ described herein.


Early Growth Response Gene Family and Cancer

The early growth response (EGR) genes are rapidly induced following mitogenic stimulation in diverse cell types, including fibroblasts, epithelial cells and B lymphocytes. The EGR genes are members of the broader “Immediate Early Gene” (IEG) family, whose genes are activated in the first round of response to extracellular signals such as growth factors and neurotransmitters, prior to new protein synthesis. The IEG's are well known as early regulators of cell growth and differentiation signals, in addition to playing a role in other cellular processes. Some other well characterized members of the IEG family include the c-myc, c-fos and c-jun oncogenes. Many of the immediate early gene products function as transcription factors and DNA-binding proteins, though other IEG's also include secreted proteins, cytoskeletal proteins and receptor subunits. EGR1 expression is induced by a wide variety of stimuli. It is rapidly induced by mitogens such as platelet derived growth factor (PDGF), fibroblast growth factor (FGF), and epidermal growth factor (EGF), as well as by modified lipoproteins, shear/mechanical stresses, and free radicals. Interestingly, expression of the EGR1 gene is also regulated by the oncogenes v-raf, v-fps and v-src as demonstrated in transfection analysis of cells using promoter-reporter constructs. This regulation is mediated by the serum response elements (SREs) present within the EGR1 promoter region. It has also been demonstrated that hypoxia, which occurs during development of cancers, induces EGR1 expression. EGR1 subsequently enhances the expression of endogenous EGFR, which plays an important role in cell growth (over-expression of EGFR can lead to transformation). Finally, EGR1 has also been shown to be induced by Smad3, a signaling component of the TGFB pathway.


In its role as a transcriptional regulator, the EGR1 protein binds specifically to the G+C rich EGR consensus sequence present within the promoter region of genes activated by EGR1. EGR1 also interacts with additional proteins (CREBBP/EP300) which co-regulate transcription of EGR1 activated genes. Many of the genes activated by EGR1 also stimulate the expression of EGR1, creating a positive feedback loop. Genes regulated by EGR1 include the mitogens: platelet derived growth factor (PDGFA), fibroblast growth factor (FGF), and epidermal growth factor (EGF) in addition to TNF, IL2, PLAU, ICAM1, TP53, ALOX5, PTEN, FN1 and TGFB1.


As such, early growth response genes, or genes associated therewith, such as the genes listed in the Precision Profile™ for EGR1 (Table 4) are useful for distinguishing between subjects suffering from breast cancer and normal subjects, in addition to the other gene panels, i.e., Precision Profiles™, described herein.


In general, panels may be constructed and experimentally validated by one of ordinary skill in the art in accordance with the principles articulated in the present application.


Gene Expression Profiles Based on Gene Expression Panels of the Present Invention


Tables 1A-1C were derived from a study of the gene expression patterns described in Example 3 below. Table 1A describes all 1, 2 and 3-gene logistic regression models based on genes from the Precision Profile™ for Breast Cancer (Table 1) which are capable of distinguishing between subjects suffering from breast cancer and normal subjects with at least 75% accuracy. For example, the first row of Table 1A describes a 3-gene model, CTSD, EGR1 and NCOA1, capable of correctly classifying breast cancer-afflicted subjects with 89.8% accuracy, and normal subjects with 92% accuracy.


Tables 2A-2C were derived from a study of the gene expression patterns described in Example 4 below. Table 2A describes all 1 and 2-gene logistic regression models based on genes from the Precision Profile™ for Inflammatory Response (Table 2), which are capable of distinguishing between subjects suffering from breast cancer and normal subjects with at least 75% accuracy. For example, the first row of Table 2A describes a 2-gene model, CCR5 and EGR1, capable of correctly classifying breast cancer-afflicted subjects with 81.6% accuracy, and normal subjects with 80.8% accuracy.


Tables 3A-3C were derived from a study of the gene expression patterns described in Example 5 below. Table 3A describes all 1 and 2-gene logistic regression models based on genes from the Human Cancer General Precision Profile™ (Table 3), which are capable of distinguishing between subjects suffering from breast cancer and normal subjects with at least 75% accuracy. For example, the first row of Table 3A describes a 2-gene model, EGR1 and NME1, capable of correctly classifying breast cancer-afflicted subjects with 89.8% accuracy, and normal subjects with 90.9% accuracy.


Tables 4A-4B were derived from a study of the gene expression patterns described in Example 6 below. Table 4A describes all 2-gene logistic regression models based on genes from the Precision Profile™ for EGR1 (Table 4), which are capable of distinguishing between subjects suffering from breast cancer and normal subjects with at least 75% accuracy. For example, the first row of Table 4A describes a 2-gene model, NR4A1 and TGFB1, capable of correctly classifying breast cancer-afflicted subjects with 85.4% accuracy, and normal subjects with 81.8% accuracy.


Tables 5A-5C were derived from a study of the gene expression patterns described in Example 7 below. Table 5A describes all 1 and 2-gene logistic regression models based on genes from the Cross-Cancer Precision Profile™ (Table 5), which are capable of distinguishing between subjects suffering from breast cancer and normal subjects with at least 75% accuracy. For example, the first row of Table 5A describes a 2-gene model, EGR1 and PLEK2, capable of correctly classifying breast cancer-afflicted subjects with 95.8% accuracy, and normal subjects with 100% accuracy.
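

By way of illustration only, the manner in which a 2-gene logistic regression model of the kind summarized in the tables above may be scored can be sketched as follows. The coefficients, intercept, cutoff and ΔCt values in this sketch are invented for the example and are not taken from any table herein; the function names are likewise hypothetical. The sketch reports the two figures quoted for each model: the fraction of breast cancer subjects correctly classified and the fraction of normal subjects correctly classified.

```python
import math

def logistic_probability(delta_cts, coefficients, intercept):
    """Probability of the 'breast cancer' class from a logistic model."""
    z = intercept + sum(c * x for c, x in zip(coefficients, delta_cts))
    return 1.0 / (1.0 + math.exp(-z))

def per_class_accuracy(probabilities, labels, cutoff=0.5):
    """Return (fraction of cancer subjects called cancer,
    fraction of normal subjects called normal). Label 1 = cancer, 0 = normal."""
    cancer_hits = [p >= cutoff for p, y in zip(probabilities, labels) if y == 1]
    normal_hits = [p < cutoff for p, y in zip(probabilities, labels) if y == 0]
    return sum(cancer_hits) / len(cancer_hits), sum(normal_hits) / len(normal_hits)

# Hypothetical model parameters and subjects (NOT from the tables herein).
coefficients = (1.2, -0.8)   # one coefficient per gene in the 2-gene model
intercept = -3.0
subjects = [                 # (gene 1 delta-Ct, gene 2 delta-Ct, label)
    (20.0, 24.0, 1),
    (19.5, 24.5, 1),
    (18.0, 25.0, 1),
    (18.0, 24.0, 0),
    (17.5, 23.0, 0),
]

probabilities = [logistic_probability(s[:2], coefficients, intercept) for s in subjects]
labels = [s[2] for s in subjects]
cancer_accuracy, normal_accuracy = per_class_accuracy(probabilities, labels)
print(f"cancer: {cancer_accuracy:.1%}, normal: {normal_accuracy:.1%}")
```

In practice the coefficients would be obtained by fitting the model to measured expression data, and the cutoff would be chosen as part of that analysis.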


Design of Assays

Typically, a sample is run through a panel in replicates of three for each target gene (assay); that is, a sample is divided into aliquots and for each aliquot the concentration of each constituent in a Gene Expression Panel (Precision Profile™) is measured. From thousands of constituent assays, with each assay conducted in triplicate, an average coefficient of variation, (standard deviation/average)*100, of less than 2 percent was found among the normalized ΔCt measurements for each assay (where normalized quantitation of the target mRNA is determined by the difference in threshold cycles between the internal control (e.g., an endogenous marker such as 18S rRNA, or an exogenous marker) and the gene of interest). This is a measure called “intra-assay variability”. Assays have also been conducted on different occasions using the same sample material. This is a measure of “inter-assay variability”. Preferably, the average coefficient of variation of intra-assay variability or inter-assay variability is less than 20%, more preferably less than 10%, more preferably less than 5%, more preferably less than 4%, more preferably less than 3%, more preferably less than 2%, and even more preferably less than 1%.
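

The coefficient of variation referred to above, (standard deviation/average)*100, may be computed for a set of replicates as in the following minimal sketch. The replicate values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not state which form is used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return (standard deviation / average) * 100 for replicate measurements.

    Assumption: the sample (n-1) standard deviation is used.
    """
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the average is zero")
    return stdev(values) / abs(avg) * 100.0

# Hypothetical triplicate normalized delta-Ct values for one assay.
triplicate_delta_ct = [15.10, 15.25, 15.05]
cv_percent = coefficient_of_variation(triplicate_delta_ct)
print(f"intra-assay CV = {cv_percent:.2f}%")
```

The same function applies to inter-assay variability when the values supplied are measurements of the same sample material taken on different occasions.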


It has been determined that it is valuable to use the quadruplicate or triplicate test results to identify and eliminate data points that are statistical “outliers”; such data points are those that differ by a percentage greater, for example, than 3% of the average of all three or four values. Moreover, if more than one data point in a set of three or four is excluded by this procedure, then all data for the relevant constituent is discarded.
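

The outlier rule just described may be sketched as follows, assuming that each value is compared with the average of all replicates (including any outlier) and using the 3% figure given above as the default limit. The function name and the example values are hypothetical.

```python
from statistics import mean

def filter_replicates(values, max_percent_diff=3.0):
    """Apply the replicate outlier rule.

    A value is an outlier if it differs from the average of all replicates
    by more than max_percent_diff percent of that average. If more than one
    value is excluded, all data for the constituent is discarded (None).
    """
    avg = mean(values)
    limit = abs(avg) * max_percent_diff / 100.0
    kept = [v for v in values if abs(v - avg) <= limit]
    if len(values) - len(kept) > 1:
        return None  # discard all data for this constituent
    return kept

# Hypothetical quadruplicate with a single outlier: the outlier is removed.
one_outlier = filter_replicates([15.0, 15.1, 15.05, 16.0])

# Hypothetical quadruplicate with two outliers: the whole set is discarded.
two_outliers = filter_replicates([14.0, 15.0, 15.1, 16.2])

print(one_outlier, two_outliers)
```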


Measurement of Gene Expression for a Constituent in the Panel

For measuring the amount of a particular RNA in a sample, methods known to one of ordinary skill in the art were used to extract and quantify transcribed RNA from a sample with respect to a constituent of a Gene Expression Panel (Precision Profile™). (See detailed protocols below. Also see PCT application publication number WO 98/24935 herein incorporated by reference for RNA analysis protocols). Briefly, RNA is extracted from a sample such as any tissue, body fluid, cell (e.g., circulating tumor cell) or culture medium in which a population of cells of a subject might be growing. For example, cells may be lysed and RNA eluted in a suitable solution in which to conduct a DNAse reaction. Subsequent to RNA extraction, first strand synthesis may be performed using a reverse transcriptase. Gene amplification, more specifically quantitative PCR assays, can then be conducted and the gene of interest calibrated against an internal marker such as 18S rRNA (Hirayama et al., Blood 92, 1998: 46-52). Any other endogenous marker can be used, such as 28S-25S rRNA and 5S rRNA. Samples are measured in multiple replicates, for example, 3 replicates. In an embodiment of the invention, quantitative PCR is performed using amplification, reporting agents and instruments such as those supplied commercially by Applied Biosystems (Foster City, Calif.). Given a defined efficiency of amplification of target transcripts, the point (e.g., cycle number) that signal from amplified target template is detectable may be directly related to the amount of specific message transcript in the measured sample. Similarly, other quantifiable signals such as fluorescence, enzyme activity, disintegrations per minute, absorbance, etc., when correlated to a known concentration of target templates (e.g., a reference standard curve) or normalized to a standard with limited variability can be used to quantify the number of target templates in an unknown sample.
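

As a minimal sketch of the calibration against an internal marker described above, the normalized value may be taken as the difference in threshold cycles between the gene of interest and the internal control, and converted to a relative amount under a defined amplification efficiency. The sign convention (target minus control), the function names, and the Ct values below are assumptions made for illustration; the conversion (1 + E)^(-ΔCt) is the usual relation for an efficiency E expressed as a fraction and assumes the same efficiency for target and control.

```python
def delta_ct(ct_target, ct_control):
    """Normalized threshold cycle: target Ct minus internal-control Ct."""
    return ct_target - ct_control

def relative_quantity(d_ct, efficiency=1.0):
    """Amount of target relative to the internal control.

    efficiency is fractional (1.0 means the product doubles each cycle).
    Assumes target and control amplify with the same efficiency.
    """
    return (1.0 + efficiency) ** (-d_ct)

# Hypothetical threshold cycles for a gene of interest and 18S rRNA.
d_ct = delta_ct(ct_target=27.5, ct_control=14.0)
print(d_ct, relative_quantity(d_ct))
```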


Although not limited to amplification methods, quantitative gene expression techniques may utilize amplification of the target transcript. Alternatively or in combination with amplification of the target transcript, quantitation of the reporter signal for an internal marker generated by the exponential increase of amplified product may also be used. Amplification of the target template may be accomplished by isothermic gene amplification strategies or by gene amplification by thermal cycling such as PCR.


It is desirable to obtain a definable and reproducible correlation between the amplified target or reporter signal, i.e., internal marker, and the concentration of starting templates. It has been discovered that this objective can be achieved by careful attention to, for example, consistent primer-template ratios and a strict adherence to a narrow permissible level of experimental amplification efficiencies (for example 80.0 to 100% +/−5% relative efficiency, typically 90.0 to 100% +/−5% relative efficiency, more typically 95.0 to 100% +/−2%, and most typically 98 to 100% +/−1% relative efficiency). In determining gene expression levels with regard to a single Gene Expression Profile, it is necessary that all constituents of the panels, including endogenous controls, maintain similar amplification efficiencies, as defined herein, to permit accurate and precise relative measurements for each constituent. Amplification efficiencies are regarded as being “substantially similar”, for the purposes of this description and the following claims, if they differ by no more than approximately 10%, preferably by less than approximately 5%, more preferably by less than approximately 3%, and more preferably by less than approximately 1%. Measurement conditions are regarded as being “substantially repeatable”, for the purposes of this description and the following claims, if they differ by no more than approximately +/−10% coefficient of variation (CV), preferably by less than approximately +/−5% CV, more preferably +/−2% CV. These constraints should be observed over the entire range of concentration levels to be measured associated with the relevant biological condition. 
While it is thus necessary for various embodiments herein to satisfy criteria that measurements are achieved under measurement conditions that are substantially repeatable and wherein specificity and efficiencies of amplification for all constituents are substantially similar, nevertheless, it is within the scope of the present invention as claimed herein to achieve such measurement conditions by adjusting assay results that do not satisfy these criteria directly, in such a manner as to compensate for errors, so that the criteria are satisfied after suitable adjustment of assay results.
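

One way the efficiency criterion above might be checked is sketched below. The sketch estimates amplification efficiency from a dilution series using the commonly used relation E = 10^(-1/slope) - 1, where the slope is that of Ct plotted against log10 of template amount; this relation, the function names, and the dilution data are assumptions for illustration and are not a procedure recited herein. The "substantially similar" test is interpreted here as an absolute spread of no more than 10 percentage points.

```python
def amplification_efficiency(log10_quantities, ct_values):
    """Estimate fractional efficiency from a standard curve.

    Fits Ct = slope * log10(quantity) + intercept by least squares and
    returns 10 ** (-1 / slope) - 1 (1.0 corresponds to 100% efficiency).
    """
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope) - 1.0

def substantially_similar(efficiencies, max_spread=0.10):
    """True if all efficiencies lie within max_spread of one another."""
    return max(efficiencies) - min(efficiencies) <= max_spread

# Hypothetical ten-fold dilution series (log10 of relative template amount).
dilutions = [0.0, -1.0, -2.0, -3.0]
target_eff = amplification_efficiency(dilutions, [12.00, 15.32, 18.64, 21.96])
control_eff = amplification_efficiency(dilutions, [10.00, 13.45, 16.90, 20.35])
print(f"{target_eff:.3f} {control_eff:.3f}",
      substantially_similar([target_eff, control_eff]))
```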


In practice, tests are run to assure that these conditions are satisfied. For example, the design of all primer-probe sets is done in house, and experimentation is performed to determine which set gives the best performance. Even though primer-probe design can be enhanced using computer techniques known in the art, and notwithstanding common practice, it has been found that experimental validation is still useful. Moreover, in the course of experimental validation, the selected primer-probe combination is associated with a set of features:


The reverse primer should be complementary to the coding DNA strand. In one embodiment, the primer should be located across an intron-exon junction, with not more than four bases of the three-prime end of the reverse primer complementary to the proximal exon. (If more than four bases are complementary, then it would tend to competitively amplify genomic DNA.)


In an embodiment of the invention, the primer probe set should amplify cDNA of less than 110 bases in length and should not amplify, or generate fluorescent signal from, genomic DNA or transcripts or cDNA from related but biologically irrelevant loci.


A suitable target of the selected primer probe is first strand cDNA, which in one embodiment may be prepared from whole blood as follows:


(a) Use of Whole Blood for Ex Vivo Assessment of a Biological Condition


Human blood is obtained by venipuncture and prepared for assay. The aliquots of heparinized, whole blood are mixed with additional test therapeutic compounds and held at 37° C. in an atmosphere of 5% CO2 for 30 minutes. Cells are lysed and nucleic acids, e.g., RNA, are extracted by various standard means.


Nucleic acids, RNA and/or DNA, are purified from cells, tissues or fluids of the test population of cells. RNA is preferentially obtained from the nucleic acid mix using a variety of standard procedures (see, for example, RNA Isolation Strategies, pp. 55-104, in RNA Methodologies, A laboratory guide for isolation and characterization, 2nd edition, 1998, Robert E. Farrell, Jr., Ed., Academic Press), in the present case using a filter-based RNA isolation system from Ambion (RNAqueous™, Phenol-free Total RNA Isolation Kit, Catalog #1912, version 9908; Austin, Tex.).


(b) Amplification Strategies


Specific RNAs are amplified using message specific primers or random primers. The specific primers are synthesized from data obtained from public databases (e.g., Unigene, National Center for Biotechnology Information, National Library of Medicine, Bethesda, Md.), including information from genomic and cDNA libraries obtained from humans and other animals. Primers are chosen to preferentially amplify from specific RNAs obtained from the test or indicator samples (see, for example, RT PCR, Chapter 15 in RNA Methodologies, A laboratory guide for isolation and characterization, 2nd edition, 1998, Robert E. Farrell, Jr., Ed., Academic Press; or Chapter 22 pp. 143-151, RNA isolation and characterization protocols, Methods in molecular biology, Volume 86, 1998, R. Rapley and D. L. Manning Eds., Humana Press, or Chapter 14 in Statistical refinement of primer design parameters; or Chapter 5, pp. 55-72, PCR applications: protocols for functional genomics, M. A. Innis, D. H. Gelfand and J. J. Sninsky, Eds., 1999, Academic Press). Amplifications are carried out in either isothermic conditions or using a thermal cycler (for example, an ABI 9600 or 9700 or 7900 obtained from Applied Biosystems, Foster City, Calif.; see Nucleic acid detection methods, pp. 1-24, in Molecular methods for virus detection, D. L. Wiedbrauk and D. H. Farkas, Eds., 1995, Academic Press). Amplified nucleic acids are detected using fluorescent-tagged detection oligonucleotide probes (see, for example, Taqman™ PCR Reagent Kit, Protocol, part number 402823, Revision A, 1996, Applied Biosystems, Foster City Calif.) that are identified and synthesized from publicly known databases as described for the amplification primers.


For example, without limitation, amplified cDNA is detected and quantified using detection systems such as the ABI Prism® 7900 Sequence Detection System (Applied Biosystems (Foster City, Calif.)), the Cepheid SmartCycler® and Cepheid GeneXpert® Systems, the Fluidigm BioMark™ System, and the Roche LightCycler® 480 Real-Time PCR System. Amounts of specific RNAs contained in the test sample can be related to the relative quantity of fluorescence observed (see for example, Advances in Quantitative PCR Technology: 5′ Nuclease Assays, Y. S. Lie and C. J. Petropoulos, Current Opinion in Biotechnology, 1998, 9:43-48, or Rapid Thermal Cycling and PCR Kinetics, pp. 211-229, chapter 14 in PCR applications: protocols for functional genomics, M. A. Innis, D. H. Gelfand and J. J. Sninsky, Eds., 1999, Academic Press). Examples of the procedure used with several of the above-mentioned detection systems are described below. In some embodiments, these procedures can be used for both whole blood RNA and RNA extracted from cultured cells (e.g., without limitation, CTCs and CECs). In some embodiments, any tissue, body fluid, or cell(s) (e.g., circulating tumor cells (CTCs) or circulating endothelial cells (CECs)) may be used for ex vivo assessment of a biological condition affected by an agent. Methods herein may also be applied using proteins where sensitive quantitative techniques, such as an Enzyme Linked ImmunoSorbent Assay (ELISA) or mass spectrometry, are available and well-known in the art for measuring the amount of a protein constituent (see WO 98/24935 herein incorporated by reference).


An example of a procedure for the synthesis of first strand cDNA for use in PCR amplification is as follows:


Materials


1. Applied Biosystems TAQMAN Reverse Transcription Reagents Kit (P/N 808-0234). Kit Components: 10× TaqMan RT Buffer, 25 mM Magnesium chloride, deoxyNTPs mixture, Random Hexamers, RNase Inhibitor, MultiScribe Reverse Transcriptase (50 U/μL).


2. RNase/DNase free water (DEPC Treated Water from Ambion (P/N 9915G), or equivalent).


Methods


1. Place RNase Inhibitor and MultiScribe Reverse Transcriptase on ice immediately. All other reagents can be thawed at room temperature and then placed on ice.


2. Remove RNA samples from −80° C. freezer and thaw at room temperature and then place immediately on ice.


3. Prepare the following cocktail of Reverse Transcriptase Reagents for each 100 μL RT reaction (for multiple samples, prepare extra cocktail to allow for pipetting error):


Reagent                  1 reaction (μL)    11X, e.g., 10 samples (μL)
10X RT Buffer            10.0               110.0
25 mM MgCl2              22.0               242.0
dNTPs                    20.0               220.0
Random Hexamers          5.0                55.0
RNase Inhibitor          2.0                22.0
Reverse Transcriptase    2.5                27.5
Water                    18.5               203.5
Total:                   80.0               880.0 (80 μL per sample)



4. Bring each RNA sample to a total volume of 20 μL in a 1.5 mL microcentrifuge tube (for example, remove 10 μL RNA and dilute to 20 μL with RNase/DNase free water; for whole blood RNA use 20 μL total RNA) and add 80 μL RT reaction mix from step 3. Mix by pipetting up and down.


5. Incubate sample at room temperature for 10 minutes.


6. Incubate sample at 37° C. for 1 hour.


7. Incubate sample at 90° C. for 10 minutes.


8. Quick spin samples in microcentrifuge.


9. Place sample on ice if doing PCR immediately, otherwise store sample at −20° C. for future use.


10. PCR QC should be run on all RT samples using 18S and β-actin.
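

The scaling of the reverse transcription cocktail in step 3 above can be sketched as follows. The per-reaction volumes are those listed in the step 3 table; the function name is hypothetical, and the choice of one extra reaction as the pipetting allowance is inferred from the 11X column given for 10 samples.

```python
# Per-reaction volumes in microliters, as listed in the step 3 table.
RT_MIX_PER_REACTION_UL = {
    "10X RT Buffer": 10.0,
    "25 mM MgCl2": 22.0,
    "dNTPs": 20.0,
    "Random Hexamers": 5.0,
    "RNase Inhibitor": 2.0,
    "Reverse Transcriptase": 2.5,
    "Water": 18.5,
}

def scale_master_mix(per_reaction, n_samples, extra_reactions=1):
    """Scale each reagent volume to n_samples plus a pipetting allowance."""
    factor = n_samples + extra_reactions
    return {reagent: volume * factor for reagent, volume in per_reaction.items()}

cocktail = scale_master_mix(RT_MIX_PER_REACTION_UL, n_samples=10)
total_ul = sum(cocktail.values())
for reagent, volume in cocktail.items():
    print(f"{reagent:<24}{volume:>8.1f}")
print(f"{'Total':<24}{total_ul:>8.1f}")
```

For 10 samples this reproduces the 11X column, from which 80 μL is dispensed per sample.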


Following the synthesis of first strand cDNA, one particular embodiment of the approach for amplification of first strand cDNA by PCR, followed by detection and quantification of constituents of a Gene Expression Panel (Precision Profile™) is performed using the ABI Prism® 7900 Sequence Detection System as follows:


Materials


1. 20× Primer/Probe Mix for each gene of interest.


2. 20× Primer/Probe Mix for 18S endogenous control.


3. 2× Taqman Universal PCR Master Mix.


4. cDNA transcribed from RNA extracted from cells.


5. Applied Biosystems 96-Well Optical Reaction Plates.


6. Applied Biosystems Optical Caps, or optical-clear film.


7. Applied Biosystems Prism® 7700 or 7900 Sequence Detector.


Methods


1. Make stocks of each Primer/Probe mix containing the Primer/Probe for the gene of interest, Primer/Probe for 18S endogenous control, and 2× PCR Master Mix as follows. Make sufficient excess to allow for pipetting error e.g., approximately 10% excess. The following example illustrates a typical set up for one gene with quadruplicate samples testing two conditions (2 plates).


Reagent                                  1X (1 well) (μL)
2X Master Mix                            7.5
20X 18S Primer/Probe Mix                 0.75
20X Gene of interest Primer/Probe Mix    0.75
Total                                    9.0


2. Make stocks of cDNA targets by diluting 95 μL of cDNA into 2000 μL of water. The amount of cDNA is adjusted to give Ct values between 10 and 18, typically between 12 and 16.


3. Pipette 9 μL of Primer/Probe mix into the appropriate wells of an Applied Biosystems 384-Well Optical Reaction Plate.


4. Pipette 10 μL of cDNA stock solution into each well of the Applied Biosystems 384-Well Optical Reaction Plate.


5. Seal the plate with Applied Biosystems Optical Caps, or optical-clear film.


6. Analyze the plate on the ABI Prism® 7900 Sequence Detector.


In another embodiment of the invention, the use of the primer probe with the first strand cDNA as described above to permit measurement of constituents of a Gene Expression Panel (Precision Profile™) is performed using a QPCR assay on Cepheid SmartCycler® and GeneXpert® Instruments as follows:


I. To run a QPCR assay in duplicate on the Cepheid SmartCycler® instrument containing three target genes and one reference gene, the following procedure should be followed.


A. With 20× Primer/Probe Stocks.


Materials


1. SmartMix™-HM lyophilized Master Mix.


2. Molecular grade water.


3. 20× Primer/Probe Mix for the 18S endogenous control gene. The endogenous control gene will be dual labeled with VIC-MGB or equivalent.


4. 20× Primer/Probe Mix for target gene one, dual labeled with FAM-BHQ1 or equivalent.


5. 20× Primer/Probe Mix for target gene two, dual labeled with Texas Red-BHQ2 or equivalent.


6. 20× Primer/Probe Mix for target gene three, dual labeled with Alexa 647-BHQ3 or equivalent.


7. Tris buffer, pH 9.0


8. cDNA transcribed from RNA extracted from sample.


9. SmartCycler® 25 μL tube.


10. Cepheid SmartCycler® instrument.


Methods


1. For each cDNA sample to be investigated, add the following to a sterile 650 μL tube.


Reagent                                Amount
SmartMix™-HM lyophilized Master Mix    1 bead
20X 18S Primer/Probe Mix               2.5 μL
20X Target Gene 1 Primer/Probe Mix     2.5 μL
20X Target Gene 2 Primer/Probe Mix     2.5 μL
20X Target Gene 3 Primer/Probe Mix     2.5 μL
Tris Buffer, pH 9.0                    2.5 μL
Sterile Water                          34.5 μL
Total                                  47 μL


Vortex the mixture for 1 second three times to completely mix the reagents. Briefly centrifuge the tube after vortexing.


2. Dilute the cDNA sample so that a 3 μL addition to the reagent mixture above will give an 18S reference gene CT value between 12 and 16.


3. Add 3 μL of the prepared cDNA sample to the reagent mixture bringing the total volume to 50 μL. Vortex the mixture for 1 second three times to completely mix the reagents. Briefly centrifuge the tube after vortexing.


4. Add 25 μL of the mixture to each of two SmartCycler® tubes, cap the tube and spin for 5 seconds in a microcentrifuge having an adapter for SmartCycler® tubes.


5. Remove the two SmartCycler® tubes from the microcentrifuge and inspect for air bubbles. If bubbles are present, re-spin; otherwise, load the tubes into the SmartCycler® instrument.


6. Run the appropriate QPCR protocol on the SmartCycler®, export the data and analyze the results.


B. With Lyophilized SmartBeads™.


Materials


1. SmartMix™-HM lyophilized Master Mix.


2. Molecular grade water.


3. SmartBeads™ containing the 18S endogenous control gene dual labeled with VIC-MGB or equivalent, and the three target genes, one dual labeled with FAM-BHQ1 or equivalent, one dual labeled with Texas Red-BHQ2 or equivalent and one dual labeled with Alexa 647-BHQ3 or equivalent.


4. Tris buffer, pH 9.0


5. cDNA transcribed from RNA extracted from sample.


6. SmartCycler® 25 μL tube.


7. Cepheid SmartCycler® instrument.


Methods


1. For each cDNA sample to be investigated, add the following to a sterile 650 μL tube.



















Reagent                                         Amount
SmartMix™-HM lyophilized Master Mix             1 bead
SmartBead™ containing four primer/probe sets    1 bead
Tris Buffer, pH 9.0                             2.5 μL
Sterile Water                                   44.5 μL
Total                                           47 μL











Vortex the mixture for 1 second three times to completely mix the reagents. Briefly centrifuge the tube after vortexing.


2. Dilute the cDNA sample so that a 3 μL addition to the reagent mixture above will give an 18S reference gene CT value between 12 and 16.


3. Add 3 μL of the prepared cDNA sample to the reagent mixture bringing the total volume to 50 μL. Vortex the mixture for 1 second three times to completely mix the reagents. Briefly centrifuge the tube after vortexing.


4. Add 25 μL of the mixture to each of two SmartCycler® tubes, cap the tube and spin for 5 seconds in a microcentrifuge having an adapter for SmartCycler® tubes.


5. Remove the two SmartCycler® tubes from the microcentrifuge and inspect for air bubbles. If bubbles are present, re-spin; otherwise, load the tubes into the SmartCycler® instrument.


6. Run the appropriate QPCR protocol on the SmartCycler®, export the data and analyze the results.


II. To run a QPCR assay on the Cepheid GeneXpert® instrument containing three target genes and one reference gene, the following procedure should be followed. Note that to do duplicates, two self contained cartridges need to be loaded and run on the GeneXpert® instrument.


Materials


1. Cepheid GeneXpert® self contained cartridge preloaded with a lyophilized SmartMix™-HM master mix bead and a lyophilized SmartBead™ containing four primer/probe sets.


2. Molecular grade water, containing Tris buffer, pH 9.0.


3. Extraction and purification reagents.


4. Clinical sample (whole blood, RNA, etc.)


5. Cepheid GeneXpert® instrument.


Methods


1. Remove appropriate GeneXpert® self contained cartridge from packaging.


2. Fill appropriate chamber of self contained cartridge with molecular grade water with Tris buffer, pH 9.0.


3. Fill appropriate chambers of self contained cartridge with extraction and purification reagents.


4. Load aliquot of clinical sample into appropriate chamber of self contained cartridge.


5. Seal cartridge and load into GeneXpert® instrument.


6. Run the appropriate extraction and amplification protocol on the GeneXpert® and analyze the resultant data.


In yet another embodiment of the invention, the use of the primer probe with the first strand cDNA as described above to permit measurement of constituents of a Gene Expression Panel (Precision Profile™) is performed using a QPCR assay on the Roche LightCycler® 480 Real-Time PCR System as follows:


Materials


1. 20× Primer/Probe stock for the 18S endogenous control gene. The endogenous control gene may be dual labeled with either VIC-MGB or VIC-TAMRA.


2. 20× Primer/Probe stock for each target gene, dual labeled with either FAM-TAMRA or FAM-BHQ1.


3. 2× LightCycler® 480 Probes Master (master mix).


4. 1× cDNA sample stocks transcribed from RNA extracted from samples.


5. 1× TE buffer, pH 8.0.


6. LightCycler® 480 384-well plates.


7. Source MDx 24 gene Precision Profile™ 96-well intermediate plates.


8. RNase/DNase free 96-well plate.


9. 1.5 mL microcentrifuge tubes.


10. Beckman/Coulter Biomek® 3000 Laboratory Automation Workstation.


11. Velocity11 Bravo™ Liquid Handling Platform.


12. LightCycler® 480 Real-Time PCR System.


Methods


1. Remove a Source MDx 24 gene Precision Profile™ 96-well intermediate plate from the freezer, thaw and spin in a plate centrifuge.


2. Dilute four (4) 1× cDNA sample stocks in separate 1.5 mL microcentrifuge tubes with the total final volume for each of 540 μL.


3. Transfer the 4 diluted cDNA samples to an empty RNase/DNase free 96-well plate using the Biomek® 3000 Laboratory Automation Workstation.


4. Transfer the cDNA samples from the cDNA plate created in step 3 to the thawed and centrifuged Source MDx 24 gene Precision Profile™ 96-well intermediate plate using Biomek® 3000 Laboratory Automation Workstation. Seal the plate with a foil seal and spin in a plate centrifuge.


5. Transfer the contents of the cDNA-loaded Source MDx 24 gene Precision Profile™ 96-well intermediate plate to a new LightCycler® 480 384-well plate using the Bravo™ Liquid Handling Platform. Seal the 384-well plate with a LightCycler® 480 optical sealing foil and spin in a plate centrifuge for 1 minute at 2000 rpm.


6. Place the sealed plate in a dark 4° C. refrigerator for a minimum of 4 minutes.


7. Load the plate into the LightCycler® 480 Real-Time PCR System and start the LightCycler® 480 software. Choose the appropriate run parameters and start the run.


8. At the conclusion of the run, analyze the data and export the resulting CP values to the database.


In some instances, target gene FAM measurements may be beyond the detection limit of the particular platform instrument used to detect and quantify constituents of a Gene Expression Panel (Precision Profile™). To address the issue of “undetermined” gene expression measures being interpreted as a lack of expression for a particular gene, the detection limit may be reset and the “undetermined” constituents may be “flagged”. For example, and without limitation, the ABI Prism® 7900HT Sequence Detection System reports target gene FAM measurements that are beyond the detection limit of the instrument (>40 cycles) as “undetermined”. Detection Limit Reset is performed when at least 1 of 3 target gene FAM CT replicates is not detected after 40 cycles and is designated as “undetermined”. “Undetermined” target gene FAM CT replicates are re-set to 40 and flagged. CT normalization (ΔCT) and relative expression calculations that have used re-set FAM CT values are also flagged.
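The Detection Limit Reset described above can be illustrated with a minimal Python sketch. It assumes that an “undetermined” replicate is represented as None, that replicates are averaged before normalization, and that ΔCT is computed as the mean target CT minus the reference CT; the names reset_undetermined and delta_ct are hypothetical and are not taken from any instrument software.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

DETECTION_LIMIT_CT = 40.0  # cycle cutoff stated in the text for the example instrument


@dataclass
class Replicate:
    ct: float       # CT value, possibly re-set to the detection limit
    flagged: bool   # True when the value was re-set from "undetermined"


def reset_undetermined(cts: Sequence[Optional[float]],
                       limit: float = DETECTION_LIMIT_CT) -> List[Replicate]:
    """Re-set undetermined replicates (None) to the detection limit and flag them."""
    return [Replicate(limit, True) if ct is None else Replicate(ct, False)
            for ct in cts]


def delta_ct(target: Sequence[Replicate], reference_ct: float) -> Tuple[float, bool]:
    """Return (mean target CT - reference CT, flag).

    Assumption: replicates are averaged, and the result is flagged whenever
    any contributing replicate was re-set.
    """
    mean_ct = sum(r.ct for r in target) / len(target)
    return mean_ct - reference_ct, any(r.flagged for r in target)


replicates = reset_undetermined([38.0, None, 39.0])
value, flagged = delta_ct(replicates, reference_ct=14.0)
```

Carrying the flag through to the ΔCT value keeps downstream relative expression calculations traceable to the re-set measurement.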


Baseline Profile Data Sets

The analyses of samples from single individuals and from large groups of individuals provide a library of profile data sets relating to a particular panel or series of panels. These profile data sets may be stored as records in a library for use as baseline profile data sets. As the term “baseline” suggests, the stored baseline profile data sets serve as comparators for providing a calibrated profile data set that is informative about a biological condition or agent. Baseline profile data sets may be stored in libraries and classified in a number of cross-referential ways. One form of classification may rely on the characteristics of the panels from which the data sets are derived. Another form of classification may be by particular biological condition, e.g., breast cancer. The concept of a biological condition encompasses any state in which a cell or population of cells may be found at any one time. This state may reflect geography of samples, sex of subjects or any other discriminator. Some of the discriminators may overlap. The libraries may also be accessed for records associated with a single subject or particular clinical trial. The classification of baseline profile data sets may further be annotated with medical information about a particular subject, a medical condition, and/or a particular agent.


The choice of a baseline profile data set for creating a calibrated profile data set is related to the biological condition to be evaluated, monitored, or predicted, as well as the intended use of the calibrated panel, e.g., to monitor drug development, quality control or other uses. It may be desirable to access baseline profile data sets from the same subject for whom a first profile data set is obtained or from a different subject at varying times, exposures to stimuli, drugs or complex compounds; or the baseline profile data sets may be derived from like or dissimilar populations or sets of subjects. The baseline profile data set may be a normal, healthy baseline.


The profile data set may arise from the same subject for which the first data set is obtained, where the sample is taken at a separate or similar time, a different or similar site or in a different or similar biological condition. For example, a sample may be taken before stimulation or after stimulation with an exogenous compound or substance, such as before or after therapeutic treatment. Alternatively, the sample is taken before or after a surgical procedure for breast cancer. The profile data set obtained from the unstimulated sample may serve as a baseline profile data set for the sample taken after stimulation. The baseline data set may also be derived from a library containing profile data sets of a population or set of subjects having some defining characteristic or biological condition. The baseline profile data set may also correspond to some ex vivo or in vitro properties associated with an in vitro cell culture. The resultant calibrated profile data sets may then be stored as a record in a database or library along with or separate from the baseline profile data base and optionally the first profile data set, although the first profile data set would normally become incorporated into a baseline profile data set under suitable classification criteria. The remarkable consistency of Gene Expression Profiles associated with a given biological condition makes it valuable to store profile data, which can be used, among other things, for normative reference purposes. The normative reference can serve to indicate the degree to which a subject conforms to a given biological condition (healthy or diseased) and, alternatively or in addition, to provide a target for clinical intervention.


Calibrated Data

Given the repeatability achieved in measurement of gene expression, described above in connection with “Gene Expression Panels” (Precision Profiles™) and “gene amplification”, it was concluded that where differences occur in measurement under such conditions, the differences are attributable to differences in biological condition. Thus, it has been found that calibrated profile data sets are highly reproducible in samples taken from the same individual under the same conditions. Similarly, it has been found that calibrated profile data sets are reproducible in samples that are repeatedly tested. Repeated instances have also been found wherein calibrated profile data sets obtained when samples from a subject are exposed ex vivo to a compound are comparable to calibrated profile data from a sample that has been exposed to the compound in vivo.


Calculation of Calibrated Profile Data Sets and Computational Aids

The calibrated profile data set may be expressed in a spreadsheet or represented graphically for example, in a bar chart or tabular form but may also be expressed in a three dimensional representation. The function relating the baseline and profile data may be a ratio expressed as a logarithm. The constituent may be itemized on the x-axis and the logarithmic scale may be on the y-axis. Members of a calibrated data set may be expressed as a positive value representing a relative enhancement of gene expression or as a negative value representing a relative reduction in gene expression with respect to the baseline.
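As an illustration of a ratio expressed as a logarithm, the following Python sketch computes one calibrated member per constituent. It assumes that the members of both data sets are positive abundance measures on a linear scale and uses a base-10 logarithm; the function name calibrated_profile and the constituent names GENE_A and GENE_B are hypothetical placeholders.

```python
import math
from typing import Dict


def calibrated_profile(first: Dict[str, float],
                       baseline: Dict[str, float]) -> Dict[str, float]:
    """Return log10(first / baseline) for each constituent present in both sets.

    Positive values indicate relative enhancement of expression with respect
    to the baseline; negative values indicate relative reduction.
    """
    return {name: math.log10(first[name] / baseline[name])
            for name in first if name in baseline}


# Hypothetical linear-scale expression values for two constituents.
first_profile = {"GENE_A": 200.0, "GENE_B": 5.0}
baseline_profile = {"GENE_A": 100.0, "GENE_B": 10.0}
calibrated = calibrated_profile(first_profile, baseline_profile)
```

In a bar chart of such a data set, each constituent would occupy a position on the x-axis and the logarithmic value would set the bar height, above or below zero.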


Each member of the calibrated profile data set should be reproducible within a range with respect to similar samples taken from the subject under similar conditions. For example, the calibrated profile data sets may be reproducible within 20%, and typically within 10%. In accordance with embodiments of the invention, a pattern of increasing, decreasing and no change in relative gene expression from each of a plurality of gene loci examined in the Gene Expression Panel (Precision Profile™) may be used to prepare a calibrated profile set that is informative with regard to a biological condition, the biological efficacy of an agent, or treatment conditions, or for comparison to populations or sets of subjects or samples, or for comparison to populations of cells. Patterns of this nature may be used to identify likely candidates for a drug trial, may be used alone or in combination with other clinical indicators to be diagnostic or prognostic with respect to a biological condition, or may be used to guide the development of a pharmaceutical or nutraceutical through manufacture, testing and marketing.


The numerical data obtained from quantitative gene expression and numerical data from calibrated gene expression relative to a baseline profile data set may be stored in databases or digital storage mediums and may be retrieved for purposes including managing patient health care or for conducting clinical trials or for characterizing a drug. The data may be transferred in physical or wireless networks via the World Wide Web, email, or internet access site for example or by hard copy so as to be collected and pooled from distant geographic sites.


The method also includes producing a calibrated profile data set for the panel, wherein each member of the calibrated profile data set is a function of a corresponding member of the first profile data set and a corresponding member of a baseline profile data set for the panel, and wherein the baseline profile data set is related to the breast cancer or conditions related to breast cancer to be evaluated, with the calibrated profile data set being a comparison between the first profile data set and the baseline profile data set, thereby providing evaluation of breast cancer or conditions related to breast cancer of the subject.


In yet other embodiments, the function is a mathematical function and is other than a simple difference, including a second function of the ratio of the corresponding member of first profile data set to the corresponding member of the baseline profile data set, or a logarithmic function. In such embodiments, the first sample is obtained and the first profile data set quantified at a first location, and the calibrated profile data set is produced using a network to access a database stored on a digital storage medium in a second location, wherein the database may be updated to reflect the first profile data set quantified from the sample. Additionally, using a network may include accessing a global computer network.


In an embodiment of the present invention, a descriptive record is stored in a single database or multiple databases where the stored data includes the raw gene expression data (first profile data set) prior to transformation by use of a baseline profile data set, as well as a record of the baseline profile data set used to generate the calibrated profile data set including for example, annotations regarding whether the baseline profile data set is derived from a particular Signature Panel and any other annotation that facilitates interpretation and use of the data.


Because the data is in a universal format, data handling may readily be done with a computer. The data is organized so as to provide an output optionally corresponding to a graphical representation of a calibrated data set.


The above described data storage on a computer may provide the information in a form that can be accessed by a user. Accordingly, the user may load the information onto a second access site including downloading the information. However, access may be restricted to users having a password or other security device so as to protect the medical records contained within. A feature of this embodiment of the invention is the ability of a user to add new or annotated records to the data set so the records become part of the biological information.


The graphical representation of calibrated profile data sets pertaining to a product such as a drug provides an opportunity for standardizing a product by means of the calibrated profile, more particularly a signature profile. The profile may be used as a feature with which to demonstrate relative efficacy, differences in mechanisms of actions, etc. compared to other drugs approved for similar or different uses.


The various embodiments of the invention may be also implemented as a computer program product for use with a computer system. The product may include program code for deriving a first profile data set and for producing calibrated profiles. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (for example, a diskette, CD-ROM, ROM, or fixed disk), or transmittable to a computer system via a modem or other interface device, such as a communications adapter coupled to a network. The network coupling may be for example, over optical or wired communications lines or via wireless techniques (for example, microwave, infrared or other transmission techniques) or some combination of these. The series of computer instructions preferably embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (for example, shrink wrapped software), preloaded with a computer system (for example, on system ROM or fixed disk), or distributed from a server or electronic bulletin board over a network (for example, the Internet or World Wide Web). In addition, a computer system is further provided including derivative modules for deriving a first data set and a calibration profile data set.


The calibration profile data sets in graphical or tabular form, the associated databases, and the calculated index or derived algorithm, together with information extracted from the panels, the databases, the data sets or the indices or algorithms are commodities that can be sold together or separately for a variety of purposes as described in WO 01/25473.


In other embodiments, a clinical indicator may be used to assess the breast cancer or conditions related to breast cancer of the relevant set of subjects by interpreting the calibrated profile data set in the context of at least one other clinical indicator, wherein the at least one other clinical indicator is selected from the group consisting of blood chemistry, X-ray or other radiological or metabolic imaging technique, molecular markers in the blood, other chemical assays, and physical findings.


Index Construction

In combination, (i) the remarkable consistency of Gene Expression Profiles with respect to a biological condition across a population or set of subjects or samples, or across a population of cells and (ii) the use of procedures that provide substantially reproducible measurement of constituents in a Gene Expression Panel (Precision Profile™) giving rise to a Gene Expression Profile, under measurement conditions wherein specificity and efficiencies of amplification for all constituents of the panel are substantially similar, make possible the use of an index that characterizes a Gene Expression Profile, and which therefore provides a measurement of a biological condition.


An index may be constructed using an index function that maps values in a Gene Expression Profile into a single value that is pertinent to the biological condition at hand. The values in a Gene Expression Profile are the amounts of each constituent of the Gene Expression Panel (Precision Profile™). These constituent amounts form a profile data set, and the index function generates a single value—the index—from the members of the profile data set.


The index function may conveniently be constructed as a linear sum of terms, each term being what is referred to herein as a “contribution function” of a member of the profile data set. For example, the contribution function may be a constant times a power of a member of the profile data set. So the index function would have the form






$$I = \sum_{i} C_i \, M_i^{P(i)},$$


where I is the index, Mi is the value of the member i of the profile data set, Ci is a constant, and P(i) is a power to which Mi is raised, the sum being formed for all integral values of i up to the number of members in the data set. We thus have a linear polynomial expression. The coefficient Ci for a particular gene specifies whether a higher ΔCt value for this gene increases (a positive Ci) or decreases (a negative Ci) the likelihood of breast cancer, the ΔCt values of all other genes in the expression being held constant.
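The index function of this form can be evaluated with a short Python sketch. The sketch assumes the members, coefficients and powers are supplied as equal-length sequences; the function name index_value and the numerical values are hypothetical and chosen only to show the arithmetic.

```python
from typing import Sequence


def index_value(members: Sequence[float],
                coefficients: Sequence[float],
                powers: Sequence[float]) -> float:
    """Evaluate I = sum over i of C_i * M_i ** P(i).

    Note: with non-integer powers, negative members would yield complex
    values in Python, so this sketch is intended for integer powers or
    non-negative members.
    """
    if not (len(members) == len(coefficients) == len(powers)):
        raise ValueError("members, coefficients and powers must have equal length")
    return sum(c * m ** p for c, m, p in zip(coefficients, members, powers))


# Hypothetical two-member profile: 1.0 * 2.0**1 + (-0.5) * 3.0**2
example_index = index_value(members=[2.0, 3.0],
                            coefficients=[1.0, -0.5],
                            powers=[1, 2])
```

A positive coefficient raises the index as the corresponding member increases, and a negative coefficient lowers it, with all other members held constant.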


The values Ci and P(i) may be determined in a number of ways, so that the index I is informative of the pertinent biological condition. One way is to apply statistical techniques, such as latent class modeling, to the profile data sets to correlate clinical data or experimentally derived data, or other data pertinent to the biological condition. In this connection, for example, the software called Latent Gold®, from Statistical Innovations, Belmont, Mass., may be employed. Alternatively, other simpler modeling techniques may be employed in a manner known in the art. The index function for breast cancer may be constructed, for example, in a manner that a greater degree of breast cancer (as determined by the profile data set for any of the Precision Profiles™ (listed in Tables 1-5) described herein) correlates with a large value of the index function.


Just as a baseline profile data set, discussed above, can be used to provide an appropriate normative reference, and can even be used to create a Calibrated profile data set, as discussed above, based on the normative reference, an index that characterizes a Gene Expression Profile can also be provided with a normative value of the index function used to create the index. This normative value can be determined with respect to a relevant population or set of subjects or samples or to a relevant population of cells, so that the index may be interpreted in relation to the normative value. The relevant population or set of subjects or samples, or relevant population of cells may have in common a property that is at least one of age range, gender, ethnicity, geographic location, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure.


As an example, the index can be constructed, in relation to a normative Gene Expression Profile for a population or set of healthy subjects, in such a way that a reading of approximately 1 characterizes normative Gene Expression Profiles of healthy subjects. Let us further assume that the biological condition that is the subject of the index is breast cancer; a reading of 1 in this example thus corresponds to a Gene Expression Profile that matches the norm for healthy subjects. A substantially higher reading then may identify a subject experiencing breast cancer, or a condition related to breast cancer. The use of 1 as identifying a normative value, however, is only one possible choice; another logical choice is to use 0 as identifying the normative value. With this choice, deviations in the index from zero can be indicated in standard deviation units (so that values lying between −1 and +1 encompass approximately 68% of a normally distributed reference population or set of subjects). Since it was determined that Gene Expression Profile values (and accordingly constructed indices based on them) tend to be normally distributed, the 0-centered index constructed in this manner is highly informative. It therefore facilitates use of the index in diagnosis of disease and setting objectives for treatment.
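A 0-centered index expressed in standard deviation units can be sketched in Python as follows. The sketch assumes that raw index values are available for a reference population and that the sample standard deviation is an acceptable estimate of spread; the function name standardized_index and the reference values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_index(raw_index: float, reference: Sequence[float]) -> float:
    """Express a raw index as its deviation from the reference mean, in SD units."""
    center = mean(reference)
    spread = stdev(reference)  # sample standard deviation (an assumption)
    if spread == 0:
        raise ValueError("reference population has zero variance")
    return (raw_index - center) / spread


# Hypothetical raw index values for a reference set of healthy subjects.
reference_values = [1.0, 2.0, 3.0, 4.0, 5.0]
normative = standardized_index(3.0, reference_values)  # equals the reference mean
```

A subject whose raw index equals the reference mean receives a standardized index of 0, and larger absolute values indicate greater departure from the norm.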


Still another embodiment is a method of providing an index pertinent to breast cancer or conditions related to breast cancer of a subject based on a first sample from the subject, the first sample providing a source of RNAs, the method comprising deriving from the first sample a profile data set, the profile data set including a plurality of members, each member being a quantitative measure of the amount of a distinct RNA constituent in a panel of constituents selected so that measurement of the constituents is indicative of the presumptive signs of breast cancer, the panel including at least one of the constituents of any of the genes listed in the Precision Profiles™ listed in Tables 1-5. In deriving the profile data set, such measure for each constituent is achieved under measurement conditions that are substantially repeatable, and at least one measure from the profile data set is applied to an index function that provides a mapping from at least one measure of the profile data set into one measure of the presumptive signs of breast cancer, so as to produce an index pertinent to the breast cancer or conditions related to breast cancer of the subject.


As another embodiment of the invention, an index function I of the form






$$I = C_0 + \sum_{i} C_i \, M_{1i}^{P_1(i)} \, M_{2i}^{P_2(i)},$$


can be employed, where M1 and M2 are values of the member i of the profile data set, Ci is a constant determined without reference to the profile data set, and P1 and P2 are powers to which M1 and M2 are raised. The role of P1(i) and P2(i) is to specify the specific functional form of the quadratic expression, whether in fact the equation is linear, quadratic, contains cross-product terms, or is constant. For example, when P1=P2=0, the index function is simply the sum of constants; when P1=1 and P2=0, the index function is a linear expression; when P1=P2=1, the index function is a quadratic expression.


The constant C0 serves to calibrate this expression to the biological population of interest that is characterized by having breast cancer. In this embodiment, when the index value equals 0, the odds are 50:50 of the subject having breast cancer vs a normal subject. More generally, the predicted odds of the subject having breast cancer are [exp(Ii)], and therefore the predicted probability of having breast cancer is [exp(Ii)]/[1+exp(Ii)]. Thus, when the index exceeds 0, the predicted probability that a subject has breast cancer is higher than 0.5, and when it falls below 0, the predicted probability is less than 0.5.
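The relationship among the index, the predicted odds and the predicted probability can be sketched in Python. The sketch assumes each term of the index is supplied as a tuple (C, M1, P1, M2, P2); the function names and the numerical values are hypothetical, and the probability is computed in a numerically stable form that is algebraically equal to exp(I)/(1+exp(I)).

```python
import math
from typing import Iterable, Tuple

Term = Tuple[float, float, float, float, float]  # (C, M1, P1, M2, P2)


def index_with_constant(c0: float, terms: Iterable[Term]) -> float:
    """Evaluate I = C0 + sum of C * M1**P1 * M2**P2 over the supplied terms."""
    return c0 + sum(c * (m1 ** p1) * (m2 ** p2) for c, m1, p1, m2, p2 in terms)


def predicted_odds(index: float) -> float:
    """Predicted odds of breast cancer for a given index value: exp(I)."""
    return math.exp(index)


def predicted_probability(index: float) -> float:
    """Predicted probability exp(I) / (1 + exp(I)), avoiding overflow."""
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    e = math.exp(index)
    return e / (1.0 + e)


# Hypothetical single linear term (P1 = 1, P2 = 0): I = 0.5 + 2.0 * 3.0
example = index_with_constant(0.5, [(2.0, 3.0, 1, 1.0, 0)])
even_odds = predicted_probability(0.0)  # an index of 0 corresponds to 50:50 odds
```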


The value of C0 may be adjusted to reflect the prior probability of being in this population based on known exogenous risk factors for the subject. In an embodiment where C0 is adjusted as a function of the subject's risk factors, where the subject has prior probability pi of having breast cancer based on such risk factors, the adjustment is made by increasing (decreasing) the unadjusted C0 value by adding to C0 the natural logarithm of the following ratio: the prior odds of having breast cancer taking into account the risk factors/the overall prior odds of having breast cancer without taking into account the risk factors.
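The adjustment of C0 can be sketched in Python under the assumption that both priors are supplied as probabilities strictly between 0 and 1 and converted to odds as p/(1−p). The function names adjust_c0 and odds_from_probability, and the numerical priors, are hypothetical.

```python
import math


def odds_from_probability(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def adjust_c0(c0: float, subject_prior: float, overall_prior: float) -> float:
    """Add ln(prior odds given risk factors / overall prior odds) to C0."""
    ratio = odds_from_probability(subject_prior) / odds_from_probability(overall_prior)
    return c0 + math.log(ratio)


# Hypothetical priors: 0.5 for a subject with risk factors versus 0.2 overall.
adjusted = adjust_c0(-1.0, subject_prior=0.5, overall_prior=0.2)
```

When the subject's prior equals the overall prior, the ratio is 1 and C0 is unchanged; a higher subject prior increases C0 and a lower one decreases it.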


Performance and Accuracy Measures of the Invention

The performance and thus absolute and relative clinical usefulness of the invention may be assessed in multiple ways as noted above. Amongst the various assessments of performance, the invention is intended to provide accuracy in clinical diagnosis and prognosis. The accuracy of a diagnostic or prognostic test, assay, or method concerns the ability of the test, assay, or method to distinguish between subjects having breast cancer and subjects not having breast cancer, based on whether the subjects have an “effective amount” or a “significant alteration” in the levels of a cancer associated gene. By “effective amount” or “significant alteration”, it is meant that the measurement of an appropriate number of cancer associated genes (which may be one or more) is different than the predetermined cut-off point (or threshold value) for that cancer associated gene and therefore indicates that the subject has breast cancer for which the cancer associated gene(s) is a determinant.


The difference in the level of cancer associated gene(s) between normal and abnormal is preferably statistically significant. As noted below, and without any limitation of the invention, achieving statistical significance, and thus the preferred analytical and clinical accuracy, generally but not always requires that combinations of several cancer associated gene(s) be used together in panels and combined with mathematical algorithms in order to achieve a statistically significant cancer associated gene index.


In the categorical diagnosis of a disease state, changing the cut point or threshold value of a test (or assay) usually changes the sensitivity and specificity, but in a qualitatively inverse relationship. Therefore, in assessing the accuracy and usefulness of a proposed medical test, assay, or method for assessing a subject's condition, one should always take both sensitivity and specificity into account and be mindful of what the cut point is at which the sensitivity and specificity are being reported because sensitivity and specificity may vary significantly over the range of cut points. Use of statistics such as AUC, encompassing all potential cut point values, is preferred for most categorical risk measures using the invention, while for continuous risk measures, statistics of goodness-of-fit and calibration to observed results or other gold standards, are preferred.


Using such statistics, an “acceptable degree of diagnostic accuracy” is herein defined as a test or assay (such as the test of the invention for determining an effective amount or a significant alteration of cancer associated gene(s), which thereby indicates the presence of a breast cancer) in which the AUC (area under the ROC curve for the test or assay) is at least 0.60, desirably at least 0.65, more desirably at least 0.70, preferably at least 0.75, more preferably at least 0.80, and most preferably at least 0.85.


By a “very high degree of diagnostic accuracy”, it is meant a test or assay in which the AUC (area under the ROC curve for the test or assay) is at least 0.75, desirably at least 0.775, more desirably at least 0.800, preferably at least 0.825, more preferably at least 0.850, and most preferably at least 0.875.
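For illustration, the AUC referred to in these definitions can be estimated from index values with a short Python sketch. The sketch uses the pairwise (Mann-Whitney) formulation, in which the AUC is the fraction of affected/unaffected pairs where the affected subject has the higher value, with ties counted as one half; this is one standard estimator and is not stated in the text to be the method used. The function name auc and the example values are hypothetical.

```python
from typing import Sequence

ACCEPTABLE_AUC = 0.60   # minimum for an "acceptable degree of diagnostic accuracy"
VERY_HIGH_AUC = 0.75    # minimum for a "very high degree of diagnostic accuracy"


def auc(affected: Sequence[float], unaffected: Sequence[float]) -> float:
    """Estimate the area under the ROC curve by pairwise comparison."""
    if not affected or not unaffected:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for a in affected:
        for u in unaffected:
            if a > u:
                wins += 1.0
            elif a == u:
                wins += 0.5  # ties contribute one half
    return wins / (len(affected) * len(unaffected))


# Hypothetical index values for three affected and three unaffected subjects.
estimate = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
meets_acceptable = estimate >= ACCEPTABLE_AUC
meets_very_high = estimate >= VERY_HIGH_AUC
```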


The predictive value of any test depends on the sensitivity and specificity of the test, and on the prevalence of the condition in the population being tested. This notion, based on Bayes' theorem, provides that the greater the likelihood that the condition being screened for is present in an individual or in the population (pre-test probability), the greater the validity of a positive test and the greater the likelihood that the result is a true positive. Thus, the problem with using a test in any population where there is a low likelihood of the condition being present is that a positive result has limited value (i.e., more likely to be a false positive). Similarly, in populations at very high risk, a negative test result is more likely to be a false negative.
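The dependence of predictive value on prevalence can be illustrated with a Python sketch applying Bayes' theorem. The function name predictive_values and the sensitivity, specificity and prevalence figures are hypothetical and chosen only to show the effect.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The same hypothetical test applied at three different prevalences.
ppv_low, npv_low = predictive_values(0.9, 0.9, prevalence=0.01)
ppv_mid, npv_mid = predictive_values(0.9, 0.9, prevalence=0.5)
ppv_high, npv_high = predictive_values(0.9, 0.9, prevalence=0.9)
```

With these figures, a positive result in the 1% prevalence population is far more likely to be a false positive than in the 50% prevalence population, and a negative result in the 90% prevalence population is correspondingly less reliable.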


As a result, ROC and AUC can be misleading as to the clinical utility of a test in low disease prevalence tested populations (defined as those with less than 1% rate of occurrences (incidence) per annum, or less than 10% cumulative prevalence over a specified time horizon). Alternatively, absolute risk and relative risk ratios as defined elsewhere in this disclosure can be employed to determine the degree of clinical utility. Populations of subjects to be tested can also be categorized into quartiles by the test's measurement values, where the top quartile (25% of the population) comprises the group of subjects with the highest relative risk for developing breast cancer, and the bottom quartile comprises the group of subjects having the lowest relative risk for developing breast cancer. Generally, values derived from tests or assays having over 2.5 times the relative risk from top to bottom quartile in a low prevalence population are considered to have a “high degree of diagnostic accuracy,” and those with five to seven times the relative risk for each quartile are considered to have a “very high degree of diagnostic accuracy.” Nonetheless, values derived from tests or assays having only 1.2 to 2.5 times the relative risk for each quartile remain clinically useful and are widely used as risk factors for a disease. Often such lower diagnostic accuracy tests must be combined with additional parameters in order to derive meaningful clinical thresholds for therapeutic intervention, as is done with the aforementioned global risk assessment indices.
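A top-to-bottom quartile relative risk can be sketched in Python as follows. The sketch assumes each subject has a test value and a binary outcome (1 if the subject developed breast cancer, 0 otherwise), and that quartile size is the population size divided by four, rounded down; the function name quartile_relative_risk and the example data are hypothetical.

```python
from typing import Sequence


def quartile_relative_risk(values: Sequence[float],
                           outcomes: Sequence[int]) -> float:
    """Return the event rate in the top quartile divided by that in the bottom quartile."""
    if len(values) != len(outcomes):
        raise ValueError("values and outcomes must have equal length")
    ranked = sorted(zip(values, outcomes), key=lambda pair: pair[0])
    size = len(ranked) // 4
    if size == 0:
        raise ValueError("at least four subjects are required")
    bottom_rate = sum(outcome for _, outcome in ranked[:size]) / size
    top_rate = sum(outcome for _, outcome in ranked[-size:]) / size
    if bottom_rate == 0:
        raise ValueError("relative risk is undefined when the bottom quartile has no events")
    return top_rate / bottom_rate


# Hypothetical population of eight subjects ranked by test value.
test_values = [0, 1, 2, 3, 4, 5, 6, 7]
observed = [1, 0, 0, 0, 0, 0, 1, 1]
relative_risk = quartile_relative_risk(test_values, observed)
```

The resulting ratio can then be compared with the ranges given above, for example over 2.5 for a “high degree of diagnostic accuracy” in a low prevalence population.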


A health economic utility function is yet another means of measuring the performance and clinical value of a given test, consisting of weighting the potential categorical test outcomes based on actual measures of clinical and economic value for each. Health economic performance is closely related to accuracy, as a health economic utility function specifically assigns an economic value for the benefits of correct classification and the costs of misclassification of tested subjects. As a performance measure, it is not unusual to require a test to achieve a level of performance which results in an increase in health economic value per test (prior to testing costs) in excess of the target price of the test.
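One possible form of such a utility function, shown only as a sketch, weights the four categorical outcomes of a binary test by their probabilities. The function name, the outcome labels, and any monetary values supplied by the caller are assumptions and do not come from this disclosure.

```python
def expected_value_per_test(sensitivity, specificity, prevalence, values):
    """Expected economic value of one test, before testing costs.

    values maps 'TP', 'FN', 'TN', 'FP' to the (hypothetical) economic value
    of each outcome; misclassification costs are entered as negative numbers.
    """
    probabilities = {
        "TP": sensitivity * prevalence,
        "FN": (1.0 - sensitivity) * prevalence,
        "TN": specificity * (1.0 - prevalence),
        "FP": (1.0 - specificity) * (1.0 - prevalence),
    }
    return sum(probabilities[k] * values[k] for k in probabilities)
```

The result can then be compared against the target price of the test.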


In general, alternative methods of determining diagnostic accuracy are commonly used for continuous measures, when a disease category or risk category (such as those at risk for having a bone fracture) has not yet been clearly defined by the relevant medical societies and practice of medicine, where thresholds for therapeutic use are not yet established, or where there is no existing gold standard for diagnosis of the pre-disease. For continuous measures of risk, measures of diagnostic accuracy for a calculated index are typically based on curve fit and calibration between the predicted continuous value and the actual observed values (or a historical index calculated value) and utilize measures such as R squared, Hosmer-Lemeshow P-value statistics and confidence intervals. It is not unusual for predicted values using such algorithms to be reported including a confidence interval (usually 90% or 95% CI) based on a historical observed cohort's predictions, as in the test for risk of future breast cancer recurrence commercialized by Genomic Health, Inc. (Redwood City, Calif.).


In general, defining the degree of diagnostic accuracy (i.e., cut points on a ROC curve), defining an acceptable AUC value, and determining the acceptable ranges in relative concentration of what constitutes an effective amount of the cancer associated gene(s) of the invention allow one of skill in the art to use the cancer associated gene(s) to identify, diagnose, or prognose subjects with a pre-determined level of predictability and performance.


Results from the cancer associated gene(s) indices thus derived can then be validated through their calibration with actual results, that is, by comparing the predicted versus observed rate of disease in a given population, and the best predictive cancer associated gene(s) selected for and optimized through mathematical models of increased complexity. Many such formulae may be used; beyond the simple non-linear transformations, such as logistic regression, of particular interest in this use of the present invention are structural and syntactic classification algorithms, and methods of risk index construction, utilizing pattern recognition features, including established techniques such as the Kth-Nearest Neighbor, Boosting, Decision Trees, Neural Networks, Bayesian Networks, Support Vector Machines, and Hidden Markov Models, as well as other formulae described herein.


Furthermore, the application of such techniques to panels of multiple cancer associated gene(s) is provided, as is the use of such combinations to create single numerical “risk indices” or “risk scores” encompassing information from multiple cancer associated gene(s) inputs. Individual B cancer associated gene(s) may also be included or excluded in the panel of cancer associated gene(s) used in the calculation of the cancer associated gene(s) indices so derived above, based on various measures of relative performance and calibration in validation, and employing repetitive training methods such as forward, reverse, and stepwise selection, as well as genetic algorithm approaches, with or without the use of constraints on the complexity of the resulting cancer associated gene(s) indices.


The above measurements of diagnostic accuracy for cancer associated gene(s) are only a few of the possible measurements of the clinical performance of the invention. It should be noted that the appropriateness of one measurement of clinical accuracy or another will vary based upon the clinical application, the population tested, and the clinical consequences of any potential misclassification of subjects. Other important aspects of the clinical and overall performance of the invention include the selection of cancer associated gene(s) so as to reduce overall cancer associated gene(s) variability (whether due to method (analytical) or biological (pre-analytical variability, for example, as in diurnal variation), or to the integration and analysis of results (post-analytical variability) into indices and cut-off ranges), to assess analyte stability or sample integrity, or to allow the use of differing sample matrices amongst blood, cells, serum, plasma, urine, etc.


Kits

The invention also includes a breast cancer detection reagent, i.e., nucleic acids that specifically identify one or more breast cancer or condition related to breast cancer nucleic acids (e.g., any gene listed in Tables 1-5, oncogenes, tumor suppression genes, tumor progression genes, angiogenesis genes and lymphogenesis genes; sometimes referred to herein as breast cancer associated genes or breast cancer associated constituents) by having homologous nucleic acid sequences, such as oligonucleotide sequences, complementary to a portion of the breast cancer gene nucleic acids, or antibodies to proteins encoded by the breast cancer gene nucleic acids, packaged together in the form of a kit. The oligonucleotides can be fragments of the breast cancer genes. For example, the oligonucleotides can be 200, 150, 100, 50, 25, 10 or fewer nucleotides in length. The kit may contain in separate containers a nucleic acid or antibody (either already bound to a solid matrix or packaged separately with reagents for binding them to the matrix), control formulations (positive and/or negative), and/or a detectable label. Instructions (e.g., written, tape, VCR, CD-ROM, etc.) for carrying out the assay may be included in the kit. The assay may, for example, be in the form of PCR, a Northern hybridization or a sandwich ELISA, as known in the art.


For example, breast cancer gene detection reagents can be immobilized on a solid matrix such as a porous strip to form at least one breast cancer gene detection site. The measurement or detection region of the porous strip may include a plurality of sites containing a nucleic acid. A test strip may also contain sites for negative and/or positive controls. Alternatively, control sites can be located on a separate strip from the test strip. Optionally, the different detection sites may contain different amounts of immobilized nucleic acids, i.e., a higher amount in the first detection site and lesser amounts in subsequent sites. Upon the addition of test sample, the number of sites displaying a detectable signal provides a quantitative indication of the amount of breast cancer genes present in the sample. The detection sites may be configured in any suitably detectable shape and are typically in the shape of a bar or dot spanning the width of a test strip.


Alternatively, breast cancer detection genes can be labeled (e.g., with one or more fluorescent dyes) and immobilized on lyophilized beads to form at least one breast cancer gene detection site. The beads may also contain sites for negative and/or positive controls. Upon addition of the test sample, the number of sites displaying a detectable signal provides a quantitative indication of the amount of breast cancer genes present in the sample.


Alternatively, the kit contains a nucleic acid substrate array comprising one or more nucleic acid sequences. The nucleic acids on the array specifically identify one or more nucleic acid sequences represented by breast cancer genes (see Tables 1-5). In various embodiments, the expression of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 40 or 50 or more of the sequences represented by breast cancer genes (see Tables 1-5) can be identified by virtue of binding to the array. The substrate array can be on a solid substrate, e.g., a “chip” as described in U.S. Pat. No. 5,744,305. Alternatively, the substrate array can be a solution array, e.g., Luminex, Cyvera, Vitra and Quantum Dots' Mosaic.


The skilled artisan can routinely make antibodies and nucleic acid probes (e.g., oligonucleotides, aptamers, siRNAs, antisense oligonucleotides) against any of the breast cancer genes listed in Tables 1-5.


Other Embodiments

While the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.


Examples
Example 1
Patient Population

RNA was isolated using the PAXgene System from blood samples obtained from a total of 49 female subjects suffering from breast cancer and 26 healthy, normal (i.e., not suffering from or diagnosed with breast cancer) female subjects. These RNA samples were used for the gene expression analysis studies described in Examples 3-7 below.


Each of the normal female subjects in the studies was a non-smoker. The inclusion criteria for the breast cancer subjects that participated in the study were as follows: each of the subjects had defined, newly diagnosed disease, the blood samples were obtained prior to initiation of any treatment for breast cancer, and each subject in the study was 18 years or older and able to provide consent.


The following criteria were used to exclude subjects from the study: any treatment with immunosuppressive drugs, corticosteroids or investigational drugs; diagnosis of acute and chronic infectious diseases (renal or chest infections, previous TB, HIV infection or AIDS, or active cytomegalovirus); symptoms of severe progression or uncontrolled renal, hepatic, hematological, gastrointestinal, endocrine, pulmonary, neurological, or cerebral disease; and pregnancy.


Of the 49 newly diagnosed breast cancer subjects from which blood samples were obtained, 2 subjects were diagnosed with Stage 0 (in situ) breast cancer, 17 subjects were diagnosed with Stage 1 breast cancer, 26 subjects were diagnosed with Stage 2 breast cancer, 1 subject was diagnosed with Stage 3 breast cancer, and 3 subjects were diagnosed with Stage 4 breast cancer.


Example 2
Enumeration and Classification Methodology Based on Logistic Regression Models
Introduction

The following methods were used to generate the 1, 2, and 3-gene models capable of distinguishing between subjects diagnosed with breast cancer and normal subjects, with at least 75% classification accuracy, described in Examples 3-7 below.


Given measurements on G genes from samples of N1 subjects belonging to group 1 and N2 members of group 2, the purpose was to identify models containing g<G genes which discriminate between the 2 groups. The groups might be such that one consists of reference subjects (e.g., healthy, normal subjects) while the other group might have a specific disease, or subjects in group 1 may have disease A while those in group 2 may have disease B.


Specifically, parameters from a linear logistic regression model were estimated to predict a subject's probability of belonging to group 1 given his (her) measurements on the g genes in the model. After all the models were estimated (all G 1-gene models were estimated, as well as all $\binom{G}{2} = G(G-1)/2$ 2-gene models, and all $\binom{G}{3} = G(G-1)(G-2)/6$ 3-gene models based on G genes (number of combinations taken 3 at a time from G)), they were evaluated using a 2-dimensional screening process. The first dimension employed a statistical screen (significance of incremental p-values) that eliminated models that were likely to overfit the data and thus may not validate when applied to new subjects. The second dimension employed a clinical screen to eliminate models for which the expected misclassification rate was higher than an acceptable level. As a threshold analysis, the gene models showing less than 75% discrimination between N1 subjects belonging to group 1 and N2 members of group 2 (i.e., misclassification of 25% or more of subjects in either of the 2 sample groups), and genes with incremental p-values that were not statistically significant, were eliminated.
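The number of candidate models grows quickly with G. The following sketch, which is illustrative only and uses hypothetical function names, counts and enumerates the 1-, 2-, and 3-gene combinations with the Python standard library (Python 3.8 or later is assumed for `math.comb`).

```python
from itertools import combinations
from math import comb

def count_models(G):
    """Number of 1-, 2-, and 3-gene models that can be formed from G genes."""
    return {1: comb(G, 1), 2: comb(G, 2), 3: comb(G, 3)}

def enumerate_models(genes, max_size=3):
    """Yield every gene combination of size 1 up to max_size."""
    for size in range(1, max_size + 1):
        yield from combinations(genes, size)
```

For example, a panel of 10 genes yields 10 one-gene, 45 two-gene, and 120 three-gene models.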


Methodological, Statistical and Computing Tools Used

The Latent GOLD program (Vermunt and Magidson, 2005) was used to estimate the logistic regression models. For efficiency in processing the models, the LG-Syntax™ Module available with version 4.5 of the program (Vermunt and Magidson, 2007) was used in batch mode, and all g-gene models associated with a particular dataset were submitted in a single run to be estimated. That is, all 1-gene models were submitted in a single run, all 2-gene models were submitted in a second run, etc.


The Data

The data consist of ΔCT values for each sample subject in each of the 2 groups (e.g., cancer subjects vs. reference subjects (e.g., healthy, normal subjects)) on each of G(k) genes obtained from a particular class k of genes. For a given disease, separate analyses were performed based on disease specific genes, including without limitation genes specific for prostate, breast, ovarian, cervical, lung, colon, and skin cancer (k=1), inflammatory genes (k=2), human cancer general genes (k=3), genes from a cross cancer gene panel (k=4), and genes in the EGR family (k=5).


Analysis Steps

The steps in a given analysis of the G(k) genes measured on N1 subjects in group 1 and N2 subjects in group 2 are as follows:

    • 1) Eliminate low expressing genes: In some instances, target gene FAM measurements were beyond the detection limit (i.e., very high ΔCT values which indicate low expression) of the particular platform instrument used to detect and quantify constituents of a Gene Expression Panel (Precision Profile™). To address the issue of “undetermined” gene expression measures as lack of expression for a particular gene, the detection limit was reset and the “undetermined” constituents were “flagged”, as previously described. CT normalization (ΔCT) and relative expression calculations that have used re-set FAM CT values were also flagged. In some instances, these low expressing genes (i.e., re-set FAM CT values) were eliminated from the analysis in step 1 if 50% or more ΔCT values from either of the 2 groups were flagged. Although such genes were eliminated from the statistical analyses described herein, one skilled in the art would recognize that such genes may be relevant in a disease state.
    • 2) Estimate logistic regression (logit) models predicting P(i)=the probability of being in group 1 for each subject i=1,2, . . . , N1+N2. Since there are only 2 groups, the probability of being in group 2 equals 1−P(i). The maximum likelihood (ML) algorithm implemented in Latent GOLD 4.0 (Vermunt and Magidson, 2005) was used to estimate the model parameters. All 1-gene models were estimated first, followed by all 2-gene models and, in cases where the sample sizes N1 and N2 were sufficiently large, all 3-gene models.
    • 3) Screen out models that fail to meet the statistical or clinical criteria: Regarding the statistical criteria, models were retained if the incremental p-values for the parameter estimates for each gene (i.e., for each predictor in the model) fell below the cutoff point alpha=0.05. Regarding the clinical criteria, models were retained if the percentage of cases within each group (e.g., disease group, and reference group (e.g., healthy, normal subjects)) that was correctly predicted to be in that group was at least 75%. For technical details, see the section “Application of the Statistical and Clinical Criteria to Screen Models”.
    • 4) Each model yielded an index that could be used to rank the sample subjects. Such an index value could also be computed for new cases not included in the sample. See the section “Computing Model-based Indices for each Subject” for details on how this index was calculated.
    • 5) A cutoff value somewhere between the lowest and highest index value was selected and based on this cutoff, subjects with indices above the cutoff were classified (predicted to be) in the disease group, those below the cutoff were classified into the reference group (i.e., normal, healthy subjects). Based on such classifications, the percent of each group that is correctly classified was determined. See the section labeled “Classifying Subjects into Groups” for details on how the cutoff was chosen.
    • 6) Among all models that survived the screening criteria (Step 3), an entropy-based R2 statistic was used to rank the models from high to low, i.e., from the models with the highest percent classification rate to those with the lowest percent classification rate. The top 5 such models were then evaluated with respect to the percent correctly classified, and the one having the highest percentages was selected as the single “best” model. A discrimination plot was provided for the best model having an 85% or greater percent classification rate. For details on how this plot was developed, see the section “Discrimination Plots” below.
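Step 1 above can be sketched as a simple filter. The sketch assumes that each gene carries, for each of the 2 groups, a list of booleans marking which ΔCT values were flagged. The data layout, function name, and gene labels are hypothetical.

```python
def drop_low_expressing(flags_by_gene, threshold=0.5):
    """Keep genes whose flagged fraction is below threshold in both groups.

    flags_by_gene: {gene: (flags_group1, flags_group2)}, where each flags list
    holds True for a flagged (re-set) value and False otherwise.
    """
    kept = []
    for gene, groups in flags_by_gene.items():
        fractions = [sum(flags) / len(flags) for flags in groups]
        # A gene is eliminated if 50% or more values are flagged in either group.
        if max(fractions) < threshold:
            kept.append(gene)
    return kept
```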


While there are several possible R2 statistics that might be used for this purpose, it was determined that the one based on entropy was most sensitive to the extent to which a model yields clear separation between the 2 groups. Such sensitivity provides a model which can be used as a tool by a practitioner (e.g., primary care physician, oncologist, etc.) to ascertain the necessity of future screening or treatment options. For more detail on this issue, see the section labeled “Using R2 Statistics to Rank Models” below.


Computing Model-Based Indices for Each Subject

The model parameter estimates were used to compute a numeric value (logit, odds or probability) for each diseased and reference subject (e.g., healthy, normal subject) in the sample. For illustrative purposes only, in an example of a 2-gene logit model for cancer containing the genes ALOX5 and S100A6, the following parameter estimates listed in Table A were obtained:













TABLE A

Cancer       alpha(1)    18.37
Normals      alpha(2)    −18.37

Predictors
ALOX5        beta(1)     −4.81
S100A6       beta(2)     2.79











For a given subject with particular ΔCT values observed for these genes, the predicted logit associated with cancer vs. reference (i.e., normals) was computed as:





LOGIT (ALOX5, S100A6)=[alpha(1)−alpha(2)]+beta(1)*ALOX5+beta(2)*S100A6.


The predicted odds of having cancer would be:





ODDS(ALOX5, S100A6)=exp [LOGIT(ALOX5, S100A6)]


and the predicted probability of belonging to the cancer group is:





P(ALOX5, S100A6)=ODDS(ALOX5, S100A6)/[1+ODDS(ALOX5, S100A6)]
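For illustrative purposes only, the three expressions above can be evaluated with the rounded Table A estimates as follows. The function names are hypothetical. Because the estimates are rounded to 2 decimal places, computed probabilities may differ slightly from those listed in Table B.

```python
import math

# Rounded parameter estimates from Table A.
ALPHA_CANCER, ALPHA_NORMAL = 18.37, -18.37
BETA_ALOX5, BETA_S100A6 = -4.81, 2.79

def logit(alox5, s100a6):
    """Predicted logit of cancer vs. reference from the two delta-CT values."""
    return (ALPHA_CANCER - ALPHA_NORMAL) + BETA_ALOX5 * alox5 + BETA_S100A6 * s100a6

def odds(alox5, s100a6):
    """Predicted odds of having cancer."""
    return math.exp(logit(alox5, s100a6))

def probability(alox5, s100a6):
    """Predicted probability of belonging to the cancer group."""
    o = odds(alox5, s100a6)
    return o / (1.0 + o)
```

For a subject with ALOX5=15.44 and S100A6=13.66, `probability` returns approximately 0.64, in line with the value of 0.6445 listed for that subject in Table B.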


Note that the ML estimates for the alpha parameters were based on the relative proportion of the group sample sizes. Prior to computing the predicted probabilities, the alpha estimates may be adjusted to take into account the relative proportion in the population to which the model will be applied (for example, without limitation, the incidence of prostate cancer in the population of adult men in the U.S., the incidence of breast cancer in the population of adult women in the U.S., etc.)
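This disclosure does not specify how the alpha estimates are adjusted. One standard approach, shown here purely as an assumption, is an intercept (prior) correction that replaces the sample log-odds of group membership with the population log-odds. The function name and arguments are hypothetical.

```python
import math

def adjust_logit_for_prevalence(logit_value, n_cases, n_controls, prevalence):
    """Shift a predicted logit from the sample case mix to a target prevalence."""
    sample_log_odds = math.log(n_cases / n_controls)
    population_log_odds = math.log(prevalence / (1.0 - prevalence))
    return logit_value - sample_log_odds + population_log_odds
```

When the target prevalence equals the proportion of cases in the sample, the logit is unchanged; a lower prevalence lowers every predicted probability.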


Classifying Subjects into Groups


The “modal classification rule” was used to predict into which group a given case belongs. This rule classifies a case into the group for which the model yields the highest predicted probability. Using the same cancer example previously described (for illustrative purposes only), use of the modal classification rule would classify any subject having P>0.5 into the cancer group, the others into the reference group (e.g., healthy, normal subjects). The percentage of all N1 cancer subjects that were correctly classified was computed as the number of such subjects having P>0.5 divided by N1. Similarly, the percentage of all N2 reference (e.g., normal healthy) subjects that were correctly classified was computed as the number of such subjects having P≦0.5 divided by N2. Alternatively, a cutoff point P0 could be used instead of the modal classification rule so that any subject i having P(i)>P0 is assigned to the cancer group, and otherwise to the reference group (e.g., normal, healthy group).
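A minimal sketch of this rule follows, assuming each subject is represented as a (probability, group) pair with group labels "Cancer" and "Normal". The function name and data layout are hypothetical.

```python
def classification_rates(subjects, cutoff=0.5):
    """Return (fraction of cancer correct, fraction of normal correct).

    subjects: list of (P, group) pairs; cutoff=0.5 is the modal rule.
    """
    cancer = [p for p, group in subjects if group == "Cancer"]
    normal = [p for p, group in subjects if group == "Normal"]
    cancer_correct = sum(p > cutoff for p in cancer) / len(cancer)
    normal_correct = sum(p <= cutoff for p in normal) / len(normal)
    return cancer_correct, normal_correct
```

Passing a different `cutoff` value implements the alternative P0 rule.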


Application of the Statistical and Clinical Criteria to Screen Models
Clinical Screening Criteria

In order to determine whether a model met the clinical 75% correct classification criteria, the following approach was used:

    • A. All sample subjects were ranked from high to low by their predicted probability P (e.g., see Table B).
    • B. Taking P0(i)=P(i) for each subject, one at a time, the percentage of group 1 and group 2 that would be correctly classified, P1(i) and P2(i), was computed.
    • C. The information in the resulting table was scanned and any models for which none of the potential cutoff probabilities met the clinical criteria (i.e., no cutoffs P0(i) exist such that both P1(i)>0.75 and P2(i)>0.75) were eliminated. Hence, models that did not meet the clinical criteria were eliminated.
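Steps A through C can be sketched as a scan over every observed probability as a candidate cutoff. The sketch uses the “at least 75%” form of the criterion from the analysis steps. The function name and the (probability, group) data layout are hypothetical.

```python
def passes_clinical_screen(subjects, minimum=0.75):
    """True if some cutoff classifies at least `minimum` of both groups correctly.

    subjects: list of (P, group) pairs with group "Cancer" or "Normal".
    """
    cancer = [p for p, group in subjects if group == "Cancer"]
    normal = [p for p, group in subjects if group == "Normal"]
    for cutoff, _ in subjects:               # each P(i) serves as a candidate P0(i)
        p1 = sum(p > cutoff for p in cancer) / len(cancer)
        p2 = sum(p <= cutoff for p in normal) / len(normal)
        if p1 >= minimum and p2 >= minimum:
            return True
    return False
```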


The example shown in Table B has many cut-offs that meet these criteria. For example, the cutoff P0=0.4 yields correct classification rates of 92% for the reference group (i.e., normal, healthy subjects), and 93% for cancer subjects. A plot based on this cutoff is shown in FIG. 1 and described in the section “Discrimination Plots”.


Statistical Screening Criteria

In order to determine whether a model met the statistical criteria, the incremental p-value for each gene g=1,2, . . . , G was computed as follows:

    • i. Let LSQ(0) denote the overall model L-squared output by Latent GOLD for an unrestricted model.
    • ii. Let LSQ(g) denote the overall model L-squared output by Latent GOLD for the restricted version of the model where the effect of gene g is restricted to 0.
    • iii. With 1 degree of freedom, use a ‘components of chi-square’ table to determine the p-value associated with the LR difference statistic LSQ(g)−LSQ(0).


      Note that this approach required estimating g restricted models as well as 1 unrestricted model.
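In place of a printed chi-square table, the p-value for a likelihood-ratio difference with 1 degree of freedom can be computed directly. The sketch below uses only the Python standard library and relies on the identity that the upper tail of a 1 df chi-square at x equals erfc(sqrt(x/2)). The function name is hypothetical.

```python
import math

def incremental_p_value(lsq_restricted, lsq_unrestricted):
    """p-value for the LR difference LSQ(g) - LSQ(0) on 1 degree of freedom."""
    lr = lsq_restricted - lsq_unrestricted
    if lr < 0:
        raise ValueError("restricted L-squared should not be below unrestricted")
    # Upper-tail probability of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(lr / 2.0))
```

A gene would be retained under the statistical criteria when the returned value falls below alpha=0.05.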


Discrimination Plots

For a 2-gene model, a discrimination plot consisted of plotting the ΔCT values for each subject in a scatterplot where the values associated with one of the genes served as the vertical axis, the other serving as the horizontal axis. Two different symbols were used for the points to denote whether the subject belongs to group 1 or 2.


A line was appended to a discrimination graph to illustrate how well the 2-gene model discriminated between the 2 groups. The slope of the line was determined by computing the ratio of the ML parameter estimate associated with the gene plotted along the horizontal axis divided by the corresponding estimate associated with the gene plotted along the vertical axis. The intercept of the line was determined as a function of the cutoff point. For the cancer example model based on the 2 genes ALOX5 and S100A6 shown in FIG. 1, the equation for the line associated with the cutoff of 0.4 is ALOX5=7.7+0.58*S100A6. This line provides correct classification rates of 93% and 92% (4 of 57 cancer subjects misclassified and only 4 of 50 reference (i.e., normal) subjects misclassified).
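For illustrative purposes only, the line for the ALOX5 and S100A6 example can be recovered from the rounded Table A estimates by setting the predicted logit equal to the logit of the cutoff and solving for the gene on the vertical axis. The function name is hypothetical; solving this way gives the slope as the negative of the ratio of the horizontal-axis estimate to the vertical-axis estimate.

```python
import math

def discrimination_line(intercept_term, beta_vertical, beta_horizontal, cutoff):
    """Return (intercept, slope) of the line vertical = intercept + slope * horizontal."""
    target = math.log(cutoff / (1.0 - cutoff))        # logit of the cutoff P0
    line_intercept = (target - intercept_term) / beta_vertical
    line_slope = -beta_horizontal / beta_vertical
    return line_intercept, line_slope

# ALOX5 on the vertical axis, S100A6 on the horizontal axis, cutoff 0.4.
line_intercept, line_slope = discrimination_line(18.37 - (-18.37), -4.81, 2.79, 0.4)
```

With these inputs the intercept is about 7.7 and the slope about 0.58, matching the equation ALOX5=7.7+0.58*S100A6 given above.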


For a 3-gene model, a 2-dimensional slice defined as a linear combination of 2 of the genes was plotted along one of the axes, the remaining gene being plotted along the other axis. The particular linear combination was determined based on the parameter estimates. For example, if a 3rd gene were added to the 2-gene model consisting of ALOX5 and S100A6 and the parameter estimates for ALOX5 and S100A6 were beta(1) and beta(2) respectively, the linear combination beta(1)*ALOX5+beta(2)*S100A6 could be used. This approach can be readily extended to the situation with 4 or more genes in the model by taking additional linear combinations. For example, with 4 genes one might use beta(1)*ALOX5+beta(2)*S100A6 along one axis and beta(3)*gene3+beta(4)*gene4 along the other, or beta(1)*ALOX5+beta(2)*S100A6+beta(3)*gene3 along one axis and gene4 along the other axis. When producing such plots with 3 or more genes, genes with parameter estimates having the same sign were chosen for combination.


Using R2 Statistics to Rank Models

The R2 in traditional OLS (ordinary least squares) linear regression of a continuous dependent variable can be interpreted in several different ways, such as 1) proportion of variance accounted for, 2) the squared correlation between the observed and predicted values, and 3) a transformation of the F-statistic. When the dependent variable is not continuous but categorical (in our models the dependent variable is dichotomous—membership in the diseased group or reference group), this standard R2 defined in terms of variance (see definition 1 above) is only one of several possible measures. The term ‘pseudo R2’ has been coined for the generalization of the standard variance-based R2 for use with categorical dependent variables, as well as other settings where the usual assumptions that justify OLS do not apply.


The general definition of the (pseudo) R2 for an estimated model is the reduction of errors compared to the errors of a baseline model. For the purpose of the present invention, the estimated model is a logistic regression model for predicting group membership based on 1 or more continuous predictors (ΔCT measurements of different genes). The baseline model is the regression model that contains no predictors; that is, a model where the regression coefficients are restricted to 0. More precisely, the pseudo R2 is defined as:






R2=[Error(baseline)−Error(model)]/Error(baseline)


Regardless of how error is defined, if prediction is perfect, Error(model)=0 which yields R2=1. Similarly, if all of the regression coefficients do in fact turn out to equal 0, the model is equivalent to the baseline, and thus R2=0. In general, this pseudo R2 falls somewhere between 0 and 1.


When Error is defined in terms of variance, the pseudo R2 becomes the standard R2. When the dependent variable is dichotomous group membership, scores of 1 and 0, −1 and +1, or any other 2 numbers for the 2 categories yield the same value for R2. For example, if the dichotomous dependent variable takes on the scores of 1 and 0, the variance is defined as P*(1−P) where P is the probability of being in 1 group and 1−P the probability of being in the other.


A common alternative in the case of a dichotomous dependent variable is to define error in terms of entropy. In this situation, entropy can be defined as −[P*ln(P)+(1−P)*ln(1−P)] (for further discussion of the variance and the entropy based R2, see Magidson, Jay, “Qualitative Variance, Entropy and Correlation Ratios for Nominal Dependent Variables,” Social Science Research 10 (June), pp. 177-194).
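A minimal sketch of an entropy-based pseudo R2 follows. It assumes that the error of a model is measured as the minus mean log-likelihood of the observed group labels under the predicted probabilities, and that the baseline model predicts the sample proportion for every subject. The function names are hypothetical, and both groups are assumed to be present so that the baseline error is non-zero.

```python
import math

def mean_neg_log_likelihood(probs, labels):
    """Minus mean log-likelihood of 0/1 labels under predicted probabilities."""
    eps = 1e-12                                   # guard against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)

def entropy_r2(probs, labels):
    """[Error(baseline) - Error(model)] / Error(baseline) with entropy-type error."""
    base_rate = sum(labels) / len(labels)
    baseline = mean_neg_log_likelihood([base_rate] * len(labels), labels)
    model = mean_neg_log_likelihood(probs, labels)
    return (baseline - model) / baseline
```

The baseline error here equals the binary entropy of the sample proportion, so the statistic is 0 for an uninformative model and approaches 1 as the predicted probabilities separate the 2 groups.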


The R2 statistic was used in the enumeration methods described herein to identify the “best” gene-model. R2 can be calculated in different ways depending upon how the error variation and total observed variation are defined. For example, four different R2 measures output by Latent GOLD are based on:

  • a) Standard variance and mean squared error (MSE)
  • b) Entropy and minus mean log-likelihood (−MLL)
  • c) Absolute variation and mean absolute error (MAE)
  • d) Prediction errors and the proportion of errors under modal assignment (PPE)


Each of these 4 measures equals 0 when the predictors provide zero discrimination between the groups, and equals 1 if the model is able to classify each subject into their actual group with 0 error. For each measure, Latent GOLD defines the total variation as the error of the baseline (intercept-only) model which restricts the effects of all predictors to 0. Then for each, R2 is defined as the proportional reduction of errors in the estimated model compared to the baseline model. For the 2-gene cancer example used to illustrate the enumeration methodology described herein, the baseline model classifies all cases as being in the diseased group since this group has a larger sample size, resulting in 50 misclassifications (all 50 normal subjects are misclassified) for a prediction error of 50/107=0.467. In contrast, there are only 10 prediction errors (=10/107=0.093) based on the 2-gene model using the modal assignment rule, thus yielding a prediction error R2 of 1−0.093/0.467=0.8. As shown in Exhibit 1, 4 normal and 6 cancer subjects would be misclassified using the modal assignment rule. Note that the modal rule utilizes P0=0.5 as the cutoff. If P0=0.4 were used instead, there would be only 8 misclassified subjects.
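The prediction error R2 arithmetic above can be reproduced with a short sketch. The function name is hypothetical. The baseline error is taken as the share of the smaller group, since the baseline model assigns every case to the larger group.

```python
def prediction_error_r2(n_group1, n_group2, n_model_errors):
    """Proportional reduction in prediction errors relative to the baseline model."""
    n = n_group1 + n_group2
    baseline_error = min(n_group1, n_group2) / n   # everyone assigned to larger group
    model_error = n_model_errors / n
    return 1.0 - model_error / baseline_error
```

With 57 cancer subjects, 50 normal subjects and 10 model errors, the function returns 0.8, as stated above.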


The sample discrimination plot shown in FIG. 1 is for a 2-gene model for cancer based on disease-specific genes. The 2 genes in the model are ALOX5 and S100A6 and only 8 subjects are misclassified (4 blue circles corresponding to normal subjects fall to the right and below the line, while 4 red Xs corresponding to misclassified cancer subjects lie above the line).


To reduce the likelihood of obtaining models that capitalize on chance variations in the observed samples the models may be limited to contain only M genes as predictors in the model. (Although a model may meet the significance criteria, it may overfit data and thus would not be expected to validate when applied to a new sample of subjects.) For example, for M=2, all models would be estimated which contain:


A. 1-gene models: G such models


B. 2-gene models: $\binom{G}{2} = G(G-1)/2$ such models


C. 3-gene models: $\binom{G}{3} = G(G-1)(G-2)/6$ such models


Computation of the Z-Statistic

The Z-Statistic associated with the test of significance between the mean ΔCT values for the cancer and normal groups for any gene g was calculated as follows:

  • i. Let LL[g] denote the log of the likelihood function that is maximized under the logistic regression model that predicts group membership (Cancer vs. Normal) as a function of the ΔCT value associated with gene g. There are 2 parameters in this model—an intercept and a slope.
  • ii. Let LL[0] denote the log of the likelihood function that is maximized under the restricted version of the model where the slope parameter reflecting the effect of gene g is restricted to 0. This model has only 1 unrestricted parameter—the intercept.
  • iii. With 2−1=1 degree of freedom (the difference in the number of unrestricted parameters in the models), one can use a ‘components of chi-square’ table to determine the p-value associated with the Log Likelihood difference statistic LLDiff=−2*(LL[0]−LL[g])=2*(LL[g]−LL[0]).
  • iv. Since the chi-squared statistic with 1 df is the square of a Z-statistic, the magnitude of the Z-statistic can be computed as the square root of the LLDiff. The sign of Z is negative if the mean ΔCT value for the cancer group on gene g is less than the corresponding mean for the normal group, and positive if it is greater.
  • v. These Z-statistics can be plotted as a bar graph. The length of the bar has a monotonic relationship with the p-value.
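Steps i through iv can be sketched as follows. The function name is hypothetical, and the log-likelihood and mean ΔCT values passed in are assumed to come from the fitted 1-gene and intercept-only models.

```python
import math

def z_statistic(ll_gene, ll_null, mean_cancer, mean_normal):
    """Signed Z-statistic from the log-likelihood difference for one gene."""
    ll_diff = 2.0 * (ll_gene - ll_null)            # chi-square statistic with 1 df
    magnitude = math.sqrt(max(ll_diff, 0.0))       # guard against tiny negative values
    # Negative when the cancer group has the lower mean delta-CT value.
    return -magnitude if mean_cancer < mean_normal else magnitude
```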









TABLE B

ΔCT Values and Model Predicted Probability of Cancer for Each Subject

ALOX5    S100A6   P        Group
13.92    16.13    1.0000   Cancer
13.90    15.77    1.0000   Cancer
13.75    15.17    1.0000   Cancer
13.62    14.51    1.0000   Cancer
15.33    17.16    1.0000   Cancer
13.86    14.61    1.0000   Cancer
14.14    15.09    1.0000   Cancer
13.49    13.60    0.9999   Cancer
15.24    16.61    0.9999   Cancer
14.03    14.45    0.9999   Cancer
14.98    16.05    0.9999   Cancer
13.95    14.25    0.9999   Cancer
14.09    14.13    0.9998   Cancer
15.01    15.69    0.9997   Cancer
14.13    14.15    0.9997   Cancer
14.37    14.43    0.9996   Cancer
14.14    13.88    0.9994   Cancer
14.33    14.17    0.9993   Cancer
14.97    15.06    0.9988   Cancer
14.59    14.30    0.9984   Cancer
14.45    13.93    0.9978   Cancer
14.40    13.77    0.9972   Cancer
14.72    14.31    0.9971   Cancer
14.81    14.38    0.9963   Cancer
14.54    13.91    0.9963   Cancer
14.88    14.48    0.9962   Cancer
14.85    14.42    0.9959   Cancer
15.40    15.30    0.9951   Cancer
15.58    15.60    0.9951   Cancer
14.82    14.28    0.9950   Cancer
14.78    14.06    0.9924   Cancer
14.68    13.88    0.9922   Cancer
14.54    13.64    0.9922   Cancer
15.86    15.91    0.9920   Cancer
15.71    15.60    0.9908   Cancer
16.24    16.36    0.9858   Cancer
16.09    15.94    0.9774   Cancer
15.26    14.41    0.9705   Cancer
14.93    13.81    0.9693   Cancer
15.44    14.67    0.9670   Cancer
15.69    15.08    0.9663   Cancer
15.40    14.54    0.9615   Cancer
15.80    15.21    0.9586   Cancer
15.98    15.43    0.9485   Cancer
15.20    14.08    0.9461   Normal
15.03    13.62    0.9196   Cancer
15.20    13.91    0.9184   Cancer
15.04    13.54    0.8972   Cancer
15.30    13.92    0.8774   Cancer
15.80    14.68    0.8404   Cancer
15.61    14.23    0.7939   Normal
15.89    14.64    0.7577   Normal
15.44    13.66    0.6445   Cancer
16.52    15.38    0.5343   Cancer
15.54    13.67    0.5255   Normal
15.28    13.11    0.4537   Cancer
15.96    14.23    0.4207   Cancer
15.96    14.20    0.3928   Normal
16.25    14.69    0.3887   Cancer
16.04    14.32    0.3874   Cancer
16.26    14.71    0.3863   Normal
15.97    14.18    0.3710   Cancer
15.93    14.06    0.3407   Normal
16.23    14.41    0.2378   Cancer
16.02    13.91    0.1743   Normal
15.99    13.78    0.1501   Normal
16.74    15.05    0.1389   Normal
16.66    14.90    0.1349   Normal
16.91    15.20    0.0994   Normal
16.47    14.31    0.0721   Normal
16.63    14.57    0.0672   Normal
16.25    13.90    0.0663   Normal
16.82    14.84    0.0596   Normal
16.75    14.73    0.0587   Normal
16.69    14.54    0.0474   Normal
17.13    15.25    0.0416   Normal
16.87    14.72    0.0329   Normal
16.35    13.76    0.0285   Normal
16.41    13.83    0.0255   Normal
16.68    14.20    0.0205   Normal
16.58    13.97    0.0169   Normal
16.66    14.09    0.0167   Normal
16.92    14.49    0.0140   Normal
16.93    14.51    0.0139   Normal
17.27    15.04    0.0123   Normal
16.45    13.60    0.0116   Normal


17.52
15.44
0.0110
Normal


17.12
14.46
0.0051
Normal


17.13
14.46
0.0048
Normal


16.78
13.86
0.0047
Normal


17.10
14.36
0.0041
Normal


16.75
13.69
0.0034
Normal


17.27
14.49
0.0027
Normal


17.07
14.08
0.0022
Normal


17.16
14.08
0.0014
Normal


17.50
14.41
0.0007
Normal


17.50
14.18
0.0004
Normal


17.45
14.02
0.0003
Normal


17.53
13.90
0.0001
Normal


18.21
15.06
0.0001
Normal


17.99
14.63
0.0001
Normal


17.73
14.05
0.0001
Normal


17.97
14.40
0.0001
Normal


17.98
14.35
0.0001
Normal


18.47
15.16
0.0001
Normal


18.28
14.59
0.0000
Normal


18.37
14.71
0.0000
Normal









Example 3
Precision Profile™ for Breast Cancer

Custom primers and probes were prepared for the targeted 99 genes shown in the Precision Profile™ for Breast Cancer (shown in Table 1), selected to be informative relative to biological state of breast cancer patients. Gene expression profiles for the 99 breast cancer specific genes were analyzed using the 49 RNA samples obtained from breast cancer subjects, and the 26 RNA samples obtained from normal female subjects, as described in Example 1.


Logistic regression models yielding the best discrimination between subjects diagnosed with breast cancer and normal subjects were generated using the enumeration and classification methodology described in Example 2. A listing of all 1, 2, and 3-gene logistic regression models capable of distinguishing between subjects diagnosed with breast cancer and normal subjects with at least 75% accuracy is shown in Table 1A, (read from left to right).


As shown in Table 1A, the 1, 2, and 3-gene models are identified in the first three columns on the left side of Table 1A, ranked by their entropy R² value (shown in column 4, ranked from high to low). The number of subjects correctly classified or misclassified by each 1, 2, or 3-gene model for each patient group (i.e., normal vs. breast cancer) is shown in columns 5-8. The percentages of normal subjects and breast cancer subjects correctly classified by the corresponding gene model are shown in columns 9 and 10. The incremental p-value for each first, second, and third gene in the 1, 2, or 3-gene model is shown in columns 11-13 (note p-values smaller than 1×10⁻¹⁷ are reported as ‘0’). The total number of RNA samples analyzed in each patient group (i.e., normals vs. breast cancer), after exclusion of missing values, is shown in columns 14 and 15. The values missing from the total sample number for normal and/or breast cancer subjects shown in columns 14 and 15 correspond to instances in which values were excluded from the logistic regression analysis due to reagent limitations and/or instances where replicates did not meet quality metrics.


For example, the “best” logistic regression model (defined as the model with the highest entropy R² value, as described in Example 2) based on the 99 genes included in the Precision Profile™ for Breast Cancer is shown in the first row of Table 1A, read left to right. The first row of Table 1A lists a 3-gene model, CTSD, EGR1, and NCOA1, capable of classifying normal subjects with 92% accuracy, and breast cancer subjects with 89.8% accuracy. A total of 25 normal and 49 breast cancer RNA samples were analyzed for this 3-gene model, after exclusion of missing values. As shown in Table 1A, this 3-gene model correctly classifies 23 of the normal subjects as being in the normal patient population, and misclassifies 2 of the normal subjects as being in the breast cancer patient population. This 3-gene model correctly classifies 44 of the breast cancer subjects as being in the breast cancer patient population, and misclassifies 5 of the breast cancer subjects as being in the normal patient population. The p-value for the first gene, CTSD, is 4.6E-07; the incremental p-value for the second gene, EGR1, is 6.8E-10; and the incremental p-value for the third gene in the 3-gene model, NCOA1, is 1.6E-05.
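The percentages quoted for this model follow directly from the classification counts. The short Python sketch below recomputes them; the function name `classification_rates` and the dictionary keys are hypothetical, and the overall percentage is a derived figure that is not itself reported in Table 1A.

```python
def classification_rates(normal_correct, normal_wrong, cancer_correct, cancer_wrong):
    """Percent of subjects correctly classified, per group and overall."""
    n_normal = normal_correct + normal_wrong
    n_cancer = cancer_correct + cancer_wrong
    return {
        "normal_pct": 100.0 * normal_correct / n_normal,
        "cancer_pct": 100.0 * cancer_correct / n_cancer,
        # Derived for convenience; not a column of Table 1A.
        "overall_pct": 100.0 * (normal_correct + cancer_correct) / (n_normal + n_cancer),
    }


# Counts reported for the CTSD, EGR1, NCOA1 model in the first row of Table 1A.
rates = classification_rates(normal_correct=23, normal_wrong=2,
                             cancer_correct=44, cancer_wrong=5)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

The same function applies to the counts reported for the best models in the other Precision Profiles.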


A discrimination plot of the 3-gene model, CTSD, EGR1, and NCOA1, is shown in FIG. 2. As shown in FIG. 2, the normal subjects are represented by circles, whereas the breast cancer subjects are represented by X's. The line appended to the discrimination graph in FIG. 2 illustrates how well the 3-gene model discriminates between the 2 groups. Values above and to the left of the line represent subjects predicted by the 3-gene model to be in the normal population. Values below and to the right of the line represent subjects predicted to be in the breast cancer population. As shown in FIG. 2, only 2 normal subjects (circles) and 4 breast cancer subjects (X's) are classified in the wrong patient population.


The following equations describe the discrimination line shown in FIG. 2:

CTSDEGR1=0.627268*CTSD+0.372732*EGR1

CTSDEGR1=6.925105+0.505701*NCOA1


The formula for computing the intercept and slope parameters for the discrimination line as a function of the parameter estimates from the logit model and the cutoff point is given in Table C below. Subjects below and to the right of this discrimination line have a predicted probability of being in the diseased group higher than the cutoff probability of 0.208.












TABLE C

Class1
Group        Intercept
Breast       53.7858                              cutoff =          0.208
Normals      −53.7858                             logit(cutoff) =   −1.337023

Predictors
CTSD         −9.6226      −15.3405     0.627268   alpha =           6.925105
EGR1         −5.7179                   0.372732   beta =            0.505701
NCOA1        7.7577
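The weights and slope in Table C appear to follow from the three logit coefficients: each weight is a gene's share of the summed CTSD and EGR1 coefficients (−15.3405), and beta is the NCOA1 coefficient divided by the magnitude of that sum. The Python sketch below reproduces those values under that reading and applies the "below the line" rule described for FIG. 2. The names `line_parameters` and `predicts_breast_cancer` are hypothetical, alpha is taken directly from Table C rather than recomputed, and the ΔCT inputs are invented for illustration.

```python
# Logit coefficients for the three predictors, as listed in Table C.
COEF = {"CTSD": -9.6226, "EGR1": -5.7179, "NCOA1": 7.7577}
ALPHA = 6.925105  # intercept of the discrimination line, copied from Table C


def line_parameters(coef):
    """Return (weight_ctsd, weight_egr1, beta) implied by the coefficients."""
    total = coef["CTSD"] + coef["EGR1"]  # -15.3405
    weight_ctsd = coef["CTSD"] / total
    weight_egr1 = coef["EGR1"] / total
    beta = -coef["NCOA1"] / total
    return weight_ctsd, weight_egr1, beta


def predicts_breast_cancer(ctsd, egr1, ncoa1):
    """True when the subject falls below the discrimination line."""
    weight_ctsd, weight_egr1, beta = line_parameters(COEF)
    combined = weight_ctsd * ctsd + weight_egr1 * egr1  # the CTSDEGR1 axis
    return combined < ALPHA + beta * ncoa1


print(line_parameters(COEF))
# Hypothetical delta-CT values, not taken from Table 1C.
print(predicts_breast_cancer(ctsd=12.0, egr1=18.0, ncoa1=16.0))
```

Combining CTSD and EGR1 into a single weighted axis is what allows a 3-gene model to be drawn as a line in a two-dimensional plot.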









A ranking of the top 83 breast cancer specific genes for which gene expression profiles were obtained, from most to least significant, is shown in Table 1B. Table 1B summarizes the results of significance tests (Z-statistic and p-values) for the difference in the mean expression levels for normal subjects and subjects suffering from breast cancer. A negative Z-statistic means that the ΔCT for the breast cancer subjects is less than that of the normals (e.g., see EGR1), i.e., genes having a negative Z-statistic are up-regulated in breast cancer subjects as compared to normal subjects. A positive Z-statistic means that the ΔCT for the breast cancer subjects is higher than that of the normals, i.e., genes with a positive Z-statistic are down-regulated in breast cancer subjects as compared to normal subjects. FIG. 3 shows a graphical representation of the Z-statistic for each of the 83 genes shown in Table 1B, indicating which genes are up-regulated and down-regulated in breast cancer subjects as compared to normal subjects.


The expression values (ΔCT) for the 3-gene model, CTSD, EGR1, and NCOA1, for each of the 49 breast cancer samples and 25 normal subject samples used in the analysis, and their predicted probability of having breast cancer, is shown in Table 1C. As shown in Table 1C, the predicted probability of a subject having breast cancer, based on the 3-gene model CTSD, EGR1, and NCOA1, is based on a scale of 0 to 1, “0” indicating no breast cancer (i.e., normal healthy subject), “1” indicating the subject has breast cancer. A graphical representation of the predicted probabilities of a subject having breast cancer (i.e., a breast cancer index), based on this three-gene model, is shown in FIG. 4. Such an index can be used as a tool by a practitioner (e.g., primary care physician, oncologist, etc.) for diagnosis of breast cancer and to ascertain the necessity of future screening or treatment options.


Example 4
Precision Profile™ for Inflammatory Response

Custom primers and probes were prepared for the targeted 72 genes shown in the Precision Profile™ for Inflammatory Response (shown in Table 2), selected to be informative relative to biological state of inflammation and cancer. Gene expression profiles for the 72 inflammatory response genes were analyzed using the 49 RNA samples obtained from breast cancer subjects, and the 26 RNA samples obtained from normal female subjects, as described in Example 1.


Logistic regression models yielding the best discrimination between subjects diagnosed with breast cancer and normal subjects were generated using the enumeration and classification methodology described in Example 2. A listing of all 1 and 2-gene logistic regression models capable of distinguishing between subjects diagnosed with breast cancer and normal subjects with at least 75% accuracy is shown in Table 2A, (read from left to right).


As shown in Table 2A, the 1 and 2-gene models are identified in the first two columns on the left side of Table 2A, ranked by their entropy R² value (shown in column 3, ranked from high to low). The number of subjects correctly classified or misclassified by each 1 or 2-gene model for each patient group (i.e., normal vs. breast cancer) is shown in columns 4-7. The percentages of normal subjects and breast cancer subjects correctly classified by the corresponding gene model are shown in columns 8 and 9. The incremental p-value for each first and second gene in the 1 or 2-gene model is shown in columns 10-11 (note p-values smaller than 1×10⁻¹⁷ are reported as ‘0’). The total number of RNA samples analyzed in each patient group (i.e., normals vs. breast cancer), after exclusion of missing values, is shown in columns 12-13. The values missing from the total sample number for normal and/or breast cancer subjects shown in columns 12-13 correspond to instances in which values were excluded from the logistic regression analysis due to reagent limitations and/or instances where replicates did not meet quality metrics.


For example, the “best” logistic regression model (defined as the model with the highest entropy R² value, as described in Example 2) based on the 72 genes included in the Precision Profile™ for Inflammatory Response is shown in the first row of Table 2A, read left to right. The first row of Table 2A lists a 2-gene model, CCR5 and EGR1, capable of classifying normal subjects with 80.8% accuracy, and breast cancer subjects with 81.6% accuracy. All 26 normal and 49 breast cancer RNA samples were analyzed for this 2-gene model; no values were excluded. As shown in Table 2A, this 2-gene model correctly classifies 21 of the normal subjects as being in the normal patient population, and misclassifies 5 of the normal subjects as being in the breast cancer patient population. This 2-gene model correctly classifies 40 of the breast cancer subjects as being in the breast cancer patient population, and misclassifies 9 of the breast cancer subjects as being in the normal patient population. The p-value for the first gene, CCR5, is 0.0059, and the incremental p-value for the second gene, EGR1, is 1.1E-08.


A discrimination plot of the 2-gene model, CCR5 and EGR1, is shown in FIG. 5. As shown in FIG. 5, the normal subjects are represented by circles, whereas the breast cancer subjects are represented by X's. The line appended to the discrimination graph in FIG. 5 illustrates how well the 2-gene model discriminates between the 2 groups. Values to the right of the line represent subjects predicted by the 2-gene model to be in the normal population. Values to the left of the line represent subjects predicted to be in the breast cancer population. As shown in FIG. 5, 5 normal subjects (circles) and 7 breast cancer subjects (X's) are classified in the wrong patient population.


The following equation describes the discrimination line shown in FIG. 5:





CCR5=54.5151−2.00143*EGR1


The intercept (alpha) and slope (beta) of the discrimination line were computed as follows. A cutoff of 0.64635 (equal to 0.603033 in logit units) was used to compute alpha.


Subjects to the left of this discrimination line have a predicted probability of being in the diseased group higher than the cutoff probability of 0.64635.


The intercept C0=54.5151 was computed by taking the difference between the intercepts for the 2 groups [44.1153−(−44.1153)=88.2306] and subtracting the log-odds of the cutoff probability (0.603033). This quantity was then multiplied by −1/X where X is the coefficient for CCR5 (−1.6074).
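The intercept arithmetic described above can be checked with a small Python function. This is a sketch of the stated formula only; the name `discrimination_intercept` and its parameter names are hypothetical, and the numeric inputs are the values quoted in the text for the CCR5 and EGR1 model.

```python
def discrimination_intercept(intercept_diseased, intercept_normal,
                             logit_cutoff, coefficient):
    """Intercept C0 of the discrimination line, following the stated recipe."""
    # Difference between the intercepts for the 2 groups.
    difference = intercept_diseased - intercept_normal
    # Subtract the log-odds of the cutoff, then multiply by -1/X.
    return (difference - logit_cutoff) * (-1.0 / coefficient)


# Values quoted for the CCR5, EGR1 model.
c0 = discrimination_intercept(intercept_diseased=44.1153,
                              intercept_normal=-44.1153,
                              logit_cutoff=0.603033,
                              coefficient=-1.6074)
print(f"C0 = {c0:.4f}")
```

The result agrees with the reported intercept of 54.5151 to within rounding of the published inputs.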


A ranking of the top 68 inflammatory response genes for which gene expression profiles were obtained, from most to least significant, is shown in Table 2B. Table 2B summarizes the results of significance tests (p-values) for the difference in the mean expression levels for normal subjects and subjects suffering from breast cancer.


The expression values (ΔCT) for the 2-gene model, CCR5 and EGR1, for each of the 49 breast cancer subjects and 26 normal subject samples used in the analysis, and their predicted probability of having breast cancer is shown in Table 2C. In Table 2C, the predicted probability of a subject having breast cancer, based on the 2-gene model CCR5 and EGR1, is based on a scale of 0 to 1, “0” indicating no breast cancer (i.e., normal healthy subject), “1” indicating the subject has breast cancer. This predicted probability can be used to create a breast cancer index based on the 2-gene model CCR5 and EGR1, that can be used as a tool by a practitioner (e.g., primary care physician, oncologist, etc.) for diagnosis of breast cancer and to ascertain the necessity of future screening or treatment options.


Example 5
Human Cancer General Precision Profile™

Custom primers and probes were prepared for the targeted 91 genes shown in the Human Cancer Precision Profile™ (shown in Table 3), selected to be informative relative to the biological condition of human cancer, including but not limited to ovarian, breast, cervical, prostate, lung, colon, and skin cancer. Gene expression profiles for these 91 genes were analyzed using the 49 RNA samples obtained from breast cancer subjects, and 22 of the RNA samples obtained from the normal female subjects, as described in Example 1.


Logistic regression models yielding the best discrimination between subjects diagnosed with breast cancer and normal subjects were generated using the enumeration and classification methodology described in Example 2. A listing of all 1 and 2-gene logistic regression models capable of distinguishing between subjects diagnosed with breast cancer and normal subjects with at least 75% accuracy is shown in Table 3A, (read from left to right).


As shown in Table 3A, the 1 and 2-gene models are identified in the first two columns on the left side of Table 3A, ranked by their entropy R² value (shown in column 3, ranked from high to low). The number of subjects correctly classified or misclassified by each 1 or 2-gene model for each patient group (i.e., normal vs. breast cancer) is shown in columns 4-7. The percentages of normal subjects and breast cancer subjects correctly classified by the corresponding gene model are shown in columns 8 and 9. The incremental p-value for each first and second gene in the 1 or 2-gene model is shown in columns 10-11 (note p-values smaller than 1×10⁻¹⁷ are reported as ‘0’). The total number of RNA samples analyzed in each patient group (i.e., normals vs. breast cancer), after exclusion of missing values, is shown in columns 12 and 13. The values missing from the total sample number for normal and/or breast cancer subjects shown in columns 12-13 correspond to instances in which values were excluded from the logistic regression analysis due to reagent limitations and/or instances where replicates did not meet quality metrics.


For example, the “best” logistic regression model (defined as the model with the highest entropy R² value, as described in Example 2) based on the 91 genes included in the Human Cancer General Precision Profile™ is shown in the first row of Table 3A, read left to right. The first row of Table 3A lists a 2-gene model, EGR1 and NME1, capable of classifying normal subjects with 90.9% accuracy, and breast cancer subjects with 89.8% accuracy. All 22 normal and 49 breast cancer RNA samples were analyzed for this 2-gene model; no values were excluded. As shown in Table 3A, this 2-gene model correctly classifies 20 of the normal subjects as being in the normal patient population, and misclassifies 2 of the normal subjects as being in the breast cancer patient population. This 2-gene model correctly classifies 44 of the breast cancer subjects as being in the breast cancer patient population, and misclassifies 5 of the breast cancer subjects as being in the normal patient population. The p-value for the first gene, EGR1, is 4.0E-14, and the incremental p-value for the second gene, NME1, is 0.0003.


A discrimination plot of the 2-gene model, EGR1 and NME1, is shown in FIG. 6. As shown in FIG. 6, the normal subjects are represented by circles, whereas the breast cancer subjects are represented by X's. The line appended to the discrimination graph in FIG. 6 illustrates how well the 2-gene model discriminates between the 2 groups. Values above the line represent subjects predicted by the 2-gene model to be in the normal population. Values below the line represent subjects predicted to be in the breast cancer population. As shown in FIG. 6, only 2 normal subjects (circles) and 5 breast cancer subjects (X's) are classified in the wrong patient population.


The following equation describes the discrimination line shown in FIG. 6:





EGR1=27.49988−0.40672*NME1


The intercept (alpha) and slope (beta) of the discrimination line were computed as follows. A cutoff of 0.67155 (equal to 0.715204 in logit units) was used to compute alpha.


Subjects below this discrimination line have a predicted probability of being in the diseased group higher than the cutoff probability of 0.67155.


The intercept C0=27.49988 was computed by taking the difference between the intercepts for the 2 groups [105.425−(−105.425)=210.85] and subtracting the log-odds of the cutoff probability (0.715204). This quantity was then multiplied by −1/X where X is the coefficient for EGR1 (−7.6413).
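The link between the discrimination line and the predicted probability can be illustrated in Python. The sketch assumes the logit has the form D + a*EGR1 + b*NME1, where D is the intercept difference (210.85) and a is the EGR1 coefficient (−7.6413). The NME1 coefficient b is not quoted in the text, so it is inferred here from the reported slope; that value, like the function names and the ΔCT inputs, is an assumption made for illustration.

```python
import math

INTERCEPT_DIFF = 105.425 - (-105.425)  # 210.85, from the text
COEF_EGR1 = -7.6413                    # coefficient for EGR1, from the text
C0, SLOPE = 27.49988, -0.40672         # discrimination line: EGR1 = C0 + SLOPE*NME1
CUTOFF = 0.67155

# Assumption: NME1 coefficient implied by SLOPE = -b/a (not quoted in the text).
COEF_NME1 = -SLOPE * COEF_EGR1


def sigmoid(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predicted_probability(egr1, nme1):
    """Predicted probability of breast cancer under the assumed logit form."""
    return sigmoid(INTERCEPT_DIFF + COEF_EGR1 * egr1 + COEF_NME1 * nme1)


def line_egr1(nme1):
    """EGR1 value on the discrimination line for a given NME1 value."""
    return C0 + SLOPE * nme1


nme1 = 20.0  # hypothetical delta-CT value
print(predicted_probability(line_egr1(nme1), nme1))        # close to the cutoff
print(predicted_probability(line_egr1(nme1) - 1.0, nme1))  # below the line
```

Under these assumptions a point on the line has a predicted probability equal to the cutoff (to within rounding of the published values), and a point below the line has a higher one, consistent with the rule stated for FIG. 6.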


A ranking of the top 80 genes for which gene expression profiles were obtained, from most to least significant is shown in Table 3B. Table 3B summarizes the results of significance tests (p-values) for the difference in the mean expression levels for normal subjects and subjects suffering from breast cancer.


The expression values (ΔCT) for the 2-gene model, EGR1 and NME1, for each of the 49 breast cancer subjects and 22 normal subject samples used in the analysis, and their predicted probability of having breast cancer is shown in Table 3C. In Table 3C, the predicted probability of a subject having breast cancer, based on the 2-gene model EGR1 and NME1 is based on a scale of 0 to 1, “0” indicating no breast cancer (i.e., normal healthy subject), “1” indicating the subject has breast cancer. This predicted probability can be used to create a breast cancer index based on the 2-gene model EGR1 and NME1, that can be used as a tool by a practitioner (e.g., primary care physician, oncologist, etc.) for diagnosis of breast cancer and to ascertain the necessity of future screening or treatment options.


Example 6
EGR1 Precision Profile™

Custom primers and probes were prepared for the targeted 39 genes shown in the Precision Profile™ for EGR1 (shown in Table 4), selected to be informative of the biological role early growth response genes play in human cancer (including but not limited to ovarian, breast, cervical, prostate, lung, colon, and skin cancer). Gene expression profiles for these 39 genes were analyzed using 48 of the RNA samples obtained from breast cancer subjects, and 22 of the RNA samples obtained from normal female subjects, as described in Example 1.


Logistic regression models yielding the best discrimination between subjects diagnosed with breast cancer and normal subjects were generated using the enumeration and classification methodology described in Example 2. A listing of all 2-gene logistic regression models capable of distinguishing between subjects diagnosed with breast cancer and normal subjects with at least 75% accuracy is shown in Table 4A, (read from left to right).


As shown in Table 4A, the 2-gene models are identified in the first two columns on the left side of Table 4A, ranked by their entropy R² value (shown in column 3, ranked from high to low). The number of subjects correctly classified or misclassified by each 2-gene model for each patient group (i.e., normal vs. breast cancer) is shown in columns 4-7. The percentages of normal subjects and breast cancer subjects correctly classified by the corresponding gene model are shown in columns 8 and 9. The incremental p-value for each first and second gene in the 2-gene model is shown in columns 10-11 (note p-values smaller than 1×10⁻¹⁷ are reported as ‘0’). The total number of RNA samples analyzed in each patient group (i.e., normals vs. breast cancer), after exclusion of missing values, is shown in columns 12 and 13. The values missing from the total sample number for normal and/or breast cancer subjects shown in columns 12-13 correspond to instances in which values were excluded from the logistic regression analysis due to reagent limitations and/or instances where replicates did not meet quality metrics.


For example, the “best” logistic regression model (defined as the model with the highest entropy R² value, as described in Example 2) based on the 39 genes included in the Precision Profile™ for EGR1 is shown in the first row of Table 4A, read left to right. The first row of Table 4A lists a 2-gene model, NR4A2 and TGFB1, capable of classifying normal subjects with 81.8% accuracy, and breast cancer subjects with 85.4% accuracy. All 22 normal and 48 breast cancer RNA samples were analyzed for this 2-gene model; no values were excluded. As shown in Table 4A, this 2-gene model correctly classifies 18 of the normal subjects as being in the normal patient population, and misclassifies 4 of the normal subjects as being in the breast cancer patient population. This 2-gene model correctly classifies 41 of the breast cancer subjects as being in the breast cancer patient population, and misclassifies 7 of the breast cancer subjects as being in the normal patient population. The p-value for the first gene, NR4A2, is 4.7E-05, and the incremental p-value for the second gene, TGFB1, is 1.9E-09.


A ranking of the top 32 genes for which gene expression profiles were obtained, from most to least significant is shown in Table 4B. Table 4B summarizes the results of significance tests (p-values) for the difference in the mean expression levels for normal subjects and subjects suffering from breast cancer.


Example 7
Cross-Cancer Precision Profile™

Custom primers and probes were prepared for the targeted 110 genes shown in the Cross Cancer Precision Profile™ (shown in Table 5), selected to be informative relative to the biological condition of human cancer, including but not limited to ovarian, breast, cervical, prostate, lung, colon, and skin cancer. Gene expression profiles for these 110 genes were analyzed using 48 of the RNA samples obtained from breast cancer subjects, and 22 of the RNA samples obtained from normal female subjects, as described in Example 1.


Logistic regression models yielding the best discrimination between subjects diagnosed with breast cancer and normal subjects were generated using the enumeration and classification methodology described in Example 2. A listing of all 1 and 2-gene logistic regression models capable of distinguishing between subjects diagnosed with breast cancer and normal subjects with at least 75% accuracy is shown in Table 5A, (read from left to right).


As shown in Table 5A, the 1 and 2-gene models are identified in the first two columns on the left side of Table 5A, ranked by their entropy R² value (shown in column 3, ranked from high to low). The number of subjects correctly classified or misclassified by each 1 or 2-gene model for each patient group (i.e., normal vs. breast cancer) is shown in columns 4-7. The percentages of normal subjects and breast cancer subjects correctly classified by the corresponding gene model are shown in columns 8 and 9. The incremental p-value for each first and second gene in the 1 or 2-gene model is shown in columns 10-11 (note p-values smaller than 1×10⁻¹⁷ are reported as ‘0’). The total number of RNA samples analyzed in each patient group (i.e., normals vs. breast cancer), after exclusion of missing values, is shown in columns 12 and 13. The values missing from the total sample number for normal and/or breast cancer subjects shown in columns 12-13 correspond to instances in which values were excluded from the logistic regression analysis due to reagent limitations and/or instances where replicates did not meet quality metrics.


For example, the “best” logistic regression model (defined as the model with the highest entropy R² value, as described in Example 2) based on the 110 genes in the Cross-Cancer Precision Profile™ is shown in the first row of Table 5A, read left to right. The first row of Table 5A lists a 2-gene model, EGR1 and PLEK2, capable of classifying normal subjects with 100% accuracy, and breast cancer subjects with 95.8% accuracy. Twenty of the 22 normal RNA samples and all 48 breast cancer RNA samples were used to analyze this 2-gene model after exclusion of missing values. As shown in Table 5A, this 2-gene model correctly classifies all 20 of the normal subjects as being in the normal patient population. This 2-gene model correctly classifies 46 of the breast cancer subjects as being in the breast cancer patient population, and misclassifies only 2 of the breast cancer subjects as being in the normal patient population. The p-value for the first gene, EGR1, is 1.9E-15, and the incremental p-value for the second gene, PLEK2, is 4.1E-07.


A discrimination plot of the 2-gene model, EGR1 and PLEK2, is shown in FIG. 7. As shown in FIG. 7, the normal subjects are represented by circles, whereas the breast cancer subjects are represented by X's. The line appended to the discrimination graph in FIG. 7 illustrates how well the 2-gene model discriminates between the 2 groups. Values above the line represent subjects predicted by the 2-gene model to be in the normal population. Values below the line represent subjects predicted to be in the breast cancer population. As shown in FIG. 7, no normal subjects (circles) and only 2 breast cancer subjects (X's) are classified in the wrong patient population.


The following equation describes the discrimination line shown in FIG. 7:





EGR1=13.09928+0.357257*PLEK2


The intercept (alpha) and slope (beta) of the discrimination line were computed as follows. A cutoff of 0.8257 (equal to 1.555454 in logit units) was used to compute alpha.


Subjects below this discrimination line have a predicted probability of being in the diseased group higher than the cutoff probability of 0.8257.


The intercept C0=13.09928 was computed by taking the difference between the intercepts for the 2 groups [87.3083−(−87.3083)=174.6166] and subtracting the log-odds of the cutoff probability (1.555454). This quantity was then multiplied by −1/X where X is the coefficient for EGR1 (−13.2115).
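The "logit units" quoted for each cutoff are the log-odds of the cutoff probability, ln(p/(1−p)). The following Python sketch checks the three cutoffs quoted in Examples 4, 5, and 7 against their stated logit values; the function names `logit` and `inverse_logit` are illustrative choices rather than names used in the source.

```python
import math


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1.0 / (1.0 + math.exp(-x))


# (cutoff probability, value quoted in logit units) from Examples 4, 5, and 7.
QUOTED = [(0.64635, 0.603033), (0.67155, 0.715204), (0.8257, 1.555454)]

for cutoff, quoted in QUOTED:
    print(f"cutoff {cutoff}: logit = {logit(cutoff):.6f} (quoted {quoted})")
```

Working in logit units keeps the cutoff on the same scale as the intercepts and coefficients of the logistic regression model, which is why it can be subtracted directly from the intercept difference.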


A ranking of the top 107 genes for which gene expression profiles were obtained, from most to least significant is shown in Table 5B. Table 5B summarizes the results of significance tests (p-values) for the difference in the mean expression levels for normal subjects and subjects suffering from breast cancer.


The expression values (ΔCT) for the 2-gene model, EGR1 and PLEK2, for each of the 48 breast cancer subjects and 20 normal subject samples used in the analysis, and their predicted probability of having breast cancer is shown in Table 5C. In Table 5C, the predicted probability of a subject having breast cancer, based on the 2-gene model EGR1 and PLEK2 is based on a scale of 0 to 1, “0” indicating no breast cancer (i.e., normal healthy subject), “1” indicating the subject has breast cancer. This predicted probability can be used to create a breast cancer index based on the 2-gene model EGR1 and PLEK2, that can be used as a tool by a practitioner (e.g., primary care physician, oncologist, etc.) for diagnosis of breast cancer and to ascertain the necessity of future screening or treatment options.


These data support that Gene Expression Profiles with sufficient precision and calibration as described herein (1) can determine subsets of individuals with a known biological condition, particularly individuals with breast cancer or individuals with conditions related to breast cancer; (2) may be used to monitor the response of patients to therapy; (3) may be used to assess the efficacy and safety of therapy; and (4) may be used to guide the medical management of a patient by adjusting therapy to bring one or more relevant Gene Expression Profiles closer to a target set of values, which may be normative values or other desired or achievable values.


Gene Expression Profiles are used for characterization and monitoring of treatment efficacy of individuals with breast cancer, or individuals with conditions related to breast cancer. Use of the algorithmic and statistical approaches discussed above to achieve such identification and to discriminate in such fashion is within the scope of various embodiments herein. The references listed below are hereby incorporated herein by reference.


REFERENCES



  • Magidson, J. (1998). GOLDMineR User's Guide. Belmont, Mass.: Statistical Innovations Inc.

  • Vermunt and Magidson (2005). Latent GOLD 4.0 Technical Guide. Belmont, Mass.: Statistical Innovations.

  • Vermunt and Magidson (2007). LG-Syntax™ User's Guide: Manual for Latent GOLD® 4.5 Syntax Module. Belmont, Mass.: Statistical Innovations.

  • Vermunt, J. K. and Magidson, J. (2002). Latent Class Cluster Analysis, in J. A. Hagenaars and A. L. McCutcheon (eds.), Applied Latent Class Analysis, 89-106. Cambridge: Cambridge University Press.

  • Magidson, J. (1996). “Maximum Likelihood Assessment of Clinical Trials Based on an Ordered Categorical Response.” Drug Information Journal, Maple Glen, Pa.: Drug Information Association, Vol. 30, No. 1, pp. 143-170.










TABLE 1

Precision Profile™ for Breast Cancer

Gene Symbol | Gene Name | Gene Accession Number
ABCB1 | ATP-binding cassette, sub-family B (MDR/TAP), member 1 | NM_000927
ATBF1 | AT-binding transcription factor 1 | NM_006885
ATM | ataxia telangiectasia mutated (includes complementation groups A, C and D) | NM_138293
BAX | BCL2-associated X protein | NM_138761
BCL2 | B-cell CLL/lymphoma 2 | NM_000633
BRCA1 | breast cancer 1, early onset | NM_007294
BRCA2 | breast cancer 2, early onset | NM_000059
C3 | complement component 3 | NM_000064
CASP8 | caspase 8, apoptosis-related cysteine peptidase | NM_001228
CASP9 | caspase 9, apoptosis-related cysteine peptidase | NM_001229
CCND1 | cyclin D1 (PRAD1: parathyroid adenomatosis 1) | NM_053056
CCNE1 | Cyclin E1 | NM_001238
CDH1 | cadherin 1, type 1, E-cadherin (epithelial) | NM_004360
CDK4 | cyclin-dependent kinase 4 | NM_000075
CDKN1A | cyclin-dependent kinase inhibitor 1A (p21, Cip1) | NM_000389
CDKN1B | cyclin-dependent kinase inhibitor 1B (p27) | NM_004064
CRABP2 | cellular retinoic acid binding protein 2 | NM_001878
CTNNB1 | catenin (cadherin-associated protein), beta 1, 88 kDa | NM_001904
CTSB | cathepsin B | NM_001908
CTSD | cathepsin D (lysosomal aspartyl peptidase) | NM_001909
CXCL2 | Chemokine (C-X-C Motif) Ligand 2 | NM_002089
DLC1 | deleted in liver cancer 1 | NM_182643
EGFR | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | NM_005228
EGR1 | Early growth response-1 | NM_001964
EIF4E | eukaryotic translation initiation factor 4E | NM_001968
ERBB2 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | NM_004448
ESR1 | estrogen receptor 1 | NM_000125
ESR2 | estrogen receptor 2 (ER beta) | NM_001437
FGF8 | fibroblast growth factor 8 (androgen-induced) | NM_033163
FLT1 | Fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor) | NM_002019
FOS | v-fos FBJ murine osteosarcoma viral oncogene homolog | NM_005252
GADD45A | growth arrest and DNA-damage-inducible, alpha | NM_001924
GATA3 | GATA binding protein 3 | NM_001002295
GNB2L1 | guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1 | NM_006098
GRB7 | growth factor receptor-bound protein 7 | NM_005310
HPGD | hydroxyprostaglandin dehydrogenase 15-(NAD) | NM_000860
ICAM1 | Intercellular adhesion molecule 1 | NM_000201
IFITM3 | interferon induced transmembrane protein 3 (1-8U) | NM_021034
IGF2 | Putative insulin-like growth factor II associated protein | NM_000612
IGFBP5 | insulin-like growth factor binding protein 5 | NM_000599
IL8 | interleukin 8 | NM_000584
ILF2 | interleukin enhancer binding factor 2, 45 kDa | NM_004515
ING1 | inhibitor of growth family, member 1 | NM_198219
ITGA6 | integrin, alpha 6 | NM_000210
ITGB3 | integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) | NM_000212
JUN | v-jun sarcoma virus 17 oncogene homolog (avian) | NM_002228
KISS1 | KiSS-1 metastasis-suppressor | NM_002256
KRT19 | keratin 19 | NM_002276
LAMB2 | laminin, beta 2 (laminin S) | NM_002292
MCM7 | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae) | NM_005916
MDM2 | Mdm2, transformed 3T3 cell double minute 2, p53 binding protein (mouse) | NM_002392
MET | met proto-oncogene (hepatocyte growth factor receptor) | NM_000245
MGMT | O-6-methylguanine-DNA methyltransferase | NM_002412
MKI67 | antigen identified by monoclonal antibody Ki-67 | NM_002417
MMP2 | matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase) | NM_004530
MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | NM_004994
MTA1 | metastasis associated 1 | NM_004689
MUC1 | mucin 1, cell surface associated | NM_002456
MYBL2 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 | NM_002466
MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | NM_002467
MYCBP | c-myc binding protein | NM_012333
NCOA1 | nuclear receptor coactivator 1 | NM_003743
NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) | NM_003998
NME1 | non-metastatic cells 1, protein (NM23A) expressed in | NM_198175
NTRK3 | neurotrophic tyrosine kinase, receptor, type 3 | NM_001012338
PCNA | proliferating cell nuclear antigen | NM_002592
PGR | progesterone receptor | NM_000926
PI3 | Proteinase Inhibitor 3 (Skin Derived) | NM_002638
PITRM1 | pitrilysin metallopeptidase 1 | NM_014889
PLAU | plasminogen activator, urokinase | NM_002658
PPARG | peroxisome proliferative activated receptor, gamma | NM_138712
PSMB5 | proteasome (prosome, macropain) subunit, beta type, 5 | NM_002797
PSMD1 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 1 | NM_002807
PTGS2 | prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) | NM_000963
RB1 | retinoblastoma 1 (including osteosarcoma) | NM_000321
RBL2 | retinoblastoma-like 2 (p130) | NM_005611
RP5-1077B9.4 | invasion inhibitory protein 45 | NM_001025374
RPL13A | ribosomal protein L13a | NM_012423
RPS3 | ribosomal protein S3 | NM_001005
SCGB2A1 | secretoglobin, family 2A, member 1 | NM_002407
SLPI | secretory leukocyte peptidase inhibitor | NM_003064
TFF1 | trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in) | NM_003225
TGFB1 | transforming growth factor, beta 1 (Camurati-Engelmann disease) | NM_000660
TGFBR1 | transforming growth factor, beta receptor I (activin A receptor type II-like kinase, 53 kDa) | NM_004612
THBS1 | thrombospondin 1 | NM_003246
THBS2 | thrombospondin 2 | NM_003247
TIE1 | tyrosine kinase with immunoglobulin-like and EGF-like domains 1 | NM_005424
TIMP1 | tissue inhibitor of metalloproteinase 1 | NM_003254
TNF | tumor necrosis factor (TNF superfamily, member 2) | NM_000594
TOP2A | topoisomerase (DNA) II alpha 170 kDa | NM_001067
TP53 | tumor protein p53 (Li-Fraumeni syndrome) | NM_000546
TSC22D3 | TSC22 domain family, member 3 | NM_198057
TSP50 | testes-specific protease 50 | NM_013270
UBE3A | ubiquitin protein ligase E3A (human papilloma virus E6-associated protein, Angelman syndrome) | NM_000462
USP10 | ubiquitin specific peptidase 10 | NM_005153
USP9X | ubiquitin specific peptidase 9, X-linked | NM_001039590
VEGF | vascular endothelial growth factor | NM_003376
VEZF1 | vascular endothelial zinc finger 1 | NM_007146
VIM | vimentin | NM_003380
















TABLE 2

Precision Profile™ for Inflammatory Response

Gene Symbol | Gene Name | Gene Accession Number
ADAM17 | a disintegrin and metalloproteinase domain 17 (tumor necrosis factor, alpha, converting enzyme) | NM_003183
ALOX5 | arachidonate 5-lipoxygenase | NM_000698
APAF1 | apoptotic Protease Activating Factor 1 | NM_013229
C1QA | complement component 1, q subcomponent, alpha polypeptide | NM_015991
CASP1 | caspase 1, apoptosis-related cysteine peptidase (interleukin 1, beta, convertase) | NM_033292
CASP3 | caspase 3, apoptosis-related cysteine peptidase | NM_004346
CCL3 | chemokine (C-C motif) ligand 3 | NM_002983
CCL5 | chemokine (C-C motif) ligand 5 | NM_002985
CCR3 | chemokine (C-C motif) receptor 3 | NM_001837
CCR5 | chemokine (C-C motif) receptor 5 | NM_000579
CD19 | CD19 Antigen | NM_001770
CD4 | CD4 antigen (p55) | NM_000616
CD86 | CD86 antigen (CD28 antigen ligand 2, B7-2 antigen) | NM_006889
CD8A | CD8 antigen, alpha polypeptide | NM_001768
CSF2 | colony stimulating factor 2 (granulocyte-macrophage) | NM_000758
CTLA4 | cytotoxic T-lymphocyte-associated protein 4 | NM_005214
CXCL1 | chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) | NM_001511
CXCL10 | chemokine (C-X-C motif) ligand 10 | NM_001565
CXCR3 | chemokine (C-X-C motif) receptor 3 | NM_001504
DPP4 | Dipeptidylpeptidase 4 | NM_001935
EGR1 | early growth response-1 | NM_001964
ELA2 | elastase 2, neutrophil | NM_001972
GZMB | granzyme B (granzyme 2, cytotoxic T-lymphocyte-associated serine esterase 1) | NM_004131
HLA-DRA | major histocompatibility complex, class II, DR alpha | NM_019111
HMGB1 | high-mobility group box 1 | NM_002128
HMOX1 | heme oxygenase (decycling) 1 | NM_002133
HSPA1A | heat shock protein 70 | NM_005345
ICAM1 | Intercellular adhesion molecule 1 | NM_000201
IFI16 | interferon inducible protein 16, gamma | NM_005531
IFNG | interferon gamma | NM_000619
IL10 | interleukin 10 | NM_000572
IL12B | interleukin 12 p40 | NM_002187
IL15 | Interleukin 15 | NM_000585
IL18 | interleukin 18 | NM_001562
IL18BP | IL-18 Binding Protein | NM_005699
IL1B | interleukin 1, beta | NM_000576
IL1R1 | interleukin 1 receptor, type I | NM_000877
IL1RN | interleukin 1 receptor antagonist | NM_173843
IL23A | interleukin 23, alpha subunit p19 | NM_016584
IL32 | interleukin 32 | NM_001012631
IL5 | interleukin 5 (colony-stimulating factor, eosinophil) | NM_000879
IL6 | interleukin 6 (interferon, beta 2) | NM_000600
IL8 | interleukin 8 | NM_000584
IRF1 | interferon regulatory factor 1 | NM_002198
LTA | lymphotoxin alpha (TNF superfamily, member 1) | NM_000595
MAPK14 | mitogen-activated protein kinase 14 | NM_001315
MHC2TA | class II, major histocompatibility complex, transactivator | NM_000246
MIF | macrophage migration inhibitory factor (glycosylation-inhibiting factor) | NM_002415
MMP12 | matrix metallopeptidase 12 (macrophage elastase) | NM_002426
MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | NM_004994
MNDA | myeloid cell nuclear differentiation antigen | NM_002432
MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | NM_002467
NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) | NM_003998
PLA2G7 | phospholipase A2, group VII (platelet-activating factor acetylhydrolase, plasma) | NM_005084
PLAUR | plasminogen activator, urokinase receptor | NM_002659
PTGS2 | prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) | NM_000963
PTPRC | protein tyrosine phosphatase, receptor type, C | NM_002838
SERPINA1 | serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 | NM_000295
SERPINE1 | serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 | NM_000602
SSI-3 | suppressor of cytokine signaling 3 | NM_003955
TGFB1 | transforming growth factor, beta 1 (Camurati-Engelmann disease) | NM_000660
TIMP1 | tissue inhibitor of metalloproteinase 1 | NM_003254
TLR2 | toll-like receptor 2 | NM_003264
TLR4 | toll-like receptor 4 | NM_003266
TNF | tumor necrosis factor (TNF superfamily, member 2) | NM_000594
TNFRSF13B | tumor necrosis factor receptor superfamily, member 13B | NM_012452
TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | NM_001065
TNFSF5 | CD40 ligand (TNF superfamily, member 5, hyper-IgM syndrome) | NM_000074
TNFSF6 | Fas ligand (TNF superfamily, member 6) | NM_000639
TOSO | Fas apoptotic inhibitory molecule 3 | NM_005449
TXNRD1 | thioredoxin reductase | NM_003330
VEGF | vascular endothelial growth factor | NM_003376
















TABLE 3

Human Cancer General Precision Profile™

Gene Symbol | Gene Name | Gene Accession Number
ABL1 | v-abl Abelson murine leukemia viral oncogene homolog 1 | NM_007313
ABL2 | v-abl Abelson murine leukemia viral oncogene homolog 2 (arg, Abelson-related gene) | NM_007314
AKT1 | v-akt murine thymoma viral oncogene homolog 1 | NM_005163
ANGPT1 | angiopoietin 1 | NM_001146
ANGPT2 | angiopoietin 2 | NM_001147
APAF1 | Apoptotic Protease Activating Factor 1 | NM_013229
ATM | ataxia telangiectasia mutated (includes complementation groups A, C and D) | NM_138293
BAD | BCL2-antagonist of cell death | NM_004322
BAX | BCL2-associated X protein | NM_138761
BCL2 | B-cell CLL/lymphoma 2 | NM_000633
BRAF | v-raf murine sarcoma viral oncogene homolog B1 | NM_004333
BRCA1 | breast cancer 1, early onset | NM_007294
CASP8 | caspase 8, apoptosis-related cysteine peptidase | NM_001228
CCNE1 | Cyclin E1 | NM_001238
CDC25A | cell division cycle 25A | NM_001789
CDK2 | cyclin-dependent kinase 2 | NM_001798
CDK4 | cyclin-dependent kinase 4 | NM_000075
CDK5 | Cyclin-dependent kinase 5 | NM_004935
CDKN1A | cyclin-dependent kinase inhibitor 1A (p21, Cip1) | NM_000389
CDKN2A | cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) | NM_000077
CFLAR | CASP8 and FADD-like apoptosis regulator | NM_003879
COL18A1 | collagen, type XVIII, alpha 1 | NM_030582
E2F1 | E2F transcription factor 1 | NM_005225
EGFR | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | NM_005228
EGR1 | Early growth response-1 | NM_001964
ERBB2 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | NM_004448
FAS | Fas (TNF receptor superfamily, member 6) | NM_000043
FGFR2 | fibroblast growth factor receptor 2 (bacteria-expressed kinase, keratinocyte growth factor receptor, craniofacial dysostosis 1) | NM_000141
FOS | v-fos FBJ murine osteosarcoma viral oncogene homolog | NM_005252
GZMA | Granzyme A (granzyme 1, cytotoxic T-lymphocyte-associated serine esterase 3) | NM_006144
HRAS | v-Ha-ras Harvey rat sarcoma viral oncogene homolog | NM_005343
ICAM1 | Intercellular adhesion molecule 1 | NM_000201
IFI6 | interferon, alpha-inducible protein 6 | NM_002038
IFITM1 | interferon induced transmembrane protein 1 (9-27) | NM_003641
IFNG | interferon gamma | NM_000619
IGF1 | insulin-like growth factor 1 (somatomedin C) | NM_000618
IGFBP3 | insulin-like growth factor binding protein 3 | NM_001013398
IL18 | Interleukin 18 | NM_001562
IL1B | Interleukin 1, beta | NM_000576
IL8 | interleukin 8 | NM_000584
ITGA1 | integrin, alpha 1 | NM_181501
ITGA3 | integrin, alpha 3 (antigen CD49C, alpha 3 subunit of VLA-3 receptor) | NM_005501
ITGAE | integrin, alpha E (antigen CD103, human mucosal lymphocyte antigen 1; alpha polypeptide) | NM_002208
ITGB1 | integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12) | NM_002211
JUN | v-jun sarcoma virus 17 oncogene homolog (avian) | NM_002228
KDR | kinase insert domain receptor (a type III receptor tyrosine kinase) | NM_002253
MCAM | melanoma cell adhesion molecule | NM_006500
MMP2 | matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase) | NM_004530
MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | NM_004994
MSH2 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) | NM_000251
MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | NM_002467
MYCL1 | v-myc myelocytomatosis viral oncogene homolog 1, lung carcinoma derived (avian) | NM_001033081
NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) | NM_003998
NME1 | non-metastatic cells 1, protein (NM23A) expressed in | NM_198175
NME4 | non-metastatic cells 4, protein expressed in | NM_005009
NOTCH2 | Notch homolog 2 | NM_024408
NOTCH4 | Notch homolog 4 (Drosophila) | NM_004557
NRAS | neuroblastoma RAS viral (v-ras) oncogene homolog | NM_002524
PCNA | proliferating cell nuclear antigen | NM_002592
PDGFRA | platelet-derived growth factor receptor, alpha polypeptide | NM_006206
PLAU | plasminogen activator, urokinase | NM_002658
PLAUR | plasminogen activator, urokinase receptor | NM_002659
PTCH1 | patched homolog 1 (Drosophila) | NM_000264
PTEN | phosphatase and tensin homolog (mutated in multiple advanced cancers 1) | NM_000314
RAF1 | v-raf-1 murine leukemia viral oncogene homolog 1 | NM_002880
RB1 | retinoblastoma 1 (including osteosarcoma) | NM_000321
RHOA | ras homolog gene family, member A | NM_001664
RHOC | ras homolog gene family, member C | NM_175744
S100A4 | S100 calcium binding protein A4 | NM_002961
SEMA4D | sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4D | NM_006378
SERPINB5 | serpin peptidase inhibitor, clade B (ovalbumin), member 5 | NM_002639
SERPINE1 | serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 | NM_000602
SKI | v-ski sarcoma viral oncogene homolog (avian) | NM_003036
SKIL | SKI-like oncogene | NM_005414
SMAD4 | SMAD family member 4 | NM_005359
SOCS1 | suppressor of cytokine signaling 1 | NM_003745
SRC | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | NM_198291
TERT | telomerase-reverse transcriptase | NM_003219
TGFB1 | transforming growth factor, beta 1 (Camurati-Engelmann disease) | NM_000660
THBS1 | thrombospondin 1 | NM_003246
TIMP1 | tissue inhibitor of metalloproteinase 1 | NM_003254
TIMP3 | Tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory) | NM_000362
TNF | tumor necrosis factor (TNF superfamily, member 2) | NM_000594
TNFRSF10A | tumor necrosis factor receptor superfamily, member 10a | NM_003844
TNFRSF10B | tumor necrosis factor receptor superfamily, member 10b | NM_003842
TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | NM_001065
TP53 | tumor protein p53 (Li-Fraumeni syndrome) | NM_000546
VEGF | vascular endothelial growth factor | NM_003376
VHL | von Hippel-Lindau tumor suppressor | NM_000551
WNT1 | wingless-type MMTV integration site family, member 1 | NM_005430
WT1 | Wilms tumor 1 | NM_000378
















TABLE 4

Precision Profile™ for EGR1

Gene Symbol | Gene Name | Gene Accession Number
ALOX5 | arachidonate 5-lipoxygenase | NM_000698
APOA1 | apolipoprotein A-I | NM_000039
CCND2 | cyclin D2 | NM_001759
CDKN2D | cyclin-dependent kinase inhibitor 2D (p19, inhibits CDK4) | NM_001800
CEBPB | CCAAT/enhancer binding protein (C/EBP), beta | NM_005194
CREBBP | CREB binding protein (Rubinstein-Taybi syndrome) | NM_004380
EGFR | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | NM_005228
EGR1 | early growth response 1 | NM_001964
EGR2 | early growth response 2 (Krox-20 homolog, Drosophila) | NM_000399
EGR3 | early growth response 3 | NM_004430
EGR4 | early growth response 4 | NM_001965
EP300 | E1A binding protein p300 | NM_001429
F3 | coagulation factor III (thromboplastin, tissue factor) | NM_001993
FGF2 | fibroblast growth factor 2 (basic) | NM_002006
FN1 | fibronectin 1 | NM_00212482
FOS | v-fos FBJ murine osteosarcoma viral oncogene homolog | NM_005252
ICAM1 | Intercellular adhesion molecule 1 | NM_000201
JUN | jun oncogene | NM_002228
MAP2K1 | mitogen-activated protein kinase kinase 1 | NM_002755
MAPK1 | mitogen-activated protein kinase 1 | NM_002745
NAB1 | NGFI-A binding protein 1 (EGR1 binding protein 1) | NM_005966
NAB2 | NGFI-A binding protein 2 (EGR1 binding protein 2) | NM_005967
NFATC2 | nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 2 | NM_173091
NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) | NM_003998
NR4A2 | nuclear receptor subfamily 4, group A, member 2 | NM_006186
PDGFA | platelet-derived growth factor alpha polypeptide | NM_002607
PLAU | plasminogen activator, urokinase | NM_002658
PTEN | phosphatase and tensin homolog (mutated in multiple advanced cancers 1) | NM_000314
RAF1 | v-raf-1 murine leukemia viral oncogene homolog 1 | NM_002880
S100A6 | S100 calcium binding protein A6 | NM_014624
SERPINE1 | serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 | NM_000602
SMAD3 | SMAD, mothers against DPP homolog 3 (Drosophila) | NM_005902
SRC | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | NM_198291
TGFB1 | transforming growth factor, beta 1 | NM_000660
THBS1 | thrombospondin 1 | NM_003246
TOPBP1 | topoisomerase (DNA) II binding protein 1 | NM_007027
TNFRSF6 | Fas (TNF receptor superfamily, member 6) | NM_000043
TP53 | tumor protein p53 (Li-Fraumeni syndrome) | NM_000546
WT1 | Wilms tumor 1 | NM_000378
















TABLE 5

Cross-Cancer Precision Profile™

Gene Symbol | Gene Name | Gene Accession Number
ACPP | acid phosphatase, prostate | NM_001099
ADAM17 | a disintegrin and metalloproteinase domain 17 (tumor necrosis factor, alpha, converting enzyme) | NM_003183
ANLN | anillin, actin binding protein (scraps homolog, Drosophila) | NM_018685
APC | adenomatosis polyposis coli | NM_000038
AXIN2 | axin 2 (conductin, axil) | NM_004655
BAX | BCL2-associated X protein | NM_138761
BCAM | basal cell adhesion molecule (Lutheran blood group) | NM_005581
C1QA | complement component 1, q subcomponent, alpha polypeptide | NM_015991
C1QB | complement component 1, q subcomponent, B chain | NM_000491
CA4 | carbonic anhydrase IV | NM_000717
CASP3 | caspase 3, apoptosis-related cysteine peptidase | NM_004346
CASP9 | caspase 9, apoptosis-related cysteine peptidase | NM_001229
CAV1 | caveolin 1, caveolae protein, 22 kDa | NM_001753
CCL3 | chemokine (C-C motif) ligand 3 | NM_002983
CCL5 | chemokine (C-C motif) ligand 5 | NM_002985
CCR7 | chemokine (C-C motif) receptor 7 | NM_001838
CD40LG | CD40 ligand (TNF superfamily, member 5, hyper-IgM syndrome) | NM_000074
CD59 | CD59 antigen p18-20 | NM_000611
CD97 | CD97 molecule | NM_078481
CDH1 | cadherin 1, type 1, E-cadherin (epithelial) | NM_004360
CEACAM1 | carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein) | NM_001712
CNKSR2 | connector enhancer of kinase suppressor of Ras 2 | NM_014927
CTNNA1 | catenin (cadherin-associated protein), alpha 1, 102 kDa | NM_001903
CTSD | cathepsin D (lysosomal aspartyl peptidase) | NM_001909
CXCL1 | chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) | NM_001511
DAD1 | defender against cell death 1 | NM_001344
DIABLO | diablo homolog (Drosophila) | NM_019887
DLC1 | deleted in liver cancer 1 | NM_182643
E2F1 | E2F transcription factor 1 | NM_005225
EGR1 | early growth response-1 | NM_001964
ELA2 | elastase 2, neutrophil | NM_001972
ESR1 | estrogen receptor 1 | NM_000125
ESR2 | estrogen receptor 2 (ER beta) | NM_001437
ETS2 | v-ets erythroblastosis virus E26 oncogene homolog 2 (avian) | NM_005239
FOS | v-fos FBJ murine osteosarcoma viral oncogene homolog | NM_005252
G6PD | glucose-6-phosphate dehydrogenase | NM_000402
GADD45A | growth arrest and DNA-damage-inducible, alpha | NM_001924
GNB1 | guanine nucleotide binding protein (G protein), beta polypeptide 1 | NM_002074
GSK3B | glycogen synthase kinase 3 beta | NM_002093
HMGA1 | high mobility group AT-hook 1 | NM_145899
HMOX1 | heme oxygenase (decycling) 1 | NM_002133
HOXA10 | homeobox A10 | NM_018951
HSPA1A | heat shock protein 70 | NM_005345
IFI16 | interferon inducible protein 16, gamma | NM_005531
IGF2BP2 | insulin-like growth factor 2 mRNA binding protein 2 | NM_006548
IGFBP3 | insulin-like growth factor binding protein 3 | NM_001013398
IKBKE | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase epsilon | NM_014002
IL8 | interleukin 8 | NM_000584
ING2 | inhibitor of growth family, member 2 | NM_001564
IQGAP1 | IQ motif containing GTPase activating protein 1 | NM_003870
IRF1 | interferon regulatory factor 1 | NM_002198
ITGAL | integrin, alpha L (antigen CD11A (p180), lymphocyte function-associated antigen 1; alpha polypeptide) | NM_002209
LARGE | like-glycosyltransferase | NM_004737
LGALS8 | lectin, galactoside-binding, soluble, 8 (galectin 8) | NM_006499
LTA | lymphotoxin alpha (TNF superfamily, member 1) | NM_000595
MAPK14 | mitogen-activated protein kinase 14 | NM_001315
MCAM | melanoma cell adhesion molecule | NM_006500
MEIS1 | Meis1, myeloid ecotropic viral integration site 1 homolog (mouse) | NM_002398
MLH1 | mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) | NM_000249
MME | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase, CALLA, CD10) | NM_000902
MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | NM_004994
MNDA | myeloid cell nuclear differentiation antigen | NM_002432
MSH2 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) | NM_000251
MSH6 | mutS homolog 6 (E. coli) | NM_000179
MTA1 | metastasis associated 1 | NM_004689
MTF1 | metal-regulatory transcription factor 1 | NM_005955
MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | NM_002467
MYD88 | myeloid differentiation primary response gene (88) | NM_002468
NBEA | neurobeachin | NM_015678
NCOA1 | nuclear receptor coactivator 1 | NM_003743
NEDD4L | neural precursor cell expressed, developmentally down-regulated 4-like | NM_015277
NRAS | neuroblastoma RAS viral (v-ras) oncogene homolog | NM_002524
NUDT4 | nudix (nucleoside diphosphate linked moiety X)-type motif 4 | NM_019094
PLAU | plasminogen activator, urokinase | NM_002658
PLEK2 | pleckstrin 2 | NM_016445
PLXDC2 | plexin domain containing 2 | NM_032812
PPARG | peroxisome proliferative activated receptor, gamma | NM_138712
PTEN | phosphatase and tensin homolog (mutated in multiple advanced cancers 1) | NM_000314
PTGS2 | prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) | NM_000963
PTPRC | protein tyrosine phosphatase, receptor type, C | NM_002838
PTPRK | protein tyrosine phosphatase, receptor type, K | NM_002844
RBM5 | RNA binding motif protein 5 | NM_005778
RP5-1077B9.4 | invasion inhibitory protein 45 | NM_001025374
S100A11 | S100 calcium binding protein A11 | NM_005620
S100A4 | S100 calcium binding protein A4 | NM_002961
SCGB2A1 | secretoglobin, family 2A, member 1 | NM_002407
SERPINA1 | serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 | NM_000295
SERPINE1 | serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 | NM_000602
SERPING1 | serpin peptidase inhibitor, clade G (C1 inhibitor), member 1, (angioedema, hereditary) | NM_000062
SIAH2 | seven in absentia homolog 2 (Drosophila) | NM_005067
SLC43A1 | solute carrier family 43, member | NM_003627
SP1 | Sp1 transcription factor | NM_138473
SPARC | secreted protein, acidic, cysteine-rich (osteonectin) | NM_003118
SRF | serum response factor (c-fos serum response element-binding transcription factor) | NM_003131
ST14 | suppression of tumorigenicity 14 (colon carcinoma) | NM_021978
TEGT | testis enhanced gene transcript (BAX inhibitor 1) | NM_003217
TGFB1 | transforming growth factor, beta 1 (Camurati-Engelmann disease) | NM_000660
TIMP1 | tissue inhibitor of metalloproteinase 1 | NM_003254
TLR2 | toll-like receptor 2 | NM_003264
TNF | tumor necrosis factor (TNF superfamily, member 2) | NM_000594
TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | NM_001065
TXNRD1 | thioredoxin reductase | NM_003330
UBE2C | ubiquitin-conjugating enzyme E2C | NM_007019
USP7 | ubiquitin specific peptidase 7 (herpes virus-associated) | NM_003470
VEGFA | vascular endothelial growth factor | NM_003376
VIM | vimentin | NM_003380
XK | X-linked Kx blood group (McLeod syndrome) | NM_021083
XRCC1 | X-ray repair complementing defective repair in Chinese hamster cells 1 | NM_006297
ZNF185 | zinc finger protein 185 (LIM domain) | NM_007150
ZNF350 | zinc finger protein 350 | NM_021632
















TABLE 6

Precision Profile™ for Immunotherapy

Gene Symbol
ABL1
ABL2
ADAM17
ALOX5
CD19
CD4
CD40LG
CD86
CCR5
CTLA4
EGFR
ERBB2
HSPA1A
IFNG
IL12
IL15
IL23A
KIT
MUC1
MYC
PDGFRA
PTGS2
PTPRC
RAF1
TGFB1
TLR2
TNF
TNFRSF10B
TNFRSF13B
VEGF
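
The correct-classification percentages reported in Table 1A can be reproduced from the per-row counts of correctly and falsely classified subjects. The Python sketch below illustrates the arithmetic using the first row of Table 1A (CTSD, EGR1, NCOA1); the function and key names are illustrative and do not appear in the source.

```python
def classification_rates(normal_correct, normal_false, bc_correct, bc_false):
    """Return percent correctly classified for normal and breast cancer subjects."""
    n_normal = normal_correct + normal_false  # corresponds to the "# normals" column
    n_bc = bc_correct + bc_false              # corresponds to the "# disease" column
    return {
        "n_normal": n_normal,
        "n_bc": n_bc,
        "normal_pct": round(100.0 * normal_correct / n_normal, 1),
        "bc_pct": round(100.0 * bc_correct / n_bc, 1),
    }


# Counts from the first row of Table 1A (CTSD, EGR1, NCOA1).
rates = classification_rates(23, 2, 44, 5)
print(rates)
```

For this row the sketch gives 92.0% for normal subjects (23 of 25) and 89.8% for breast cancer subjects (44 of 49), matching the values listed in the table.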




























TABLE 1A

3-gene models and 2-gene models and 1-gene models. N = 26 (Normal), N = 49 (Breast). The "# normals" and "# disease" columns give the total used (excludes missing).

Gene 1 | Gene 2 | Gene 3 | Entropy R-sq | # normal Correct | # normal FALSE | # bc Correct | # bc FALSE | Correct Classification (Normal) | Correct Classification (Breast) | p-val 1 | p-val 2 | p-val 3 | # normals | # disease
CTSD | EGR1 | NCOA1 | 0.73 | 23 | 2 | 44 | 5 | 92.0% | 89.8% | 4.6E−07 | 6.8E−10 | 1.6E−05 | 25 | 49
EGR1 | EIF4E | PI3 | 0.65 | 22 | 1 | 46 | 3 | 95.7% | 93.9% | 5.5E−14 | 0.0004 | 0.0105 | 23 | 49
CDKN1B | EGR1 | NCOA1 | 0.64 | 23 | 2 | 45 | 4 | 92.0% | 91.8% | 2.9E−05 | 5.4E−14 | 0.0069 | 25 | 49
EGR1 | EIF4E | MMP9 | 0.63 | 21 | 3 | 42 | 6 | 87.5% | 87.5% | 2.6E−13 | 3.6E−05 | 0.0100 | 24 | 48
ATBF1 | EGR1 | EIF4E | 0.63 | 22 | 2 | 45 | 4 | 91.7% | 91.8% | 0.0128 | 1.8E−13 | 2.8E−05 | 24 | 49
EGR1 | PI3 | VIM | 0.63 | 19 | 3 | 43 | 6 | 86.4% | 87.8% | 1.1E−10 | 0.0104 | 0.0011 | 22 | 49
ATBF1 | EGR1 | VIM | 0.62 | 19 | 4 | 42 | 7 | 82.6% | 85.7% | 0.0033 | 2.6E−10 | 6.7E−05 | 23 | 49
EGR1 | EIF4E | MYCBP | 0.61 | 20 | 4 | 43 | 6 | 83.3% | 87.8% | 3.7E−13 | 6.1E−05 | 0.0286 | 24 | 49
CTSD | EGR1 | MMP9 | 0.61 | 22 | 3 | 42 | 6 | 88.0% | 87.5% | 0.0002 | 1.0E−09 | 0.0054 | 25 | 48
EGR1 | ILF2 | PI3 | 0.60 | 21 | 2 | 45 | 4 | 91.3% | 91.8% | 1.1E−12 | 0.0030 | 0.0282 | 23 | 49
CDKN1B | EGR1 | MMP9 | 0.60 | 22 | 3 | 43 | 5 | 88.0% | 89.6% | 0.0002 | 7.8E−13 | 0.0445 | 25 | 48
EGR1 | MUC1 | PI3 | 0.60 | 20 | 3 | 41 | 7 | 87.0% | 85.4% | 5.5E−12 | 0.0027 | 0.0170 | 23 | 48
BAX | EGR1 | PITRM1 | 0.60 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 9.2E−05 | 2.8E−11 | 0.0415 | 23 | 49
EGR1 | PCNA | PITRM1 | 0.60 | 22 | 2 | 45 | 4 | 91.7% | 91.8% | 3.7E−13 | 0.0001 | 0.0044 | 24 | 49
CCNE1 | EGR1 | RPS3 | 0.60 | 21 | 3 | 43 | 6 | 87.5% | 87.8% | 0.0155 | 5.6E−13 | 0.0001 | 24 | 49
CDKN1B | EGR1 | PI3 | 0.60 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 0.0038 | 4.1E−13 | 0.0289 | 23 | 49
EGR1 | MMP9 | VIM | 0.60 | 20 | 3 | 42 | 6 | 87.0% | 87.5% | 2.5E−10 | 0.0079 | 0.0002 | 23 | 48
EGR1 | PI3 | TNF | 0.60 | 20 | 3 | 41 | 8 | 87.0% | 83.7% | 9.4E−09 | 0.0102 | 0.0042 | 23 | 49
ATBF1 | CTSD | EGR1 | 0.60 | 22 | 2 | 44 | 5 | 91.7% | 89.8% | 0.0072 | 0.0001 | 1.6E−08 | 24 | 49
EGR1 | PI3 | PSMD1 | 0.59 | 21 | 2 | 42 | 4 | 91.3% | 91.3% | 5.6E−13 | 0.0133 | 0.0056 | 23 | 46
EGR1 | IGF2 | MUC1 | 0.59 | 23 | 1 | 42 | 5 | 95.8% | 89.4% | 8.3E−12 | 0.0118 | 0.0010 | 24 | 47
CTSD | EGR1 | PI3 | 0.59 | 20 | 3 | 44 | 5 | 87.0% | 89.8% | 0.0057 | 2.9E−09 | 0.0195 | 23 | 49
EGR1 | NCOA1 | VIM | 0.59 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 6.0E−10 | 0.0134 | 0.0003 | 23 | 49
EGR1 | MUC1 | RPS3 | 0.59 | 22 | 2 | 42 | 6 | 91.7% | 87.5% | 8.9E−12 | 0.0233 | 0.0157 | 24 | 48
CTSD | EGR1 | PITRM1 | 0.59 | 21 | 3 | 43 | 6 | 87.5% | 87.8% | 0.0002 | 1.0E−08 | 0.0110 | 24 | 49
EGR1 | NFKB1 | PI3 | 0.59 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 3.2E−11 | 0.0068 | 0.0031 | 23 | 49
CDKN1A | CTSD | EGR1 | 0.59 | 22 | 3 | 41 | 7 | 88.0% | 85.4% | 0.0153 | 0.0005 | 1.8E−09 | 25 | 48
ATM | EGR1 | MYCBP | 0.59 | 22 | 3 | 43 | 6 | 88.0% | 87.8% | 0.0004 | 2.0E−13 | 0.0352 | 25 | 49
EGR1 | MGMT | MUC1 | 0.59 | 22 | 2 | 42 | 6 | 91.7% | 87.5% | 1.4E−11 | 0.0184 | 0.0190 | 24 | 48
EGR1 | PITRM1 | RPS3 | 0.59 | 22 | 2 | 43 | 6 | 91.7% | 87.8% | 1.2E−12 | 0.0314 | 0.0002 | 24 | 49
ATM | EGR1 | MUC1 | 0.59 | 20 | 4 | 41 | 7 | 83.3% | 85.4% | 0.0195 | 2.9E−11 | 0.0217 | 24 | 48
EGR1 | MYCBP | TGFBR1 | 0.58 | 22 | 2 | 43 | 6 | 91.7% | 87.8% | 2.5E−13 | 0.0059 | 0.0003 | 24 | 49
CDKN1A | EGR1 | MUC1 | 0.58 | 21 | 3 | 41 | 6 | 87.5% | 87.2% | 0.0180 | 2.0E−11 | 0.0004 | 24 | 47
EGR1 | PI3 | SLPI | 0.58 | 18 | 5 | 43 | 6 | 78.3% | 87.8% | 2.1E−10 | 0.0014 | 0.0086 | 23 | 49
EGR1 | PI3 | TGFBR1 | 0.58 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 6.5E−13 | 0.0142 | 0.0114 | 23 | 49
EGR1 | MYCBP | PSMD1 | 0.58 | 22 | 2 | 41 | 5 | 91.7% | 89.1% | 6.1E−13 | 0.0146 | 0.0004 | 24 | 46
EGR1 | IGF2 | VIM | 0.58 | 21 | 2 | 43 | 5 | 91.3% | 89.6% | 1.1E−09 | 0.0246 | 0.0050 | 23 | 48
CTSD | EGR1 | IFITM3 | 0.57 | 22 | 3 | 42 | 7 | 88.0% | 85.7% | 0.0007 | 5.8E−09 | 0.0393 | 25 | 49
EGR1 | GNB2L1 | | 0.57 | 22 | 3 | 44 | 5 | 88.0% | 89.8% | 3.4E−13 | 0.0007 | | 25 | 49
EGR1 | ILF2 | ITGA6 | 0.57 | 20 | 4 | 42 | 7 | 83.3% | 85.7% | 9.0E−10 | 0.0006 | 0.0351 | 24 | 49
CRABP2 | EGR1 | NCOA1 | 0.57 | 22 | 3 | 43 | 5 | 88.0% | 89.6% | 0.0008 | 4.5E−12 | 0.0192 | 25 | 48
CTSD | EGR1 | IGF2 | 0.57 | 22 | 3 | 42 | 6 | 88.0% | 87.5% | 0.0045 | 4.7E−09 | 0.0350 | 25 | 48
EGR1 | MDM2 | MYCBP | 0.57 | 21 | 4 | 42 | 7 | 84.0% | 85.7% | 2.5E−13 | 0.0007 | 0.0245 | 25 | 49
CTSD | EGR1 | LAMB2 | 0.57 | 22 | 2 | 43 | 6 | 91.7% | 87.8% | 0.0007 | 1.3E−08 | 0.0276 | 24 | 49
EGR1 | PI3 | TGFB1 | 0.57 | 20 | 3 | 41 | 7 | 87.0% | 85.4% | 1.6E−08 | 0.0087 | 0.0137 | 23 | 48
EGR1 | IL8 | MUC1 | 0.57 | 21 | 3 | 42 | 6 | 87.5% | 87.5% | 6.6E−11 | 0.0458 | 0.0039 | 24 | 48
ATBF1 | EGR1 | ILF2 | 0.57 | 21 | 3 | 44 | 5 | 87.5% | 89.8% | 0.0437 | 4.8E−12 | 0.0005 | 24 | 49
ABCB1 | EGR1 | MUC1 | 0.57 | 21 | 3 | 42 | 6 | 87.5% | 87.5% | 0.0466 | 1.8E−11 | 0.0096 | 24 | 48
EGR1 | MUC1 | NCOA1 | 0.57 | 21 | 3 | 40 | 8 | 87.5% | 83.3% | 2.3E−11 | 0.0007 | 0.0499 | 24 | 48
ABCB1 | EGR1 | VIM | 0.57 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 0.0455 | 1.0E−09 | 0.0232 | 23 | 49
EGR1 | PI3 | RB1 | 0.57 | 19 | 4 | 42 | 7 | 82.6% | 85.7% | 8.8E−13 | 0.0113 | 0.0199 | 23 | 49
BAX | EGR1 | | 0.57 | 20 | 4 | 43 | 6 | 83.3% | 87.8% | 0.0008 | 1.0E−10 | | 24 | 49
EGR1 | MDM2 | PI3 | 0.57 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 9.9E−13 | 0.0217 | 0.0259 | 23 | 49
CDKN1B | EGR1 | | 0.57 | 23 | 2 | 43 | 6 | 92.0% | 87.8% | 0.0010 | 1.1E−12 | | 25 | 49
C3 | EGR1 | VIM | 0.56 | 20 | 3 | 42 | 6 | 87.0% | 87.5% | 0.0446 | 3.8E−09 | 0.0008 | 23 | 48
EGR1 | PITRM1 | PSMD1 | 0.56 | 22 | 2 | 42 | 4 | 91.7% | 91.3% | 3.0E−12 | 0.0290 | 0.0007 | 24 | 46
EGR1 | MDM2 | NCOA1 | 0.56 | 22 | 3 | 43 | 6 | 88.0% | 87.8% | 1.3E−12 | 0.0015 | 0.0393 | 25 | 49
EGR1 | MTA1 | PI3 | 0.56 | 19 | 4 | 41 | 8 | 82.6% | 83.7% | 1.6E−09 | 0.0254 | 0.0149 | 23 | 49
EGR1 | EIF4E | | 0.56 | 21 | 3 | 43 | 6 | 87.5% | 87.8% | 1.7E−12 | 0.0007 | | 24 | 49
EGR1 | IFITM3 | TGFB1 | 0.56 | 22 | 2 | 43 | 5 | 91.7% | 89.6% | 1.3E−08 | 0.0087 | 0.0007 | 24 | 48
EGR1 | NCOA1 | TGFB1 | 0.56 | 21 | 3 | 41 | 7 | 87.5% | 85.4% | 2.0E−08 | 0.0088 | 0.0009 | 24 | 48
EGR1 | MMP9 | NFKB1 | 0.56 | 22 | 3 | 43 | 5 | 88.0% | 89.6% | 3.9E−11 | 0.0054 | 0.0017 | 25 | 48
EGR1 | NCOA1 | PSMD1 | 0.56 | 21 | 3 | 39 | 7 | 87.5% | 84.8% | 3.6E−12 | 0.0359 | 0.0011 | 24 | 46
CTNNB1 | EGR1 | PI3 | 0.56 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 0.0293 | 1.7E−12 | 0.0191 | 23 | 49
BRCA1 | EGR1 | NCOA1 | 0.56 | 22 | 3 | 43 | 6 | 88.0% | 87.8% | 0.0019 | 1.6E−12 | 0.0126 | 25 | 49
CTSB | EGR1 | PI3 | 0.56 | 20 | 3 | 43 | 6 | 87.0% | 87.8% | 0.0304 | 5.7E−12 | 0.0108 | 23 | 49
CASP9 | EGR1 | PI3 | 0.56 | 19 | 4 | 42 | 7 | 82.6% | 85.7% | 0.0305 | 2.2E−11 | 0.0136 | 23 | 49
CASP9 | EGR1 | NCOA1 | 0.56 | 21 | 3 | 43 | 6 | 87.5% | 87.8% | 0.0012 | 2.0E−11 | 0.0055 | 24 | 49
BRCA1 | EGR1 | PI3 | 0.56 | 20 | 3 | 42 | 7 | 87.0% | 85.7% | 0.0314 | 1.3E−12 | 0.0161 | 23 | 49
EGR1 | MYC | PI3 | 0.56 | 22 | 1 | 43 | 6 | 95.7% | 87.8% | 4.8E−11 | 0.0326 | 0.0171 | 23 | 49
EGR1 | NCOA1 | NFKB1 | 0.56 | 21 | 4 | 42 | 7 | 84.0% | 85.7% | 7.9E−11 | 0.0079 | 0.0021 | 25 | 49
CDKN1A | EGR1 | PCNA | 0.56 | 22 | 2 | 43 | 5 | 91.7% | 89.6% | 0.0401 | 1.2E−11 | 0.0015 | 24 | 48
EGR1 | MMP9 | TGFB1 | 0.55 | 20 | 4 | 40 | 7 | 83.3% | 85.1% | 1.4E−08 | 0.0102 | 0.0012 | 24 | 47
EGR1 | MYCBP | RBL2 | 0.55 | 21 | 3 | 42 | 6 | 87.5% | 87.5% | 1.1E−12 | 0.0151 | 0.0010 | 24 | 48
EGR1 | PI3 | RBL2 | 0.55 | 20 | 3 | 42 | 6 | 87.0% | 87.5% | 2.1E−12 | 0.0290 | 0.0351 | 23 | 48
EGR1 | NCOA1 | TGFBR1 | 0.55 | 20 | 4 | 42 | 7 | 83.3% | 85.7% | 4.1E−12 | 0.0321 | 0.0017 | 24 | 49
EGR1 | TGFB1 | TSC22D3 | 0.55 | 21 | 3 | 41 | 7 | 87.5% | 85.4% | 1.5E−08 | 0.0019 | 0.0142 | 24 | 48
CDKN1A | EGR1 | TGFB1 | 0.55 | 21 | 3 | 43 | 4 | 87.5% | 91.5% | 0.0154 | 2.5E−08 | 0.0019 | 24 | 47
EGR1 | TGFB1 | THBS1 | 0.55 | 22 | 2 | 44 | 4 | 91.7% | 91.7% | 4.9E−08 | 0.0015 | 0.0196 | 24 | 48
EGR1 | ERBB2 | | 0.54 | 21 | 3 | 43 | 6 | 87.5% | 87.8% | 3.2E−11 | 0.0027 | | 24 | 49
ATM | EGR1 | | 0.54 | 22 | 3 | 43 | 6 | 88.0% | 87.8% | 0.0037 | 1.4E−12 | | 25 | 49
CCND1 | EGR1 | | 0.54 | 21 | 4 | 41 | 8 | 84.0% | 83.7% | 0.0038 | 2.7E−11 | | 25 | 49
EGR1 | MGMT | | 0.54 | 22 | 3 | 42 | 7 | 88.0% | 85.7% | 4.8E−12 | 0.0038 | | 25 | 49
EGR1 | IL8 | TGFB1 | 0.54 | 21 | 3 | 42 | 6 | 87.5% | 87.5% | 3.6E−08 | 0.0305 | 0.0190 | 24 | 48
EGR1 | RPS3 | | 0.54 | 21 | 3 | 42 | 7 | 87.5% | 85.7% | 9.3E−12 | 0.0026 | | 24 | 49
EGR1 | MTA1 | NCOA1 | 0.54 | 21 | 3 | 41 | 8 | 87.5% | 83.7% | 8.8E−09 | 0.0039 | 0.0321 | 24 | 49
EGR1 | MCM7 | PITRM1 | 0.54 | 21 | 3 | 44 | 5 | 87.5% | 89.8% | 1.1E−11 | 0.0029 | 0.0488 | 24 | 49
CDK4 | EGR1 | TGFB1 | 0.54 | 21 | 3 | 42 | 6 | 87.5% | 87.5% | 0.0342 | 1.8E−07 | 0.0404 | 24 | 48
EGR1 | IGF2 | TGFB1 | 0.54 | 21 | 3 | 42 | 5 | 87.5% | 89.4% | 6.4E−08 | 0.0276 | 0.0154 | 24 | 47
EGR1 | NCOA1 | USP9X | 0.53 | 20 | 3 | 42 | 7 | 87.0% | 85.7% | 2.5E−10 | 0.0255 | 0.0060 | 23 | 49
CASP9 | EGR1 | MMP9 | 0.53 | 21 | 3 | 43 | 5 | 87.5% | 89.6% | 0.0047 | 6.8E−11 | 0.0185 | 24 | 48
CASP8 | EGR1 | | 0.53 | 21 | 4 | 43 | 6 | 84.0% | 87.8% | 0.0064 | 1.6E−12 | | 25 | 49
CTSD | EGR1 | | 0.53 | 21 | 4 | 41 | 8 | 84.0% | 83.7% | 0.0064 | 3.2E−08 | | 25 | 49
EGR1 | IGF2 | TNF | 0.53 | 21 | 3 | 43 | 5 | 87.5% | 89.6% | 4.2E−08 | 0.0313 | 0.0264 | 24 | 48
EGR1 | MUC1 | | 0.53 | 20 | 4 | 40 | 8 | 83.3% | 83.3% | 1.4E−10 | 0.0044 | | 24 | 48
CDKN1A | EGR1 | MTA1 | 0.53 | 21 | 3 | 43 | 5 | 87.5% | 89.6% | 0.0442 | 7.4E−09 | 0.0068 | 24 | 48


EGR1
ILF2

0.53
20
4
41
8
83.3%
83.7%
2.7E−11
0.0046

24
49


EGR1
ING1
MMP9
0.52
22
3
43
5
88.0%
89.6%
3.4E−10
0.0109
0.0249
25
48


EGR1
VIM

0.52
19
4
42
7
82.6%
85.7%
7.8E−09
0.0069

23
49


CTNNB1
EGR1
NCOA1
0.52
20
4
42
7
83.3%
85.7%
0.0073
1.4E−11
0.0466
24
49


EGR1
RPL13A

0.52
23
1
42
7
95.8%
85.7%
7.4E−11
0.0056

24
49


EGR1
NME1

0.52
22
3
43
6
88.0%
87.8%
2.7E−12
0.0103

25
49


EGR1
MDM2

0.52
22
3
43
6
88.0%
87.8%
3.1E−12
0.0113

25
49


CTNNB1
EGR1
MMP9
0.52
20
4
41
7
83.3%
85.4%
0.0090
3.1E−11
0.0490
24
48


EGR1
ING1
NCOA1
0.52
22
3
42
7
88.0%
85.7%
7.2E−10
0.0169
0.0467
25
49


CRABP2
EGR1

0.51
22
3
42
6
88.0%
87.5%
0.0112
5.6E−11

25
48


EGR1
NCOA1
VEZF1
0.51
21
3
41
6
87.5%
87.2%
3.0E−10
0.0159
0.0075
24
47


EGR1
PCNA

0.51
21
3
43
6
87.5%
87.8%
1.7E−11
0.0085

24
49


EGR1
PSMD1

0.51
20
4
38
8
83.3%
82.6%
1.3E−11
0.0088

24
46


ABCB1
EGR1

0.51
21
4
42
7
84.0%
85.7%
0.0233
7.2E−12

25
49


CDK4
EGR1

0.51
22
3
42
7
88.0%
85.7%
0.0238
9.1E−11

25
49


EGR1
MCM7

0.50
22
3
43
6
88.0%
87.8%
2.0E−11
0.0244

25
49


EGR1
PSMB5

0.50
20
4
41
8
83.3%
83.7%
1.4E−11
0.0142

24
49


EGR1
TGFBR1

0.50
20
4
41
8
83.3%
83.7%
1.0E−11
0.0151

24
49


BRCA2
EGR1

0.50
21
4
41
8
84.0%
83.7%
0.0324
7.8E−12

25
49


EGR1
MMP9
VEZF1
0.50
21
3
41
6
87.5%
87.2%
4.9E−10
0.0362
0.0199
24
47


BCL2
EGR1

0.50
21
4
41
7
84.0%
85.4%
0.0312
3.1E−11

25
48


EGR1
MYC

0.49
21
4
42
7
84.0%
85.7%
5.6E−10
0.0437

25
49


EGR1
IL8

0.49
21
4
41
8
84.0%
83.7%
2.8E−11
0.0448

25
49


BRCA1
EGR1

0.49
22
3
41
8
88.0%
83.7%
0.0450
9.2E−12

25
49


CDH1
EGR1

0.49
20
5
42
7
80.0%
85.7%
0.0486
8.9E−12

25
49


EGR1
RBL2

0.49
20
4
40
8
83.3%
83.3%
2.2E−11
0.0243

24
48


EGR1
TGFB1

0.49
20
4
41
7
83.3%
85.4%
2.8E−07
0.0280

24
48


EGR1
MTA1

0.49
19
5
40
9
79.2%
81.6%
4.0E−08
0.0371

24
49


EGR1
TNF

0.48
20
4
41
8
83.3%
83.7%
2.1E−07
0.0430

24
49


CTNNB1
EGR1

0.48
20
4
40
9
83.3%
81.6%
0.0498
4.5E−11

24
49


EGR1


0.45
20
5
40
9
80.0%
81.6%
6.4E−11


25
49


MTA1
TP53
VIM
0.40
16
5
38
11
76.2%
77.6%
0.0001
7.9E−05
0.0071
21
49


CTSD
MYCBP
NCOA1
0.40
21
4
42
7
84.0%
85.7%
7.1E−09
0.0078
0.0072
25
49


CTSD
FOS
NCOA1
0.40
22
3
38
11
88.0%
77.6%
1.5E−07
0.0084
3.6E−05
25
49


CTSD
MYCBP
TNF
0.40
20
4
40
9
83.3%
81.6%
0.0003
0.0001
0.0043
24
49


CTSD
HPGD
NCOA1
0.40
17
5
40
8
77.3%
83.3%
1.8E−07
0.0018
0.0020
22
48


CTSD
ITGA6
NCOA1
0.39
20
4
39
10
83.3%
79.6%
1.4E−08
0.0061
0.0008
24
49


MTA1
MYCBP
VIM
0.38
18
5
39
10
78.3%
79.6%
0.0010
7.1E−05
0.0023
23
49


ERBB2
ITGA6
MTA1
0.38
19
4
39
10
82.6%
79.6%
0.0342
8.6E−06
3.1E−06
23
49


ITGA6
MTA1
RPS3
0.38
18
6
39
10
75.0%
79.6%
7.7E−06
7.2E−08
0.0295
24
49


CCNE1
CTSD
NCOA1
0.37
20
5
39
10
80.0%
79.6%
0.0279
1.1E−08
0.0002
25
49


CTSD
HPGD
LAMB2
0.37
18
4
39
9
81.8%
81.3%
1.8E−07
6.8E−05
0.0054
22
48


ATM
CTSD
NCOA1
0.37
20
5
39
10
80.0%
79.6%
0.0280
1.8E−08
0.0003
25
49


ITGA6
MTA1
NCOA1
0.37
20
4
41
8
83.3%
83.7%
2.0E−05
3.1E−08
0.0351
24
49


CTSD
MTA1
MYCBP
0.37
18
6
38
11
75.0%
77.6%
0.0009
0.0160
0.0001003
24
49


CTSD
NCOA1
SLPI
0.37
20
5
38
11
80.0%
77.6%
1.1E−07
0.0001
0.0342
25
49


ATBF1
CTSD
HPGD
0.37
19
3
40
8
86.4%
83.3%
0.0073
6.0E−07
0.0013
22
48


BCL2
ITGA6
TNF
0.37
19
5
38
10
79.2%
79.2%
0.0023
7.4E−05
9.1E−07
24
48


CCND1
ITGA6
TNF
0.37
18
6
38
11
75.0%
77.6%
0.0021
0.0002
4.5E−06
24
49


TGFB1
TP53
VIM
0.36
17
4
39
9
81.0%
81.3%
0.0017
0.0081
0.0026
21
48


BAX
ITGA6
TNF
0.36
18
5
39
10
78.3%
79.6%
0.0023
0.0003
5.5E−05
23
49


CTSD
MYCBP
RPL13A
0.36
19
5
39
10
79.2%
79.6%
4.0E−07
0.0001
0.0269
24
49


CTSD
NCOA1
RB1
0.36
19
5
39
10
79.2%
79.6%
3.3E−08
0.0010
0.0300
24
49


CTSD
TGFB1
USP10
0.36
18
6
38
10
75.0%
79.2%
0.0079
0.0029
0.0018
24
48


ITGA6
RPL13A
TNF
0.36
20
4
42
7
83.3%
85.7%
0.0001
0.0031
5.9E−06
24
49


CTSD
NCOA1
TOP2A
0.36
19
5
39
10
79.2%
79.6%
4.3E−08
0.0003
0.0349
24
49


CTSD
ITGA6
MYC
0.35
20
4
40
9
83.3%
81.6%
9.2E−05
0.0002
0.0053
24
49


MYCBP
TNF
VIM
0.35
18
5
38
11
78.3%
77.6%
0.0009
0.0036
0.0082
23
49


ITGA6
MYC
TNF
0.35
20
4
41
8
83.3%
83.7%
0.0002
0.0041
9.9E−05
24
49


CTNNB1
CTSD
NCOA1
0.35
20
4
40
9
83.3%
81.6%
0.0467
4.4E−08
0.0009
24
49


CDK4
MYCBP
TGFB1
0.35
20
4
37
11
83.3%
77.1%
0.0012
0.0012
3.1E−06
24
48


CTSD
NCOA1
RBL2
0.35
19
5
39
9
79.2%
81.3%
4.6E−08
0.0005
0.0370
24
48


BAX
MYCBP
TNF
0.35
20
3
40
9
87.0%
81.6%
0.0021
0.0006
1.7E−05
23
49


CDK4
ITGA6
TGFB1
0.35
20
4
40
8
83.3%
83.3%
0.0026
0.0014
8.3E−06
24
48


ERBB2
ITGA6
TNF
0.34
18
5
38
11
78.3%
77.6%
0.0062
0.0003
1.8E−05
23
49


MYC
TP53
VIM
0.34
17
4
38
11
81.0%
77.6%
0.0018
0.0002
1.1E−05
21
49


RB1
TNF
VIM
0.34
18
5
38
11
78.3%
77.6%
0.0016
0.0059
0.0059
23
49


TGFB1
THBS1
USP10
0.34
19
5
37
11
79.2%
77.1%
2.2E−07
0.0190
0.0009
24
48


ITGA6
MYC
VIM
0.34
18
5
38
11
78.3%
77.6%
7.4E−05
0.0017
9.8E−05
23
49


CTSD
RB1
TNF
0.34
19
5
40
9
79.2%
81.6%
0.0043
0.0020
0.0029
24
49


HPGD
TGFB1
VIM
0.34
16
5
36
11
76.2%
76.6%
0.0011
0.0191
0.0053
21
47


CTSD
ITGA6
TNF
0.34
19
5
40
9
79.2%
81.6%
0.0090
0.0021
0.0128
24
49


TGFB1
TNF
USP10
0.34
18
6
37
11
75.0%
77.1%
0.0006
0.0237
0.0020
24
48


CTSD
PTGS2
SLPI
0.34
19
5
40
9
79.2%
81.6%
1.1E−05
0.0040
0.0057
24
49


ITGA6
RPS3
TNF
0.33
21
3
41
8
87.5%
83.7%
0.0005
0.0112
5.8E−07
24
49


CTSD
HPGD
PITRM1
0.33
17
5
38
10
77.3%
79.2%
3.2E−06
0.0009
0.0409
22
48


ATBF1
HPGD
VIM
0.33
16
5
37
11
76.2%
77.1%
0.0123
9.3E−05
4.0E−06
21
48


ILF2
ITGA6
MYC
0.33
18
6
37
12
75.0%
75.5%
0.0003
1.9E−06
9.0E−05
24
49


BAX
HPGD
SLPI
0.33
16
5
37
11
76.2%
77.1%
6.7E−05
4.6E−05
0.0004
21
48


BAX
ITGA6
MYC
0.33
18
5
38
11
78.3%
77.6%
0.0005
2.6E−05
0.0002
23
49


ITGB3
TGFB1
USP10
0.33
19
5
39
9
79.2%
81.3%
0.0339
1.6E−07
0.0010
24
48


CCND1
CTSD
ITGA6
0.33
19
5
37
12
79.2%
75.5%
0.0187
2.5E−05
0.0006
24
49


CDKN1B
MYCBP
TNF
0.33
19
5
39
10
79.2%
79.6%
0.0097
0.0005
2.2E−07
24
49


ING1
MYCBP
VIM
0.33
19
4
40
9
82.6%
81.6%
0.0120
1.0E−04
0.0005
23
49


CTSD
ITGA6
RPL13A
0.33
19
5
39
10
79.2%
79.6%
2.5E−05
0.0006
0.0203
24
49


CRABP2
TGFB1
USP10
0.33
19
5
37
10
79.2%
78.7%
0.0462
3.1E−06
0.0022
24
47


C3
CTSD
HPGD
0.33
17
5
36
11
77.3%
76.6%
0.0441
9.9E−07
0.0022
22
47


CTSD
PTGS2
TGFB1
0.33
19
4
38
10
82.6%
79.2%
0.0100
0.0386
0.0414
23
48


CRABP2
MYCBP
TNF
0.33
18
6
36
12
75.0%
75.0%
0.0096
0.0009
1.6E−06
24
48


ITGA6
MTA1

0.33
18
6
37
12
75.0%
75.5%
8.5E−05
4.4E−08

24
49


GNB2L1
ITGA6
TNF
0.33
18
6
38
11
75.0%
77.6%
0.0172
0.0004
4.6E−07
24
49


ITGA6
TGFB1
VIM
0.32
18
5
38
10
78.3%
79.2%
0.0085
0.0078
0.0064
23
48


CTSD
NCOA1

0.32
20
5
39
10
80.0%
79.6%
1.4E−07
0.0009

25
49


CDK4
TGFB1
TP53
0.32
17
5
37
11
77.3%
77.1%
0.0117
2.6E−06
0.0063
22
48


ITGA6
RPL13A
TGFB1
0.32
18
6
36
12
75.0%
75.0%
0.0020
0.0098
4.5E−05
24
48


ITGA6
TNF
USP9X
0.32
18
5
37
12
78.3%
75.5%
0.0021
5.4E−05
0.0319
23
49


EIF4E
RB1
TNF
0.32
19
5
39
10
79.2%
79.6%
0.0100
0.0005
5.4E−07
24
49


CTSD
MYCBP

0.32
20
5
38
11
80.0%
77.6%
3.5E−08
0.0009

25
49


FOS
MTA1
MYCBP
0.32
18
6
39
10
75.0%
79.6%
0.0112
1.7E−05
0.0011
24
49


PTGS2
SLPI
TGFB1
0.32
18
5
38
10
78.3%
79.2%
0.0091
0.0150
1.6E−05
23
48


BAX
RB1
TNF
0.32
18
5
38
11
78.3%
77.6%
0.0089
0.0027
3.2E−05
23
49


CDK4
ITGA6
TNF
0.32
18
6
39
10
75.0%
79.6%
0.0260
0.0010
1.4E−05
24
49


ITGA6
TNF
VIM
0.32
18
5
38
11
78.3%
77.6%
0.0056
0.0055
0.0404
23
49


CTSD
MYC
TP53
0.32
18
5
38
11
78.3%
77.6%
2.8E−05
0.0093
0.0018
23
49


ILF2
ITGA6
TGFB1
0.32
18
6
36
12
75.0%
75.0%
0.0135
0.0010
0.0004
24
48


ITGA6
TGFB1
TNF
0.32
19
5
38
10
79.2%
79.2%
0.0060
0.0225
0.0138
24
48


ATBF1
CTSD
TNF
0.32
18
6
37
12
75.0%
75.5%
0.0066
0.0009
0.0140
24
49


MYCBP
TNF
USP9X
0.32
18
5
37
12
78.3%
75.5%
0.0030
2.5E−05
0.0258
23
49


BCL2
ITGA6
TGFB1
0.32
20
4
37
10
83.3%
78.7%
0.0157
0.0014
1.6E−05
24
47


CCND1
TGFB1
TP53
0.32
17
5
37
11
77.3%
77.1%
0.0173
1.6E−05
0.0099
22
48


ERBB2
ITGA6
TGFB1
0.31
18
5
37
11
78.3%
77.1%
0.0109
0.0027
0.0003
23
48


MYCBP
TGFB1
TNF
0.31
19
5
37
11
79.2%
77.1%
0.0066
0.0179
0.0081
24
48


ITGA6
RPS3
TGFB1
0.31
18
6
36
12
75.0%
75.0%
0.0033
0.0169
2.1E−06
24
48


CASP8
MYCBP
TNF
0.31
18
6
37
12
75.0%
75.5%
0.0260
0.0009
8.7E−08
24
49


CASP8
RB1
TNF
0.31
20
4
40
9
83.3%
81.6%
0.0177
0.0009
9.1E−08
24
49


ING1
MYCBP
TNF
0.31
19
5
39
10
79.2%
79.6%
0.0270
0.0011
0.0008
24
49


CCND1
ITGA6
TGFB1
0.31
18
6
36
12
75.0%
75.0%
0.0179
0.0030
6.1E−05
24
48


MDM2
MYCBP
TNF
0.31
18
6
37
12
75.0%
75.5%
0.0279
0.0009
1.1E−07
24
49


CTSD
FOS
PTGS2
0.31
19
5
39
10
79.2%
79.6%
0.0001
0.0241
0.0082
24
49


MTA1
MYCBP
TNF
0.31
19
5
37
12
79.2%
75.5%
0.0289
0.0019
0.0209
24
49


CTNNB1
CTSD
TNF
0.31
18
6
37
12
75.0%
75.5%
0.0092
0.0028
0.0084
24
49


CTSD
MTA1
TP53
0.31
17
5
38
11
77.3%
77.6%
0.0308
0.0076
0.0027
22
49


PI3
SLPI
TNF
0.31
18
5
38
11
78.3%
77.6%
0.0028
0.0088
7.8E−05
23
49


MTA1
PITRM1
TP53
0.31
17
5
38
11
77.3%
77.6%
4.1E−07
0.0332
0.0328
22
49


MYCBP
NFKB1
TNF
0.31
18
6
38
11
75.0%
77.6%
0.0011
0.0320
0.0017
24
49


TNF
TP53
VIM
0.31
17
4
40
9
81.0%
81.6%
0.0096
0.0216
0.0162
21
49


FOS
MYCBP
TNF
0.31
19
5
38
11
79.2%
77.6%
0.0333
0.0034
3.4E−05
24
49


ERBB2
MYCBP
TNF
0.31
19
4
38
11
82.6%
77.6%
0.0190
0.0018
9.9E−06
23
49


CCND1
MTA1
TP53
0.31
17
5
38
11
77.3%
77.6%
0.0380
2.2E−05
0.0005
22
49


CRABP2
ITGA6
TNF
0.31
18
6
36
12
75.0%
75.0%
0.0391
0.0025
5.2E−06
24
48


C3
CTSD
TNF
0.30
20
4
37
11
83.3%
77.1%
0.0178
0.0068
0.0230
24
48


EIF4E
HPGD
TGFB1
0.30
18
4
38
9
81.8%
80.9%
0.0339
0.0007
0.0001
22
47


CRABP2
RB1
TNF
0.30
19
5
38
10
79.2%
79.2%
0.0219
0.0028
3.1E−06
24
48


ATBF1
RB1
VIM
0.30
18
5
37
12
78.3%
75.5%
0.0467
0.0007
2.1E−06
23
49


BAX
MYCBP
NFKB1
0.30
19
5
38
11
79.2%
77.6%
0.0014
0.0001
0.0002
24
49


BAX
FOS
MYCBP
0.30
19
5
40
9
79.2%
81.6%
2.6E−05
0.0002
0.0018
24
49


ITGA6
MYC
TGFB1
0.30
18
6
36
12
75.0%
75.0%
0.0026
0.0292
0.0012
24
48


FOS
MTA1
PTGS2
0.30
19
4
37
12
82.6%
75.5%
0.0011
0.0002
0.0094
23
49


CDKN1B
RB1
TNF
0.30
19
5
39
10
79.2%
79.6%
0.0320
0.0022
8.2E−07
24
49


FOS
MUC1
PTGS2
0.30
19
4
39
9
82.6%
81.3%
1.8E−05
0.0003
0.0069
23
48


BAX
TNF
UBE3A
0.30
18
5
38
11
78.3%
77.6%
0.0055
0.0003
0.0068
23
49


PI3
SLPI
TGFB1
0.30
18
5
39
9
78.3%
81.3%
0.0015
0.0054
0.0001
23
48


EIF4E
ITGA6
TGFB1
0.30
18
6
36
12
75.0%
75.0%
0.0320
0.0025
4.3E−06
24
48


CTSD
PITRM1
TNF
0.30
18
6
37
12
75.0%
75.5%
0.0032
0.0152
0.0131
24
49


BAX
TGFB1
TP53
0.30
16
5
37
11
76.2%
77.1%
0.0239
0.0005
0.0228
21
48


FOS
PTGS2
VIM
0.30
17
5
38
11
77.3%
77.6%
0.0055
0.0092
0.0003
22
49


ERBB2
TGFB1
TP53
0.30
16
5
38
10
76.2%
79.2%
0.0246
6.1E−05
0.0141
21
48


BAX
MTA1
TP53
0.30
16
5
38
11
76.2%
77.6%
0.0466
0.0001
0.0008
21
49


ERBB2
ITGA6
MYC
0.30
18
5
38
11
78.3%
77.6%
0.0030
3.2E−05
0.0002
23
49


ERBB2
MTA1
TP53
0.30
16
5
37
12
76.2%
75.5%
0.0493
1.4E−05
0.0007
21
49


BAX
HPGD
TNF
0.30
16
5
37
11
76.2%
77.1%
0.0264
0.0029
0.0022
21
48


ITGA6
MUC1
MYC
0.29
18
6
36
12
75.0%
75.0%
8.2E−05
0.0023
7.8E−05
24
48


ITGA6
NFKB1
RPL13A
0.29
18
6
37
12
75.0%
75.5%
4.4E−05
0.0001
0.0019
24
49


BAX
MYC
TP53
0.29
17
5
37
12
77.3%
75.5%
0.0001
0.0001
0.0001
22
49


BAX
TGFBR1
TNF
0.29
19
4
39
10
82.6%
79.6%
0.0037
0.0093
0.0004
23
49


CDK4
RB1
TNF
0.29
18
6
37
12
75.0%
75.5%
0.0461
0.0035
8.9E−06
24
49


MTA1
MYCBP
NFKB1
0.29
18
6
38
11
75.0%
77.6%
0.0035
0.0004
0.0499
24
49


CDK4
MYCBP
NFKB1
0.29
19
6
37
12
76.0%
75.5%
0.0042
3.3E−05
9.5E−06
25
49


CTSD
MTA1
PITRM1
0.29
21
3
39
10
87.5%
79.6%
0.0052
0.0187
0.0050
24
49


CRABP2
MYCBP
TGFB1
0.29
18
6
36
11
75.0%
76.6%
0.0231
0.0133
3.1E−05
24
47


RPS3
TNF
UBE3A
0.29
18
6
37
12
75.0%
75.5%
0.0123
2.2E−05
0.0042
24
49


CCND1
ITGA6
NFKB1
0.29
19
5
39
10
79.2%
79.6%
0.0025
9.0E−05
0.0002
24
49


ING1
TP53
VIM
0.29
16
5
38
11
76.2%
77.6%
0.0247
0.0023
0.0033
21
49


ATM
BAX
TNF
0.29
18
5
37
12
78.3%
75.5%
0.0131
0.0038
0.0033
23
49


EIF4E
MYCBP
TGFB1
0.29
18
6
36
12
75.0%
75.0%
0.0332
0.0047
2.7E−06
24
48


CRABP2
ITGA6
MYC
0.29
19
5
39
9
79.2%
81.3%
0.0031
9.5E−05
1.3E−05
24
48


FOS
HPGD
MUC1
0.29
18
4
38
9
81.8%
80.9%
0.0007
0.0011
0.0012
22
47


ERBB2
HPGD
TNF
0.29
17
4
36
12
81.0%
75.0%
0.0426
0.0026
0.0020
21
48


MYCBP
RPS3
TGFB1
0.28
18
6
36
12
75.0%
75.0%
0.0135
0.0384
2.8E−06
24
48


TGFB1
USP10

0.28
19
5
38
10
79.2%
79.2%
1.2E−06
0.0056

24
48


MTA1
VEZF1
VIM
0.28
18
5
37
10
78.3%
78.7%
0.0006
0.0068
0.0151
23
47


MYCBP
NFKB1
RPL13A
0.28
19
5
37
12
79.2%
75.5%
8.3E−05
1.9E−05
0.0066
24
49


BAX
ITGA6
NFKB1
0.28
18
5
37
12
78.3%
75.5%
0.0032
0.0003
0.0030
23
49


BAX
ILF2
TNF
0.28
18
5
38
11
78.3%
77.6%
0.0050
0.0182
8.5E−05
23
49


CTSD
PI3
SLPI
0.28
18
5
37
12
78.3%
75.5%
0.0003
0.0062
0.0082
23
49


CCND1
ITGA6
VIM
0.28
18
5
38
11
78.3%
77.6%
0.0411
0.0010
0.0002
23
49


CTNNB1
CTSD
MTA1
0.28
18
6
38
11
75.0%
77.6%
0.0109
0.0049
0.0449
24
49


FOS
HPGD
RPL13A
0.28
17
5
36
12
77.3%
75.0%
0.0004
0.0007
0.0011
22
48


ITGA6
MYC
USP9X
0.28
18
5
38
11
78.3%
77.6%
0.0002
0.0005
0.0031
23
49


ITGA6
MCM7
MYC
0.28
19
5
38
11
79.2%
77.6%
3.7E−05
0.0045
8.7E−06
24
49


FOS
HPGD
MTA1
0.27
17
5
37
11
77.3%
77.1%
0.0173
0.0039
0.0012
22
48


FOS
MYCBP
RPL13A
0.27
18
6
37
12
75.0%
75.5%
2.9E−05
0.0026
0.0002
24
49


ATM
CTSD
LAMB2
0.27
18
6
37
12
75.0%
75.5%
0.0283
1.4E−06
0.0288
24
49


CTSD
LAMB2
TGFBR1
0.27
18
6
38
11
75.0%
77.6%
1.3E−06
0.0402
0.0307
24
49


CTSD
ITGA6

0.27
19
5
39
10
79.2%
79.6%
6.3E−07
0.0080

24
49


ERBB2
FOS
HPGD
0.27
16
5
37
11
76.2%
77.1%
0.0019
0.0044
0.0004
21
48


BAX
NME1
SLPI
0.27
18
6
37
12
75.0%
75.5%
1.8E−05
0.0041
0.0084
24
49


ILF2
ING1
ITGA6
0.27
18
6
37
12
75.0%
75.5%
0.0113
0.0023
9.9E−05
24
49


CDKN1B
TGFBR1
TNF
0.27
18
6
37
12
75.0%
75.5%
0.0172
0.0116
1.6E−05
24
49


FOS
ITGA6
RPL13A
0.27
19
5
39
10
79.2%
79.6%
0.0005
0.0034
0.0002
24
49


MTA1
USP10
VIM
0.27
18
5
37
12
78.3%
75.5%
0.0029
0.0202
0.0456
23
49


BRCA1
CTSD
LAMB2
0.27
18
6
37
12
75.0%
75.5%
0.0416
1.8E−06
0.0488
24
49


ILF2
ITGA6
NFKB1
0.26
18
6
37
12
75.0%
75.5%
0.0091
9.0E−05
0.0028
24
49


BAX
DLC1
HPGD
0.26
16
4
38
10
80.0%
79.2%
0.0013
0.0115
0.0007
20
48


RPS3
TGFB1
UBE3A
0.26
19
5
38
10
79.2%
79.2%
0.0214
0.0001
0.0431
24
48


RPS3
TGFB1
TGFBR1
0.26
19
5
38
10
79.2%
79.2%
0.0303
1.6E−05
0.0432
24
48


MTA1
TGFB1
VEZF1
0.26
18
6
35
11
75.0%
76.1%
0.0176
0.0135
0.0247
24
46


CRABP2
ING1
MYCBP
0.26
19
6
36
12
76.0%
75.0%
0.0210
3.3E−05
0.0004
25
48


MTA1
PITRM1
TNF
0.26
18
6
37
12
75.0%
75.5%
0.0238
0.0241
0.0278
24
49


BAX
SLPI
TP53
0.26
18
4
38
11
81.8%
77.6%
0.0003
0.0007
0.0273
22
49


FOS
TNF
USP10
0.26
19
5
37
12
79.2%
75.5%
0.0323
0.0002
0.0394
24
49


CDK4
PITRM1
TNF
0.26
18
6
38
11
75.0%
77.6%
0.0248
0.0199
2.2E−05
24
49


BAX
FOS
IL8
0.26
18
6
37
12
75.0%
75.5%
0.0004
0.0031
0.0160
24
49


ITGA6
MYC
PCNA
0.26
19
5
39
10
79.2%
79.6%
7.5E−05
1.2E−05
0.0116
24
49


ITGA6
NFKB1
RPS3
0.26
18
6
37
12
75.0%
75.5%
0.0002
2.2E−05
0.0123
24
49


ING1
ITGA6
MYC
0.26
18
6
37
12
75.0%
75.5%
0.0117
0.0002
0.0192
24
49


BAX
FOS
TGFBR1
0.26
18
5
38
11
78.3%
77.6%
0.0002
0.0023
0.0173
23
49


FOS
PTGS2
RPL13A
0.26
18
5
38
11
78.3%
77.6%
2.7E−05
0.0109
0.0018
23
49


MTA1
TP53

0.26
17
5
38
11
77.3%
77.6%
3.0E−06
0.0040

22
49


ITGA6
MYC
NFKB1
0.26
19
5
38
11
79.2%
77.6%
0.0002
0.0135
0.0127
24
49


BCL2
ITGA6
NFKB1
0.26
19
5
37
11
79.2%
77.1%
0.0143
0.0002
0.0002
24
48


ATBF1
MTA1
VIM
0.26
18
5
37
12
78.3%
75.5%
0.0330
0.0072
0.0205
23
49


ERBB2
ING1
ITGA6
0.26
18
5
38
11
78.3%
77.6%
0.0259
0.0011
0.0003
23
49


BAX
PSMD1
TNF
0.26
18
5
35
11
78.3%
76.1%
0.0099
0.0401
0.0007
23
46


MUC1
MYC
TP53
0.26
18
4
38
10
81.8%
79.2%
0.0012
0.0002
0.0010
22
48


ING1
ITGA6
MUC1
0.26
20
4
38
10
83.3%
79.2%
0.0005
0.0008
0.0308
24
48


BAX
FOS
NME1
0.25
19
5
39
10
79.2%
79.6%
0.0002
0.0180
0.0212
24
49


MTA1
PITRM1
RP51077B9.4
0.25
19
5
39
10
79.2%
79.6%
0.0004
0.0075
0.0410
24
49


ITGA6
MYCBP
NFKB1
0.25
19
5
37
12
79.2%
75.5%
0.0288
0.0159
1.8E−06
24
49


MTA1
MYCBP

0.25
18
6
38
11
75.0%
77.6%
1.4E−06
0.0034

24
49


FOS
HPGD
PCNA
0.25
18
4
36
12
81.8%
75.0%
0.0007
0.0004
0.0037
22
48


BAX
ITGA6
RP51077B9.4
0.25
18
5
38
11
78.3%
77.6%
0.0006
0.0088
0.0129
23
49


FOS
MTA1
PITRM1
0.25
18
6
37
12
75.0%
75.5%
0.0490
0.0003
0.0437
24
49


BAX
ING1
MYCBP
0.25
18
6
37
12
75.0%
75.5%
0.0198
0.0028
0.0014
24
49


CTNNB1
ITGA6
MYC
0.25
18
6
37
12
75.0%
75.5%
0.0178
9.1E−05
7.4E−06
24
49


ATM
CDK4
TNF
0.25
18
6
37
12
75.0%
75.5%
0.0341
0.0319
0.0020
24
49


ITGA6
TGFB1

0.25
18
6
36
12
75.0%
75.0%
0.0298
1.8E−06

24
48


CRABP2
TGFBR1
TNF
0.25
18
6
36
12
75.0%
75.0%
0.0406
0.0456
7.7E−05
24
48


CCND1
FOS
HPGD
0.25
17
5
38
10
77.3%
79.2%
0.0043
0.0031
0.0022
22
48


ERBB2
HPGD
NFKB1
0.25
16
5
37
11
76.2%
77.1%
0.0340
0.0002
0.0116
21
48


CDK4
FOS
MYCBP
0.25
19
6
37
12
76.0%
75.5%
0.0004
9.0E−05
0.0026
25
49


ILF2
RPL13A
TNF
0.25
18
6
37
12
75.0%
75.5%
0.0389
0.0320
3.9E−05
24
49


CDK4
ING1
MYCBP
0.25
19
6
37
12
76.0%
75.5%
0.0366
9.3E−05
0.0004
25
49


CCND1
FOS
PTGS2
0.25
18
6
37
12
75.0%
75.5%
0.0028
6.7E−05
0.0070
24
49


HPGD
ITGA6
NFKB1
0.25
18
4
36
12
81.8%
75.0%
0.0101
0.0435
4.5E−05
22
48


CDKN1B
HPGD
RP51077B9.4
0.24
17
5
37
11
77.3%
77.1%
0.0073
0.0014
0.0009
22
48


BAX
IFITM3
MYCBP
0.24
19
5
37
12
79.2%
75.5%
5.2E−05
0.0037
0.0026
24
49


ITGA6
MYC
RP51077B9.4
0.24
19
5
38
11
79.2%
77.6%
0.0030
0.0009
0.0248
24
49


CCND1
ING1
TP53
0.24
18
5
37
12
78.3%
75.5%
0.0119
0.0001
0.0015
23
49


FOS
MUC1
MYCBP
0.24
18
6
36
12
75.0%
75.0%
0.0001
0.0011
0.0200
24
48


CTSD
RB1

0.24
18
6
37
12
75.0%
75.5%
2.1E−06
0.0350

24
49


ITGA6
MYC
TIMP1
0.24
18
6
37
12
75.0%
75.5%
0.0008
0.0012
0.0266
24
49


CCND1
ICAM1
ITGA6
0.24
18
6
37
12
75.0%
75.5%
0.0009
0.0021
0.0014
24
49


ITGA6
MUC1
NFKB1
0.24
20
4
39
9
83.3%
81.3%
0.0010
0.0283
0.0011
24
48


HPGD
MDM2
SLPI
0.24
17
5
38
10
77.3%
79.2%
7.8E−05
0.0034
0.0011
22
48


BAX
NFKB1
RBL2
0.24
18
5
37
11
78.3%
77.1%
0.0007
0.0061
0.0015
23
48


BCL2
ING1
ITGA6
0.24
19
5
38
10
79.2%
79.2%
0.0444
0.0004
0.0004
24
48


BAX
ICAM1
MYCBP
0.24
18
6
37
12
75.0%
75.5%
0.0016
0.0050
0.0029
24
49


BAX
MYC
NME1
0.24
18
6
37
12
75.0%
75.5%
0.0003
0.0415
0.0029
24
49


BAX
GNB2L1
SLPI
0.24
18
6
37
12
75.0%
75.5%
0.0001
0.0199
0.0161
24
49


C3
CTSB
MTA1
0.24
19
5
37
11
79.2%
77.1%
0.0101
0.0449
1.1E−05
24
48


ITGA6
MGMT
NFKB1
0.24
18
6
39
10
75.0%
79.6%
0.0005
0.0384
0.0001
24
49


BAX
FOS
TP53
0.24
17
5
37
12
77.3%
75.5%
0.0006
0.0021
0.0432
22
49


CDK4
PITRM1
RP51077B9.4
0.24
19
5
39
10
79.2%
79.6%
0.0008
0.0040
6.9E−05
24
49


FOS
ILF2
ITGA6
0.24
19
5
37
12
79.2%
75.5%
0.0116
0.0008
0.0018
24
49


CDKN1B
ITGA6
MYC
0.24
18
6
37
12
75.0%
75.5%
0.0377
0.0002
4.8E−05
24
49


ATM
BAX
VIM
0.24
17
5
38
11
77.3%
77.6%
0.0066
0.0397
0.0248
22
49


FOS
ITGA6
MUC1
0.23
19
5
38
10
79.2%
79.2%
0.0015
0.0323
0.0011
24
48


FOS
MYC
PTGS2
0.23
18
6
37
12
75.0%
75.5%
0.0003
0.0055
0.0066
24
49


FOS
HPGD
MGMT
0.23
17
5
37
11
77.3%
77.1%
0.0007
0.0019
0.0089
22
48


ICAM1
ITGA6
MYC
0.23
19
5
38
11
79.2%
77.6%
0.0429
0.0008
0.0013
24
49


BAX
CCND1
ITGA6
0.23
18
5
38
11
78.3%
77.6%
0.0045
0.0336
0.0016
23
49


FOS
MUC1
NCOA1
0.23
18
6
36
12
75.0%
75.0%
0.0002
0.0011
0.0359
24
48


FOS
IL8
RPL13A
0.23
18
6
37
12
75.0%
75.5%
0.0007
0.0213
0.0020
24
49


HPGD
PCNA
SLPI
0.23
18
4
36
12
81.8%
75.0%
0.0002
0.0052
0.0019
22
48


FOS
ITGA6
MYC
0.23
18
6
37
12
75.0%
75.5%
0.0484
0.0054
0.0010
24
49


ING1
PTGS2
SLPI
0.23
18
6
37
12
75.0%
75.5%
0.0021
0.0081
0.0038
24
49


FOS
PTGS2
SLPI
0.23
18
6
38
11
75.0%
77.6%
0.0021
0.0049
0.0064
24
49


BAX
ITGA6
TSC22D3
0.23
18
5
38
11
78.3%
77.6%
8.0E−05
0.0077
0.0375
23
49


PTGS2
SLPI
USP9X
0.23
18
5
38
11
78.3%
77.6%
0.0095
0.0005
0.0013
23
49


BAX
ICAM1
ITGA6
0.23
18
5
38
11
78.3%
77.6%
0.0015
0.0387
0.0032
23
49


HPGD
PI3
SLPI
0.23
17
4
37
11
81.0%
77.1%
0.0028
0.0163
0.0003
21
48


ING1
MUC1
TP53
0.23
17
5
37
11
77.3%
77.1%
0.0006
0.0473
0.0062
22
48


TP53
VIM

0.23
16
5
37
12
76.2%
75.5%
0.0289
1.4E−05

21
49


BAX
MYCBP
PLAU
0.23
18
6
36
12
75.0%
75.0%
4.1E−05
0.0043
0.0074
24
48


BCL2
ITGA6
RP51077B9.4
0.23
18
6
37
11
75.0%
77.1%
0.0028
0.0032
0.0007
24
48


CCND1
ILF2
ITGA6
0.23
18
6
37
12
75.0%
75.5%
0.0184
0.0042
0.0003
24
49


ILF2
ITGA6
RB1
0.23
18
6
37
12
75.0%
75.5%
5.6E−06
0.0007
0.0183
24
49


DLC1
EIF4E
HPGD
0.23
16
5
36
12
76.2%
75.0%
0.0082
0.0049
0.0003
21
48


BAX
HPGD
MMP9
0.23
16
5
36
11
76.2%
76.6%
0.0008
0.0009
0.0421
21
47


FOS
GNB2L1
HPGD
0.23
17
5
38
10
77.3%
79.2%
0.0006
0.0124
0.0011
22
48


FOS
ING1
PTGS2
0.23
18
6
37
12
75.0%
75.5%
0.0048
0.0082
0.0111
24
49


ING1
MYC
TP53
0.23
18
5
37
12
78.3%
75.5%
0.0021
0.0284
0.0021
23
49


ICAM1
ITGA6
RPL13A
0.23
19
5
38
11
79.2%
77.6%
0.0042
0.0014
0.0020
24
49


FOS
HPGD
NME1
0.23
18
4
38
10
81.8%
79.2%
0.0005
0.0008
0.0135
22
48


BAX
CDKN1B
SLPI
0.22
18
6
37
12
75.0%
75.5%
0.0004
0.0410
0.0034
24
49


BAX
RP51077B9.4
TGFBR1
0.22
18
5
37
12
78.3%
75.5%
0.0011
0.0117
0.0359
23
49


FOS
HPGD
MCM7
0.22
17
5
37
11
77.3%
77.1%
0.0035
0.0019
0.0149
22
48


ERBB2
TIMP1
TP53
0.22
16
5
37
12
76.2%
75.5%
0.0097
0.0004
0.0124
21
49


FOS
HPGD
MYBL2
0.22
17
5
36
12
77.3%
75.0%
0.0009
0.0022
0.0158
22
48


CRABP2
HPGD
SLPI
0.22
17
5
36
11
77.3%
76.6%
0.0069
0.0007
0.0035
22
47


FOS
HPGD
PSMB5
0.22
18
4
38
10
81.8%
79.2%
0.0006
0.0015
0.0165
22
48


ILF2
ITGA6
SLPI
0.22
19
5
39
10
79.2%
79.6%
0.0002
0.0006
0.0268
24
49


BAX
C3
RP51077B9.4
0.22
18
5
37
11
78.3%
77.1%
0.0022
0.0447
0.0026
23
48


ICAM1
MYCBP
RPL13A
0.22
18
6
37
12
75.0%
75.5%
0.0004
0.0019
0.0043
24
49


BAX
PSMB5
TSC22D3
0.22
18
5
38
11
78.3%
77.6%
0.0001
0.0139
0.0265
23
49


BAX
GNB2L1
ING1
0.22
18
6
37
12
75.0%
75.5%
0.0021
0.0071
0.0454
24
49


CCND1
HPGD
PLAU
0.22
19
3
36
11
86.4%
76.6%
0.0026
0.0014
0.0191
22
47


ITGA6
SLPI
USP9X
0.22
18
5
37
12
78.3%
75.5%
0.0178
0.0084
0.0016
23
49


MYCBP
TIMP1
USP9X
0.22
18
5
37
12
78.3%
75.5%
0.0149
0.0027
0.0288
23
49


FOS
MYBL2
PTGS2
0.22
18
5
37
12
78.3%
75.5%
7.6E−05
0.0130
0.0468
23
49


MYC
TIMP1
TP53
0.22
17
5
37
12
77.3%
75.5%
0.0108
0.0034
0.0112
22
49


CCND1
ITGA6
RP51077B9.4
0.22
18
6
37
12
75.0%
75.5%
0.0037
0.0162
0.0075
24
49


FOS
ILF2
RB1
0.22
19
5
39
10
79.2%
79.6%
0.0013
0.0017
0.0052
24
49


CDK4
MYCBP
RP51077B9.4
0.22
18
6
37
12
75.0%
75.5%
0.0036
0.0120
0.0007
24
49


MGMT
PITRM1
RP51077B9.4
0.22
18
6
37
12
75.0%
75.5%
0.0025
0.0199
7.7E−05
24
49


CDK4
RP51077B9.4
UBE3A
0.21
19
5
37
12
79.2%
75.5%
0.0032
0.0072
0.0122
24
49


BAX
PSMB5
SLPI
0.21
18
5
38
11
78.3%
77.6%
0.0004
0.0458
0.0345
23
49


NFKB1
RPL13A
TP53
0.21
17
5
37
12
77.3%
75.5%
0.0003
0.0327
0.0052
22
49


BCL2
ITGA6
TIMP1
0.21
19
5
37
11
79.2%
77.1%
0.0055
0.0014
0.0017
24
48


CCND1
ITGA6
MUC1
0.21
18
6
37
11
75.0%
77.1%
0.0047
0.0029
0.0085
24
48


BCL2
FOS
PTGS2
0.21
18
6
36
12
75.0%
75.0%
0.0254
3.2E−05
0.0141
24
48


FOS
NCOA1
USP9X
0.21
18
5
37
12
78.3%
75.5%
0.0009
0.0426
0.0044
23
49


EIF4E
HPGD
SLPI
0.21
17
5
36
12
77.3%
75.0%
0.0153
0.0003
0.0084
22
48


GNB2L1
HPGD
SLPI
0.21
17
5
37
11
77.3%
77.1%
0.0153
0.0004
0.0013
22
48


FOS
HPGD
MDM2
0.21
17
5
37
11
77.3%
77.1%
0.0052
0.0014
0.0307
22
48


HPGD
NME1
SLPI
0.21
17
5
37
11
77.3%
77.1%
0.0003
0.0170
0.0012
22
48


CDK4
DLC1
MYCBP
0.21
18
6
37
12
75.0%
75.5%
0.0003
0.0009
0.0063
24
49


HPGD
PSMB5
SLPI
0.21
19
3
37
11
86.4%
77.1%
0.0005
0.0177
0.0013
22
48


CRABP2
FOS
MYCBP
0.21
19
6
36
12
76.0%
75.0%
0.0026
0.0005
0.0082
25
48


CDK4
ITGA6
RP51077B9.4
0.20
19
5
38
11
79.2%
77.6%
0.0064
0.0206
0.0038
24
49


FOS
HPGD
RBL2
0.20
17
5
37
10
77.3%
78.7%
0.0015
0.0013
0.0267
22
47


FOS
MCM7
MYCBP
0.20
20
5
37
12
80.0%
75.5%
8.4E−05
0.0043
0.0078
25
49


ICAM1
ITGA6
RPS3
0.20
19
5
39
10
79.2%
79.6%
0.0003
0.0033
0.0061
24
49


CRABP2
FOS
HPGD
0.20
18
4
36
11
81.8%
76.6%
0.0286
0.0084
0.0043
22
47


ERBB2
FOS
MYCBP
0.20
20
4
37
12
83.3%
75.5%
0.0038
0.0009
0.0204
24
49


NFKB1
PTGS2
SLPI
0.20
18
6
37
12
75.0%
75.5%
0.0095
0.0216
0.0170
24
49


ITGA6
MUC1
USP9X
0.20
19
4
37
11
82.6%
77.1%
0.0070
0.0184
0.0269
23
48


CASP9
HPGD
SLPI
0.20
17
5
36
12
77.3%
75.0%
0.0242
0.0010
0.0253
22
48


PI3
SLPI
TIMP1
0.20
19
4
38
11
82.6%
77.6%
0.0045
0.0030
0.0155
23
49


BRCA2
FOS
HPGD
0.20
17
5
36
12
77.3%
75.0%
0.0476
0.0005
0.0021
22
48


FOS
PSMB5
PTGS2
0.20
18
5
37
12
78.3%
75.5%
4.4E−05
0.0314
0.0240
23
49


BAX
FOS

0.20
18
6
37
12
75.0%
75.5%
0.0028
0.0059

24
49


FOS
MYBL2
MYCBP
0.20
18
6
37
12
75.0%
75.5%
7.3E−05
0.0085
0.0242
24
49


CDKN1B
FOS
PTGS2
0.20
19
5
39
10
79.2%
79.6%
0.0421
9.8E−05
0.0166
24
49


ITGA6
RP51077B9.4
RPL13A
0.20
18
6
37
12
75.0%
75.5%
0.0165
0.0196
0.0103
24
49


MCM7
PITRM1
RP51077B9.4
0.20
18
6
37
12
75.0%
75.5%
0.0067
0.0134
0.0001
24
49


ERBB2
ITGA6
RPL13A
0.19
18
5
37
12
78.3%
75.5%
0.0138
0.0013
0.0245
23
49


ING1
PITRM1
RP51077B9.4
0.19
20
4
38
11
83.3%
77.6%
0.0071
0.0425
0.0056
24
49


ICAM1
MYC
MYCBP
0.19
20
5
37
12
80.0%
75.5%
0.0052
0.0272
0.0080
25
49


FOS
MYC
MYCBP
0.19
19
6
37
12
76.0%
75.5%
0.0054
0.0074
0.0376
25
49


BCL2
ITGA6
SLPI
0.19
19
5
36
12
79.2%
75.0%
0.0011
0.0023
0.0044
24
48


CCND1
CDK4
ITGA6
0.19
18
6
37
12
75.0%
75.5%
0.0071
0.0252
0.0032
24
49


RP51077B9.4
RPS3
UBE3A
0.19
18
6
37
12
75.0%
75.5%
0.0029
0.0104
0.0182
24
49


CTNNB1
HPGD
SLPI
0.19
18
4
39
9
81.8%
81.3%
0.0437
0.0008
0.0104
22
48


HPGD
MYBL2
SLPI
0.19
17
5
37
11
77.3%
77.1%
0.0015
0.0442
0.0044
22
48


CRABP2
MYCBP
RP51077B9.4
0.19
18
6
36
12
75.0%
75.0%
0.0118
0.0312
0.0013
24
48


ATM
CRABP2
ERBB2
0.19
18
6
37
11
75.0%
77.1%
0.0029
0.0155
0.0088
24
48


BCL2
CASP9
ITGA6
0.19
19
5
36
12
79.2%
75.0%
0.0250
0.0060
0.0009
24
48


DLC1
HPGD
NME1
0.19
16
5
37
11
76.2%
77.1%
0.0039
0.0016
0.0352
21
48


BCL2
CCND1
ITGA6
0.18
19
5
37
11
79.2%
77.1%
0.0304
0.0070
0.0019
24
48


ITGA6
RPL13A
THBS1
0.18
18
6
37
12
75.0%
75.5%
0.0040
0.0006
0.0387
24
49


CCND1
MYCBP
TIMP1
0.18
19
5
37
12
79.2%
75.5%
0.0200
0.0331
0.0038
24
49


CRABP2
MYCBP
TIMP1
0.18
18
6
36
12
75.0%
75.0%
0.0183
0.0143
0.0018
24
48


CDK4
GADD45A
HPGD
0.18
17
5
37
11
77.3%
77.1%
0.0132
0.0360
0.0025
22
48


EIF4E
FOS
ITGA6
0.18
18
6
37
12
75.0%
75.5%
0.0147
0.0013
0.0279
24
49


MYC
PTGS2
SLPI
0.18
18
6
37
12
75.0%
75.5%
0.0316
0.0375
0.0048
24
49


CRABP2
MYC
TP53
0.18
18
5
36
12
78.3%
75.0%
0.0261
0.0011
0.0246
23
48


CCND1
CRABP2
IL8
0.18
20
5
38
10
80.0%
79.2%
0.0432
0.0295
0.0058
25
48


CCND1
MUC1
TP53
0.17
17
5
36
12
77.3%
75.0%
0.0092
0.0136
0.0476
22
48


ICAM1
ITGA6
MUC1
0.17
18
6
36
12
75.0%
75.0%
0.0380
0.0175
0.0281
24
48


CASP9
FOS
MYCBP
0.17
18
6
37
12
75.0%
75.5%
0.0313
0.0334
0.0265
24
49


GADD45A
HPGD
RPS3
0.17
17
5
37
11
77.3%
77.1%
0.0255
0.0018
0.0238
22
48


CRABP2
ITGA6
TIMP1
0.17
18
6
36
12
75.0%
75.0%
0.0495
0.0282
0.0042
24
48


ITGA6
MUC1
SLPI
0.17
18
6
37
11
75.0%
77.1%
0.0352
0.0043
0.0455
24
48


ATM
FOS
GNB2L1
0.17
19
6
37
12
76.0%
75.5%
0.0356
0.0040
0.0141
25
49


DLC1
RPS3
UBE3A
0.17
18
5
37
12
78.3%
75.5%
0.0175
0.0015
0.0210
23
49


EIF4E
RB1
TIMP1
0.17
18
6
37
12
75.0%
75.5%
0.0377
0.0117
0.0011
24
49


MYC
TP53
VEZF1
0.17
17
5
36
11
77.3%
76.6%
0.0189
0.0093
0.0206
22
47


CCND1
ING1
RBL2
0.16
20
4
36
12
83.3%
75.0%
0.0467
0.0083
0.0417
24
48


BRCA1
MCM7
SLPI
0.16
19
6
37
12
76.0%
75.5%
0.0253
0.0031
0.0005
25
49


IFITM3
MUC1
MYCBP
0.16
18
6
36
12
75.0%
75.0%
0.0079
0.0031
0.0354
24
48


ATM
CRABP2
EIF4E
0.16
18
6
36
12
75.0%
75.0%
0.0026
0.0482
0.0168
24
48


BCL2
HPGD
ITGA6
0.16
17
5
37
11
77.3%
77.1%
0.0030
0.0259
0.0273
22
48


CRABP2
ERBB2
TGFBR1
0.15
19
4
36
12
82.6%
75.0%
0.0277
0.0163
0.0204
23
48


BCL2
GADD45A
ITGA6
0.15
18
6
36
12
75.0%
75.0%
0.0032
0.0373
0.0054
24
48


ATM
CDKN1B
RPS3
0.15
18
6
37
12
75.0%
75.5%
0.0013
0.0254
0.0242
24
49


BCL2
ITGA6
USP10
0.15
18
6
36
12
75.0%
75.0%
0.0017
0.0016
0.0443
24
48


ATM
EIF4E
RPS3
0.15
18
6
37
12
75.0%
75.5%
0.0012
0.0300
0.0488
24
49


CASP9
ERBB2
TP53
0.14
16
5
38
11
76.2%
77.6%
0.0159
0.0434
0.0283
21
49


ABCB1
IFITM3
MYCBP
0.12
19
6
37
12
76.0%
75.5%
0.0314
0.0012
0.0341
25
49


HPGD
VEGF

0.11
17
5
36
12
77.3%
75.0%
0.0071
0.0227

22
48
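In each row above, the two percentages can be reproduced from the four counts that precede them, and the two trailing integers are the sums of those counts. The sketch below checks this for the final row (HPGD, VEGF: 17, 5, 36, 12, giving 77.3% and 75.0% with totals 22 and 48). The column headings are not visible in this part of the table, so reading each pair of counts as correctly and incorrectly classified samples within one group is an assumption, and the function name is hypothetical.

```python
def percent_correct(n_correct: int, n_wrong: int) -> float:
    """Share of samples in one group that were classified correctly, in percent.

    Assumption: the two counts are the correctly and incorrectly classified
    samples of the same group; the table excerpt does not label them.
    """
    total = n_correct + n_wrong
    return round(100.0 * n_correct / total, 1)


# Counts taken from the last row of the table (HPGD, VEGF).
first_group = (17, 5)    # trailing total in the row: 22
second_group = (36, 12)  # trailing total in the row: 48

for n_correct, n_wrong in (first_group, second_group):
    pct = percent_correct(n_correct, n_wrong)
    print(f"{n_correct}/{n_correct + n_wrong} -> {pct}%")
```

The same relationship holds for other rows, for example 22 and 3 giving 88.0%, and 43 and 5 giving 89.6%.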





















TABLE 1B

                Breast     Normals    Sum
Group Size      65.3%      34.7%      100%
N =             49         26         75

Gene            Mean (Breast)   Mean (Normals)   Z-statistic   p-val
EGR1            18.2            19.3             −6.53         6.4E−11
CTSD            11.8            12.5             −4.42         9.9E−06
TGFB1           11.6            12.2             −4.26         2.0E−05
TNF             17.2            17.9             −4.21         2.6E−05
MTA1            18.3            18.8             −3.84         0.0001
VIM             10.3            10.9             −3.72         0.0002
BAX             14.4            14.8             −3.25         0.0011
RP51077B9.4     15.2            15.6             −3.22         0.0013
NFKB1           15.5            15.9             −3.08         0.0021
ICAM1           15.9            16.4             −3.01         0.0026
TIMP1           13.2            13.7             −2.99         0.0027
ING1            16.1            16.4             −2.99         0.0028
FOS             13.8            14.4             −2.99         0.0028
MYC             17.0            17.4             −2.88         0.0040
USP9X           14.7            15.1             −2.83         0.0047
SLPI            16.2            16.9             −2.73         0.0064
MUC1            21.5            21.9             −2.65         0.0080
VEZF1           15.4            15.7             −2.55         0.0107
CASP9           17.0            17.4             −2.51         0.0122
ERBB2           20.9            21.4             −2.47         0.0136
RPL13A          10.5            10.8             −2.41         0.0159
CDK4            16.0            16.3             −2.41         0.0162
DLC1            22.2            22.6             −2.38         0.0175
IFITM3          8.0             8.4              −2.35         0.0189
CCND1           21.1            21.6             −2.34         0.0191
CRABP2          20.4            20.8             −2.31         0.0207
CDKN1A          15.0            15.4             −2.26         0.0238
HPGD            20.4            19.8             2.17          0.0299
GADD45A         18.2            18.5             −2.06         0.0394
ILF2            16.0            16.3             −2.05         0.0402
TSC22D3         17.4            17.8             −2.04         0.0411
PLAU            22.7            23.1             −2.04         0.0414
THBS1           16.8            17.4             −2.01         0.0449
GATA3           16.2            16.5             −1.99         0.0462
ATBF1           19.1            19.4             −1.89         0.0592
MMP9            13.4            13.9             −1.84         0.0659
MGMT            18.5            18.8             −1.82         0.0694
RPS3            11.8            12.1             −1.76         0.0783
CDKN1B          14.1            14.3             −1.71         0.0867
NCOA1           15.0            15.3             −1.70         0.0895
MCM7            16.9            17.1             −1.66         0.0966
JUN             20.0            20.3             −1.65         0.0982
PITRM1          16.6            16.8             −1.59         0.1119
BCL2            14.7            14.9             −1.58         0.1136
VEGF            21.8            22.1             −1.57         0.1159
IL8             21.6            21.2             1.55          0.1215
MYBL2           19.4            19.8             −1.50         0.1336
EIF4E           15.8            16.1             −1.47         0.1409
PCNA            17.0            17.2             −1.46         0.1445
CXCL2           23.7            24.1             −1.43         0.1532
CTSB            12.7            12.8             −1.43         0.1534
USP10           14.4            14.7             −1.33         0.1849
LAMB2           22.7            23.0             −1.26         0.2071
ITGB3           16.4            16.8             −1.24         0.2142
MK167           21.7            22.0             −1.18         0.2392
GNB2L1          11.3            11.5             −1.16         0.2469
CTNNB1          13.9            14.1             −1.00         0.3167
ATM             15.9            15.7             0.98          0.3295
PSMB5           18.8            18.9             −0.95         0.3446
UBE3A           16.8            16.6             0.94          0.3458
TP53            15.3            15.4             −0.91         0.3647
ESR1            20.7            20.9             −0.88         0.3794
ABCB1           18.2            18.4             −0.77         0.4420
TOP2A           21.6            21.5             0.74          0.4596
MDM2            15.3            15.4             −0.70         0.4809
BRCA2           22.6            22.4             0.67          0.5031
PTGS2           16.2            16.3             −0.63         0.5311
NME1            18.7            18.8             −0.59         0.5520
FLT1            21.2            21.1             0.57          0.5663
C3              21.2            21.3             −0.56         0.5771
ITGA6           18.3            18.2             0.47          0.6378
CASP8           14.2            14.2             −0.46         0.6483
BRCA1           20.8            20.9             −0.45         0.6497
CCNE1           21.7            21.8             −0.42         0.6771
TGFBR1          17.6            17.5             0.38          0.7023
PSMD1           15.9            16.0             −0.35         0.7261
IGF2            21.0            20.9             0.31          0.7562
MYCBP           17.2            17.2             −0.22         0.8252
PI3             14.2            14.2             −0.20         0.8417
RBL2            15.7            15.7             −0.17         0.8636
CDH1            19.6            19.6             −0.12         0.9061
RB1             16.8            16.8             −0.09         0.9283
ESR2            23.1            23.1             0.04          0.9712























TABLE 1C











Group
Patient ID
CTSD
EGR1
NCOA1
CTSDEGR1
Predicted probability of breast cancer





















Breast Cancer
BC-014-BC:200066434
10.62
14.57
14.82
12.09
1


Breast Cancer
BC-019-BC:200066443
9.97
15.74
14.32
12.12
1


Breast Cancer
BC-006-BC:200066421
9.82
16.09
13.95
12.16
1


Breast Cancer
BC-017-BC:200066441
12.09
14.68
15.51
13.05
1


Breast Cancer
BC-041-BC:200066454
9.68
16.62
12.23
12.27
1


Breast Cancer
BC-002-BC:200066417
13.26
16.42
15.94
14.44
1


Breast Cancer
BC-018-BC:200066442
11.55
18.21
15.08
14.03
1


Breast Cancer
BC-059-BC:200066472
11.19
17.82
14.32
13.66
1


Breast Cancer
BC-056-BC:200066469
11.06
17.94
14.18
13.62
1


Breast Cancer
BC-058-BC:200066471
11.61
18.40
15.21
14.14
1


Breast Cancer
BC-032-BC:200066445
12.25
18.78
16.28
14.69
1


Breast Cancer
BC-048-BC:200066461
11.73
18.10
15.10
14.10
1


Breast Cancer
BC-012-BC:200066429
11.74401
18.25369
15.18565
14.17
1


Breast Cancer
BC-001-BC:200066416
11.86214
17.59438
14.80292
14.00
1


Breast Cancer
BC-005-BC:200066420
11.77834
17.92558
14.9405
14.07
1


Breast Cancer
BC-035-BC:200066448
11.42697
18.40522
14.8325
14.03
1


Breast Cancer
BC-015-BC:200066437
11.53728
18.06908
14.64923
13.97
1


Breast Cancer
BC-008-BC:200066423
12.45004
18.60401
16.16374
14.74
1


Breast Cancer
BC-044-BC:200066457
11.5368
18.03583
14.53045
13.96
1


Breast Cancer
BC-036-BC:200066449
11.64733
17.88607
14.52136
13.97
1


Breast Cancer
BC-013-BC:200066431
11.66487
18.49188
14.95413
14.21
1


Breast Cancer
BC-053-BC:200066466
11.15588
19.09311
14.74202
14.11
1


Breast Cancer
BC-046-BC:200066459
12.32445
18.78295
15.93886
14.73
0.99


Breast Cancer
BC-037-BC:200066450
11.93685
17.68212
14.63976
14.08
0.99


Breast Cancer
BC-007-BC:200066422
12.208
18.42972
15.51945
14.53
0.99


Breast Cancer
BC-057-BC:200066470
12.01215
18.2403
15.08025
14.33
0.99


Breast Cancer
BC-050-BC:200066463
11.59187
18.57523
14.79528
14.19
0.99


Breast Cancer
BC-009-BC:200066424
12.03023
18.36
15.13668
14.39
0.99


Breast Cancer
BC-010-BC:200066425
11.70047
18.37003
14.68811
14.19
0.98


Breast Cancer
BC-004-BC:200066419
12.57427
18.46415
15.82888
14.77
0.98


Breast Cancer
BC-054-BC:200066467
11.40154
19.71256
15.2752
14.50
0.97


Breast Cancer
BC-033-BC:200066446
12.20243
18.43507
15.30903
14.53
0.97


Breast Cancer
BC-049-BC:200066462
12.32989
18.6888
15.62522
14.70
0.96


Breast Cancer
BC-051-BC:200066464
11.95477
19.18524
15.49229
14.65
0.95


Breast Cancer
BC-034-BC:200066447
11.58159
19.3302
15.1268
14.47
0.95


Breast Cancer
BC-052-BC:200066465
12.21681
18.68968
15.42704
14.63
0.94


Breast Cancer
BC-047-BC:200066460
13.02757
17.96669
15.86956
14.87
0.93


Breast Cancer
BC-055-BC:200066468
11.96329
18.63264
15.02912
14.45
0.92


Normals
HN-041-BC:200066225
11.63431
18.86254
14.75467
14.33
0.9


Breast Cancer
BC-045-BC:200066458
12.02184
18.49644
14.92445
14.44
0.87


Normals
HN-001-BC:200066181
12.68524
18.64445
15.85585
14.91
0.87


Breast Cancer
BC-038-BC:200066451
11.89475
19.24451
15.29358
14.63
0.85


Breast Cancer
BC-003-BC:200066418
11.88236
18.99011
15.07942
14.53
0.84


Breast Cancer
BC-040-BC:200066453
11.21427
18.54812
13.90453
13.95
0.81


Breast Cancer
BC-016-BC:200066439
11.92028
19.10675
15.17475
14.60
0.79


Breast Cancer
BC-039-BC:200066452
12.53243
19.02137
15.85787
14.95
0.77


Normals
HN-006-BC:200066194
11.80149
18.46923
14.54324
14.29
0.77


Breast Cancer
BC-060-BC:200066305
11.96884
18.57426
14.80583
14.43
0.74


Breast Cancer
BC-011-BC:200066427
12.30331
18.58835
15.14965
14.65
0.6


Breast Cancer
BC-043-BC:200066456
12.6817
18.8907
15.80326
15.00
0.53


Breast Cancer
BC-042-BC:200066455
12.20118
19.07027
15.29172
14.76
0.44


Normals
HN-120-BC:200066264
12.65969
19.40558
16.09383
15.17
0.41


Normals
HN-042-BC:200066229
11.76364
19.04343
14.66433
14.48
0.32


Normals
HN-004-BC:200066190
11.86531
18.98835
14.73974
14.52
0.3


Breast Cancer
BC-031-BC:200066444
11.82072
19.0833
14.71032
14.53
0.24


Normals
HN-110-BC:200066252
12.23857
19.28161
15.23176
14.86
0.09


Normals
HN-125-BC:200066268
12.06238
19.2848
14.9873
14.75
0.08


Normals
HN-103-BC:200066241
12.24544
19.80992
15.56997
15.06
0.06


Normals
HN-111-BC:200066256
12.52891
19.46284
15.64758
15.11
0.05


Normals
HN-118-BC:200066260
12.37726
19.5409
15.51071
15.05
0.05


Normals
HN-050-BC:200066233
12.33328
19.10894
15.03955
14.86
0.02


Normals
HN-133-BC:200066272
12.0091
19.91638
15.20508
14.96
0.02


Normals
HN-146-BC:200066280
12.1568
19.41271
14.95357
14.86
0.01


Normals
HN-028-BC:200066206
12.49216
19.82457
15.66233
15.23
0.01


Normals
HN-033-BC:200066218
13.24309
19.29191
16.19468
15.50
0.01


Normals
HN-034-BC:200066222
11.98426
19.21925
14.50831
14.68
0.01


Normals
HN-011-BC:200066198
12.23494
19.31849
14.82094
14.88
0


Normals
HN-032-BC:200066214
12.51633
19.44526
15.24896
15.10
0


Normals
HN-150-BC:200066288
12.74933
18.86158
15.07806
15.03
0


Normals
HN-002-BC:200066186
13.31848
19.13469
15.74977
15.49
0


Normals
HN-104-BC:200066292
13.05896
19.50535
15.59307
15.46
0


Normals
HN-031-BC:200066210
12.86126
20.01324
15.4791
15.53
0


Normals
HN-109-BC:200066248
12.96254
19.5963
15.24024
15.44
0


Normals
HN-022-BC:200066202
13.48175
19.71499
15.86826
15.81
0


























TABLE 2A

















Normal
Breast


N =
26
49


2-gene models and 1-gene models
Entropy R-sq
#normal Correct
#normal FALSE
#bi Correct
#bi FALSE
Correct Classification (Normal)
Correct Classification (Breast)
p-val 1
p-val 2
total used (excludes missing): # normals
total used (excludes missing): # disease






















CCR5
EGR1
0.45
21
5
40
9
80.8%
81.6%
0.0059
1.1E−08
26
49


EGR1
IL18BP
0.45
21
5
41
8
80.8%
83.7%
1.3E−07
0.0061
26
49


EGR1
TLR2
0.44
22
4
41
7
84.6%
85.4%
6.8E−08
0.0062
26
48


EGR1
MHC2TA
0.43
19
5
38
10
79.2%
79.2%
3.2E−08
0.0159
24
48


EGR1
TNF
0.43
22
4
40
7
84.6%
85.1%
1.0E−05
0.0317
26
47


EGR1
IFNG
0.42
22
4
39
8
84.6%
83.0%
3.6E−10
0.0114
26
47


CD86
EGR1
0.42
21
5
40
9
80.8%
81.6%
0.0228
9.0E−09
26
49


EGR1
TOSO
0.42
20
6
38
11
76.9%
77.6%
3.3E−09
0.0246
26
49


EGR1
TNFSF5
0.41
22
4
40
9
84.6%
81.6%
7.9E−10
0.0344
26
49


CD19
EGR1
0.41
21
5
40
9
80.8%
81.6%
0.0347
1.3E−09
26
49


EGR1
HLADRA
0.41
22
4
40
9
84.6%
81.6%
2.2E−08
0.0417
26
49


EGR1
IL32
0.41
20
6
39
10
76.9%
79.6%
3.4E−09
0.0478
26
49


EGR1
IL18
0.41
22
4
41
8
84.6%
83.7%
5.1E−10
0.0481
26
49


ADAM17
EGR1
0.40
22
4
39
8
84.6%
83.0%
0.0417
7.4E−10
26
47


EGR1

0.37
21
5
40
9
80.8%
81.6%
2.4E−09

26
49


CCR3
TNF
0.30
20
6
36
11
76.9%
76.6%
0.0064
1.7E−07
26
47


HSPA1A
TNF
0.30
20
6
36
11
76.9%
76.6%
0.0066
1.2E−07
26
47


IRF1
TNF
0.30
21
5
37
10
80.8%
78.7%
0.0088
4.1E−07
26
47


CXCL1
TNF
0.28
20
6
36
11
76.9%
76.6%
0.0211
2.5E−07
26
47


TLR2
TLR4
0.27
21
5
38
10
80.8%
79.2%
3.1E−07
0.0003
26
48


HSPA1A
TGFB1
0.26
21
5
39
10
80.8%
79.6%
0.0064
1.0E−06
26
49


IL18BP
LTA
0.25
16
5
32
10
76.2%
76.2%
6.7E−06
0.0061
21
42


CXCL1
TLR2
0.25
21
5
37
11
80.8%
77.1%
0.0013
1.2E−06
26
48


IL1R1
TLR2
0.24
20
6
37
11
76.9%
77.1%
0.0016
1.6E−06
26
48


IL10
IL18BP
0.22
21
5
38
11
80.8%
77.6%
0.0131
0.0014
26
49


CCR5
MIF
0.21
20
6
38
11
76.9%
77.6%
7.2E−06
0.0020
26
49


DPP4
IL18BP
0.21
21
5
38
11
80.8%
77.6%
0.0292
9.0E−06
26
49


IL18BP
TLR2
0.20
20
6
37
11
76.9%
77.1%
0.0168
0.0351
26
48


C1QA
IL18BP
0.19
20
6
37
11
76.9%
77.1%
0.0454
0.0062
26
48


IL8
TLR2
0.19
20
6
37
11
76.9%
77.1%
0.0272
4.1E−05
26
48


C1QA
IL10
0.18
20
6
37
11
76.9%
77.1%
0.0117
0.0130
26
48


CCR3
CCR5
0.18
20
6
37
12
76.9%
75.5%
0.0090
6.0E−05
26
49


IL18BP

0.16
20
6
38
11
76.9%
77.6%
9.4E−05

26
49
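

The "Correct Classification" percentages in the table above appear to be the per-group share of correctly classified subjects, that is, correct / (correct + FALSE). The following is a minimal sketch of that arithmetic, not the applicant's software; the function name `correct_classification` is hypothetical and is introduced here only for illustration.

```python
def correct_classification(n_correct: int, n_false: int) -> float:
    """Percent of one group's usable samples that the model classified correctly.

    Assumption: the table's percentage is correct / (correct + false) within
    a single group (normals or breast cancer), with missing samples excluded.
    """
    total = n_correct + n_false
    if total == 0:
        raise ValueError("group has no usable samples")
    return 100.0 * n_correct / total


# CCR5 + EGR1 row: 21 normals correct, 5 false; 40 breast correct, 9 false.
normal_pct = correct_classification(21, 5)
breast_pct = correct_classification(40, 9)
print(f"normals: {normal_pct:.1f}%  breast: {breast_pct:.1f}%")
```

The same calculation applied to a row with missing samples, for example EGR1 + MHC2TA (19 correct and 5 false among 24 usable normals), gives 79.2%, which matches the tabulated value.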





















TABLE 2B










Breast
Normal
Sum







Group Size
65.3%
34.7%
100%



N =
49
26
75







Gene
Mean (Breast)
Mean (Normal)
p-val







EGR1
18.2
19.3
2.4E−09



TNF
17.3
18.1
4.0E−06



TGFB1
11.8
12.3
3.1E−05



IFI16
13.1
13.7
4.5E−05



IL18BP
16.3
16.8
9.4E−05



HMOX1
14.8
15.5
0.0002



TLR2
14.8
15.3
0.0003



SERPINA1
12.2
12.8
0.0007



C1QA
19.4
20.4
0.0008



IL10
22.0
22.8
0.0008



CCR5
16.4
17.0
0.0011



ICAM1
16.6
17.0
0.0023



MHC2TA
14.8
15.3
0.0028



TIMP1
13.3
13.7
0.0030



HLADRA
11.2
11.6
0.0036



CCL3
19.7
20.2
0.0040



PLAUR
13.8
14.3
0.0043



CD86
16.6
17.0
0.0052



MNDA
11.8
12.2
0.0058



MYC
17.1
17.5
0.0064



NFKB1
16.4
16.8
0.0081



CCL5
11.2
11.6
0.0107



PTPRC
10.8
11.1
0.0118



IL1B
14.9
15.4
0.0167



CD4
14.8
15.1
0.0170



TOSO
15.2
15.6
0.0172



CASP1
15.5
15.9
0.0194



CXCR3
16.4
16.7
0.0203



TNFRSF1A
13.9
14.2
0.0246



SERPINE1
20.0
20.6
0.0282



IL32
13.1
13.4
0.0319



IL1RN
15.3
15.8
0.0355



SSI3
16.5
17.0
0.0367



GZMB
16.5
17.0
0.0579



CD19
17.7
18.1
0.0728



ALOX5
16.6
16.9
0.0809



IRF1
12.6
12.7
0.1103



TNFSF6
19.2
19.5
0.1213



TNFSF5
17.1
17.3
0.1277



VEGF
21.9
22.2
0.1331



MAPK14
13.7
13.9
0.1532



MMP9
13.6
14.0
0.1704



IL5
20.8
21.1
0.1804



PTGS2
16.3
16.5
0.1942



IL8
21.5
21.1
0.2146



IL23A
20.3
20.6
0.2205



CCR3
16.6
16.4
0.2460



CD8A
15.2
15.4
0.2489



PLA2G7
18.6
18.8
0.2842



TXNRD1
16.3
16.4
0.2937



IFNG
21.9
22.2
0.3062



CASP3
20.9
20.7
0.3105



HSPA1A
14.2
14.4
0.3332



IL18
21.1
21.2
0.3363



IL15
20.6
20.4
0.3372



ADAM17
17.1
17.2
0.5379



ELA2
20.5
20.7
0.5516



DPP4
18.3
18.4
0.5979



IL1R1
19.8
19.7
0.6131



MMP12
23.3
23.1
0.6211



TLR4
14.2
14.3
0.6946



LTA
17.7
17.8
0.7021



CTLA4
18.7
18.7
0.7436



TNFRSF13B
19.1
19.1
0.8280



MIF
14.9
14.8
0.8384



APAF1
17.6
17.6
0.8535



HMGB1
17.0
17.0
0.8769



CXCL1
19.3
19.3
0.9724























TABLE 2C











Patient ID
Group
CCR5
EGR1
logit
odds
Predicted probability of Breast Inf





















14
Breast
15.60
14.51
16.47
14201817.67
1.0000


19
Breast
15.32
15.15
14.88
2886749.32
1.0000


41
Breast
13.05
16.49
14.22
1492705.97
1.0000


17
Breast
17.64
14.97
11.73
123756.31
1.0000


2
Breast
17.94
15.67
8.98
7957.40
0.9999


6
Breast
16.94
16.23
8.79
6587.63
0.9998


47
Breast
15.79
17.40
6.89
984.55
0.9990


36
Breast
15.55
17.83
5.88
357.11
0.9972


5
Breast
15.30
18.07
5.50
244.10
0.9959


59
Breast
15.54
17.95
5.49
243.29
0.9959


18
Breast
16.13
17.68
5.43
229.16
0.9957


37
Breast
16.29
17.75
4.94
139.71
0.9929


10
Breast
16.40
18.03
3.87
48.06
0.9796


3
Breast
16.26
18.11
3.83
46.28
0.9788


31
Breast
15.54
18.55
3.58
35.72
0.9728


58
Breast
15.96
18.47
3.15
23.37
0.9590


56
Breast
16.69
18.16
2.98
19.64
0.9516


60
Breast
15.70
18.69
2.86
17.49
0.9459


35
Breast
16.22
18.46
2.77
15.98
0.9411


1
Breast
16.80
18.17
2.76
15.81
0.9405


53
Breast
15.80
18.69
2.70
14.90
0.9371


46
Breast
16.36
18.42
2.68
14.60
0.9359


15
Breast
16.58
18.33
2.62
13.69
0.9319


149
Normals
16.46
18.42
2.52
12.42
0.9255


57
Breast
16.75
18.29
2.47
11.79
0.9218


33
Breast
16.31
18.55
2.34
10.41
0.9124


7
Breast
16.85
18.28
2.33
10.29
0.9115


44
Breast
16.30
18.56
2.33
10.28
0.9113


12
Breast
16.60
18.41
2.31
10.10
0.9099


1
Normals
17.24
18.11
2.26
9.60
0.9056


4
Normals
17.03
18.24
2.18
8.83
0.8982


45
Breast
17.16
18.22
2.03
7.62
0.8840


4
Breast
17.41
18.18
1.78
5.95
0.8560


34
Breast
16.01
18.91
1.66
5.28
0.8408


54
Breast
16.06
18.92
1.55
4.73
0.8255


11
Breast
17.28
18.34
1.44
4.22
0.8083


50
Breast
16.29
18.88
1.31
3.71
0.7878


38
Breast
15.97
19.05
1.28
3.59
0.7823


43
Breast
16.00
19.07
1.16
3.18
0.7608


41
Normals
16.27
18.99
0.98
2.66
0.7265


42
Breast
16.07
19.11
0.91
2.50
0.7140


8
Breast
16.94
18.69
0.90
2.45
0.7100


109
Normals
15.90
19.25
0.72
2.06
0.6735


32
Breast
16.66
18.89
0.68
1.98
0.6640


48
Breast
17.30
18.57
0.67
1.96
0.6623


55
Breast
17.06
18.71
0.63
1.87
0.6517


16
Breast
16.56
18.97
0.58
1.79
0.6412


2
Normals
17.18
18.77
0.22
1.24
0.5544


110
Normals
16.52
19.11
0.20
1.22
0.5499


52
Breast
16.40
19.18
0.15
1.17
0.5383


13
Breast
16.38
19.30
−0.20
0.82
0.4506


40
Breast
16.84
19.08
−0.21
0.81
0.4479


146
Normals
15.84
19.62
−0.36
0.70
0.4120


39
Breast
17.04
19.03
−0.37
0.69
0.4087


49
Breast
17.00
19.06
−0.42
0.66
0.3959


104
Normals
17.21
18.97
−0.46
0.63
0.3879


51
Breast
15.79
19.71
−0.56
0.57
0.3633


111
Normals
16.82
19.21
−0.60
0.55
0.3544


34
Normals
16.74
19.26
−0.63
0.53
0.3477


6
Normals
16.27
19.51
−0.67
0.51
0.3387


42
Normals
16.83
19.30
−0.91
0.40
0.2876


28
Normals
17.04
19.22
−0.99
0.37
0.2708


9
Breast
18.11
18.77
−1.27
0.28
0.2194


50
Normals
16.97
19.37
−1.35
0.26
0.2054


125
Normals
16.16
19.90
−1.78
0.17
0.1446


32
Normals
17.24
19.41
−1.92
0.15
0.1283


150
Normals
17.65
19.30
−2.21
0.11
0.0986


133
Normals
16.64
19.84
−2.34
0.10
0.0880


33
Normals
17.68
19.33
−2.39
0.09
0.0841


11
Normals
17.55
19.47
−2.62
0.07
0.0681


103
Normals
17.03
19.86
−3.05
0.05
0.0452


120
Normals
17.21
19.78
−3.06
0.05
0.0446


22
Normals
18.58
19.43
−4.15
0.02
0.0155


118
Normals
17.57
19.96
−4.25
0.01
0.0141


31
Normals
17.12
20.61
−5.58
0.00
0.0038
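

The logit, odds, and predicted probability columns in the table above are consistent with the standard logistic relationships odds = exp(logit) and probability = odds / (1 + odds). The sketch below illustrates that conversion under this assumption; the function names are hypothetical, and the model coefficients that produce each logit from the CCR5 and EGR1 values are not reproduced here.

```python
import math


def logit_to_odds(logit: float) -> float:
    """Convert a log-odds value to odds."""
    return math.exp(logit)


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability between 0 and 1."""
    return odds / (1.0 + odds)


# Tabulated row for patient 149 (Normals): logit 2.52, odds 12.42.
odds = logit_to_odds(2.52)
prob = odds_to_probability(odds)
print(f"odds = {odds:.2f}, probability = {prob:.4f}")
```

Because the tabulated logits are rounded to two decimals, odds recomputed from them can differ from the tabulated odds in the last digit.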


























TABLE 3A

















Normal
Breast


N =
22
49


2-gene models and 1-gene models
Entropy R-sq
#normal Correct
#normal FALSE
#bi Correct
#bi FALSE
Correct Classification (Normal)
Correct Classification (Breast)
p-val 1
p-val 2
total used (excludes missing): # normals
total used (excludes missing): # disease






















EGR1
NME1
0.67
20
2
44
5
90.9%
89.8%
4.0E−14
0.0003
22
49


BAX
EGR1
0.66
19
3
43
6
86.4%
87.8%
0.0007
1.3E−11
22
49


EGR1
HRAS
0.64
19
3
44
5
86.4%
89.8%
2.5E−13
0.0016
22
49


BAD
EGR1
0.63
21
1
43
6
95.5%
87.8%
0.0025
5.2E−12
22
49


CASP8
EGR1
0.61
19
3
41
8
86.4%
83.7%
0.0063
3.1E−13
22
49


CDKN1A
EGR1
0.61
19
3
45
4
86.4%
91.8%
0.0075
5.7E−13
22
49


ABL1
EGR1
0.60
19
3
43
6
86.4%
87.8%
0.0102
5.4E−10
22
49


EGR1
WNT1
0.60
20
2
43
6
90.9%
87.8%
5.1E−11
0.0107
22
49


EGR1
GZMA
0.60
19
3
42
7
86.4%
85.7%
1.7E−12
0.0119
22
49


EGR1
TNFRSF10A
0.59
19
3
42
7
86.4%
85.7%
1.3E−12
0.0151
22
49


BCL2
EGR1
0.58
19
3
43
6
86.4%
87.8%
0.0216
2.5E−11
22
49


ABL2
EGR1
0.58
19
3
42
7
86.4%
85.7%
0.0228
7.8E−09
22
49


EGR1
PCNA
0.58
19
3
42
7
86.4%
85.7%
1.4E−12
0.0284
22
49


EGR1
S100A4
0.58
20
2
43
6
90.9%
87.8%
5.1E−12
0.0298
22
49


EGR1
NRAS
0.57
19
3
42
7
86.4%
85.7%
1.8E−10
0.0371
22
49


CDKN2A
EGR1
0.57
19
3
41
8
86.4%
83.7%
0.0390
3.3E−11
22
49


EGR1
TNFRSF10B
0.57
19
3
42
7
86.4%
85.7%
3.6E−11
0.0417
22
49


EGR1
ITGA3
0.57
18
3
42
7
85.7%
85.7%
1.1E−11
0.0255
21
49


EGR1

0.52
19
3
42
7
86.4%
85.7%
1.1E−11

22
49


NRAS
SMAD4
0.41
17
5
39
10
77.3%
79.6%
2.6E−09
3.6E−07
22
49


ABL2
SMAD4
0.38
18
4
39
10
81.8%
79.6%
9.9E−09
0.0001
22
49


CDK5
SMAD4
0.37
17
5
38
11
77.3%
77.6%
1.5E−08
3.0E−05
22
49


CDK5
SKIL
0.34
18
4
40
9
81.8%
81.6%
1.5E−07
8.5E−05
22
49


FOS
SOCS1
0.33
16
5
37
12
76.2%
75.5%
0.0089
5.3E−06
21
49


NOTCH2
TGFB1
0.31
17
5
38
11
77.3%
77.6%
0.0044
5.4E−07
22
49


SMAD4
TGFB1
0.31
19
3
39
10
86.4%
79.6%
0.0060
2.1E−07
22
49


SMAD4
SOCS1
0.30
18
4
38
11
81.8%
77.6%
0.0136
2.7E−07
22
49


ATM
CDK5
0.30
18
4
39
10
81.8%
79.6%
0.0008
9.2E−07
22
49


CDK5
ITGB1
0.29
17
5
37
12
77.3%
75.5%
6.6E−07
0.0011
22
49


CCNE1
SOCS1
0.28
17
5
38
11
77.3%
77.6%
0.0415
7.3E−07
22
49


SMAD4
TNF
0.28
17
5
38
11
77.3%
77.6%
0.0026
7.5E−07
22
49


ERBB2
SOCS1
0.28
17
5
38
11
77.3%
77.6%
0.0430
0.0016
22
49


ABL2
APAF1
0.28
18
4
37
12
81.8%
75.5%
7.1E−07
0.0101
22
49


PLAUR
TGFB1
0.28
16
5
37
12
76.2%
75.5%
0.0394
1.8E−05
21
49


ERBB2
SKIL
0.28
17
5
38
11
77.3%
77.6%
3.6E−06
0.0021
22
49


AKT1
TGFB1
0.27
17
5
38
10
77.3%
79.2%
0.0237
7.3E−06
22
48


TGFB1
TIMP1
0.27
17
5
38
11
77.3%
77.6%
3.6E−05
0.0449
22
49


ATM
BAX
0.26
17
5
38
11
77.3%
77.6%
0.0009
5.0E−06
22
49


ABL2
NOTCH2
0.26
17
5
37
12
77.3%
75.5%
7.6E−06
0.0329
22
49


BAX
SKIL
0.25
17
5
38
11
77.3%
77.6%
1.0E−05
0.0013
22
49


ERBB2
MSH2
0.25
17
5
37
12
77.3%
75.5%
4.4E−06
0.0073
22
49


BRAF
TNF
0.24
17
5
37
11
77.3%
77.1%
0.0144
6.7E−06
22
48


ABL1
SMAD4
0.24
17
5
38
11
77.3%
77.6%
5.7E−06
0.0102
22
49


SOCS1

0.23
17
5
38
11
77.3%
77.6%
5.8E−06

22
49


ABL1
ATM
0.23
17
5
37
12
77.3%
75.5%
1.6E−05
0.0117
22
49


SKIL
TNF
0.23
18
4
39
10
81.8%
79.6%
0.0301
2.7E−05
22
49


CDK5
PTEN
0.22
17
5
38
11
77.3%
77.6%
1.3E−05
0.0356
22
49


ERBB2
IL8
0.21
17
5
38
11
77.3%
77.6%
5.0E−05
0.0449
22
49


CDK2
SMAD4
0.21
17
5
38
11
77.3%
77.6%
1.9E−05
0.0033
22
49


CDK2
SKIL
0.19
17
5
37
12
77.3%
75.5%
0.0002
0.0086
22
49


GZMA
SKIL
0.13
17
5
37
12
77.3%
75.5%
0.0036
0.0035
22
49





















TABLE 3B










Breast
Normals
Sum







Group Size
69.0%
31.0%
100%



N =
49
22
71







Gene
Mean (Breast)
Mean (Normals)
p-val







EGR1
18.8
20.1
1.1E−11



SOCS1
16.4
17.1
5.8E−06



TGFB1
12.4
12.9
9.9E−06



ABL2
19.8
20.4
2.2E−05



TNF
18.1
18.8
7.9E−05



CDK5
18.2
18.8
0.0001



ERBB2
22.1
22.7
0.0001



ABL1
17.9
18.4
0.0002



RHOC
16.0
16.5
0.0002



BAX
15.4
15.8
0.0006



CDK2
19.0
19.4
0.0017



NRAS
16.7
17.1
0.0018



WNT1
21.1
21.8
0.0021



SRC
18.2
18.6
0.0024



MYCL1
18.3
18.7
0.0041



BAD
18.1
18.4
0.0056



FOS
15.3
15.9
0.0063



MYC
17.9
18.3
0.0065



ICAM1
16.8
17.2
0.0067



BCL2
16.9
17.2
0.0088



TIMP1
14.4
14.7
0.0108



TNFRSF10B
17.0
17.4
0.0111



CDKN2A
20.5
20.9
0.0114



NFKB1
16.5
16.8
0.0133



TP53
16.1
16.4
0.0176



SEMA4D
14.2
14.5
0.0201



PLAUR
14.6
15.0
0.0218



THBS1
17.5
18.1
0.0242



IFITM1
8.6
9.0
0.0405



RHOA
11.6
11.9
0.0424



TNFRSF1A
15.2
15.5
0.0505



AKT1
15.1
15.3
0.0507



SERPINE1
20.9
21.4
0.0615



MMP9
14.4
15.0
0.0671



S100A4
13.2
13.4
0.0738



SKIL
18.3
18.0
0.1006



ITGA3
21.6
21.9
0.1038



GZMA
17.3
17.7
0.1053



HRAS
19.9
20.2
0.1110



JUN
20.7
21.1
0.1114



NOTCH2
15.9
16.1
0.1141



IL8
22.0
21.6
0.1276



CDK4
17.6
17.7
0.1294



VHL
17.2
17.4
0.1560



ATM
16.8
16.5
0.1612



NME1
19.3
19.5
0.1768



IL1B
15.6
15.9
0.1784



SKI
17.3
17.5
0.1812



RAF1
14.4
14.6
0.1892



NME4
17.2
17.4
0.1896



TNFRSF10A
20.6
20.8
0.1902



PLAU
24.1
24.4
0.2023



CDKN1A
16.2
16.4
0.2565



G1P3
15.2
15.5
0.2868



ITGA1
21.2
21.4
0.2895



PTCH1
19.8
20.0
0.2897



E2F1
20.1
20.3
0.2934



TNFRSF6
16.4
16.5
0.3200



BRAF
16.7
16.9
0.3219



VEGF
22.7
23.0
0.3420



IL18
21.8
22.0
0.3421



IGFBP3
21.9
22.1
0.3450



MSH2
18.1
17.9
0.3469



COL18A1
23.4
23.7
0.3802



BRCA1
21.3
21.5
0.3833



ITGB1
14.7
14.5
0.3906



PCNA
18.1
18.2
0.4038



CASP8
15.1
15.2
0.5195



CDC25A
23.0
23.1
0.5478



CFLAR
14.6
14.7
0.5518



NOTCH4
24.7
24.9
0.5994



PTEN
14.1
14.0
0.6315



ITGAE
23.7
23.5
0.6404



ANGPT1
21.1
21.2
0.6406



CCNE1
22.9
23.0
0.6670



SMAD4
17.1
17.1
0.6686



IFNG
22.9
22.9
0.8594



RB1
17.6
17.6
0.8655



APAF1
17.4
17.3
0.9248



FGFR2
22.9
22.9
0.9735























TABLE 3C











Patient ID
Group
EGR1
NME1
logit
odds
Predicted probability of breast cancer





















BC-014
Breast Cancer
15.38
19.12
33.89
5.3E+14
1.0000


BC-017
Breast Cancer
15.58
20.36
28.55
2.5E+12
1.0000


BC-019
Breast Cancer
16.41
18.69
27.39
7.8E+11
1.0000


BC-006
Breast Cancer
16.80
19.64
21.41
2.0E+09
1.0000


BC-041
Breast Cancer
17.74
18.50
17.79
5.3E+07
1.0000


BC-002
Breast Cancer
16.89
21.44
15.19
3.9E+06
1.0000


BC-059
Breast Cancer
18.30
18.67
12.96
424412.98
1.0000


BC-001
Breast Cancer
18.31
19.26
11.11
67008.02
1.0000


BC-047
Breast Cancer
18.41
19.17
10.59
39697.66
1.0000


BC-036
Breast Cancer
18.41
19.40
9.90
19916.42
0.9999


BC-058
Breast Cancer
19.00
18.16
9.24
10313.91
0.9999


BC-005
Breast Cancer
18.66
19.19
8.59
5364.93
0.9998


BC-043
Breast Cancer
19.05
18.24
8.57
5256.00
0.9998


BC-007
Breast Cancer
18.72
19.28
7.90
2685.56
0.9996


BC-037
Breast Cancer
18.41
20.11
7.64
2085.20
0.9995


BC-056
Breast Cancer
18.83
19.15
7.46
1735.21
0.9994


BC-033
Breast Cancer
19.11
18.66
6.85
944.26
0.9989


BC-050
Breast Cancer
19.05
18.91
6.52
676.82
0.9985


BC-049
Breast Cancer
19.25
18.43
6.45
630.54
0.9984


BC-057
Breast Cancer
18.95
19.22
6.30
545.01
0.9982


BC-031
Breast Cancer
19.28
18.53
5.94
379.32
0.9974


BC-052
Breast Cancer
19.21
18.83
5.53
251.45
0.9960


BC-018
Breast Cancer
19.01
19.38
5.35
210.03
0.9953


BC-055
Breast Cancer
19.13
19.14
5.14
171.52
0.9942


BC-044
Breast Cancer
18.95
19.60
5.11
166.18
0.9940


BC-012
Breast Cancer
18.89
19.81
4.96
142.19
0.9930


BC-032
Breast Cancer
19.34
18.82
4.54
93.46
0.9894


BC-003
Breast Cancer
19.12
19.54
3.99
54.26
0.9819


BC-040
Breast Cancer
19.27
19.19
3.91
50.05
0.9804


BC-035
Breast Cancer
19.32
19.08
3.89
48.72
0.9799


BC-046
Breast Cancer
19.31
19.19
3.63
37.76
0.9742


BC-034
Breast Cancer
19.54
18.89
2.80
16.41
0.9426


BC-015
Breast Cancer
19.03
20.15
2.77
16.02
0.9412


BC-010
Breast Cancer
19.02
20.19
2.77
15.99
0.9412


HN-004-HCG
Normal
19.39
19.33
2.60
13.48
0.9309


BC-054
Breast Cancer
20.04
17.75
2.53
12.59
0.9264


BC-008
Breast Cancer
19.41
19.38
2.34
10.40
0.9123


BC-060
Breast Cancer
19.28
19.71
2.30
9.98
0.9089


BC-038
Breast Cancer
19.50
19.19
2.22
9.17
0.9016


BC-053
Breast Cancer
19.63
18.90
2.08
8.04
0.8894


BC-042
Breast Cancer
19.68
18.89
1.80
6.07
0.8585


BC-004
Breast Cancer
19.06
20.44
1.68
5.37
0.8431


BC-011
Breast Cancer
19.26
19.96
1.65
5.19
0.8385


BC-048
Breast Cancer
19.36
19.76
1.49
4.43
0.8160


HN-050-HCG
Normal
19.41
19.69
1.35
3.87
0.7947


BC-045
Breast Cancer
19.65
19.24
0.94
2.57
0.7199


HN-111-HCG
Normal
19.95
18.62
0.50
1.66
0.6236


BC-039
Breast Cancer
19.55
19.64
0.42
1.53
0.6044


HN-041-HCG
Normal
19.60
19.56
0.29
1.34
0.5731


BC-051
Breast Cancer
20.29
17.92
0.10
1.11
0.5252


BC-009
Breast Cancer
19.44
20.08
−0.10
0.90
0.4739


HN-042-HCG
Normal
19.82
19.18
−0.17
0.84
0.4564


HN-001-HCG
Normal
19.31
20.49
−0.36
0.70
0.4102


BC-016
Breast Cancer
19.74
19.63
−0.97
0.38
0.2739


BC-013
Breast Cancer
19.82
19.47
−1.10
0.33
0.2501


HN-146-HCG
Normal
20.02
19.10
−1.49
0.23
0.1838


HN-125-HCG
Normal
20.17
18.79
−1.70
0.18
0.1539


HN-002-HCG
Normal
19.68
20.03
−1.76
0.17
0.1471


HN-034-HCG
Normal
20.10
19.14
−2.26
0.10
0.0949


HN-120-HCG
Normal
20.27
18.86
−2.67
0.07
0.0645


HN-110-HCG
Normal
20.16
19.27
−3.09
0.05
0.0437


HN-150-HCG
Normal
19.74
20.35
−3.26
0.04
0.0368


HN-103-HCG
Normal
20.53
18.62
−3.88
0.02
0.0202


HN-104-HCG
Normal
20.17
19.50
−3.89
0.02
0.0201


HN-109-HCG
Normal
20.33
19.59
−5.36
0.00
0.0047


HN-022-HCG
Normal
20.04
20.28
−5.36
0.00
0.0047


HN-133-HCG
Normal
20.36
19.67
−5.83
0.00
0.0029


HN-028-HCG
Normal
20.61
19.20
−6.33
0.00
0.0018


HN-033-HCG
Normal
20.53
19.89
−7.86
0.00
0.0004


HN-032-HCG
Normal
20.60
19.77
−7.99
0.00
0.0003


HN-118-HCG
Normal
20.65
19.72
−8.22
0.00
0.0003


























TABLE 4A

















Normal
Breast


N =
22
48


2-gene models
Entropy R-sq
#normal Correct
#normal FALSE
#b Correct
#b FALSE
Correct Classification (Normal)
Correct Classification (Breast)
p-val 1
p-val 2
total used (excludes missing): # normals
total used (excludes missing): # disease






















NR4A2
TGFB1
0.42
18
4
41
7
81.8%
85.4%
4.7E−05
1.9E−09
22
48


CREBBP
TGFB1
0.38
18
4
39
9
81.8%
81.3%
0.0004
1.7E−08
22
48


EGR1
TGFB1
0.36
18
4
39
9
81.8%
81.3%
0.0009
0.0061
22
48


EP300
TGFB1
0.33
17
5
37
11
77.3%
77.1%
0.0035
1.9E−07
22
48


TGFB1
TOPBP1
0.31
17
5
37
11
77.3%
77.1%
4.2E−07
0.0082
22
48


MAPK1
TGFB1
0.29
17
5
38
10
77.3%
79.2%
0.0297
2.3E−06
22
48


CDKN2D
TGFB1
0.29
17
5
36
12
77.3%
75.0%
0.0313
7.0E−07
22
48


S100A6
TGFB1
0.28
17
5
36
12
77.3%
75.0%
0.0327
6.9E−07
22
48





















TABLE 4B










Breast
Normals
Sum







Group Size
68.6%
31.4%
100%



N =
48
22
70







Gene
Mean (Breast)
Mean (Normals)
p-val







EGR1
19.11
20.07
1.1E−06



TGFB1
12.39
12.95
6.9E−06



EGR2
23.56
24.29
0.0023



SRC
18.15
18.58
0.0024



FOS
15.31
15.86
0.0051



ICAM1
16.74
17.18
0.0063



SMAD3
17.72
18.12
0.0072



NFKB1
16.47
16.84
0.0119



EGR3
22.78
23.34
0.0152



TP53
16.15
16.44
0.0181



THBS1
17.47
18.11
0.0209



CEBPB
14.56
14.86
0.0514



SERPINE1
20.90
21.42
0.0579



MAP2K1
15.79
16.01
0.0633



NAB2
19.95
20.15
0.0785



MAPK1
14.66
14.86
0.1080



NFATC2
15.95
16.17
0.1090



PDGFA
19.45
19.80
0.1117



JUN
20.77
21.10
0.1320



ALOX5
15.59
15.93
0.1459



PLAU
24.08
24.44
0.1716



EP300
16.38
16.60
0.1975



TNFRSF6
16.36
16.51
0.2063



RAF1
14.39
14.57
0.2205



CREBBP
15.09
15.23
0.2831



TOPBP1
18.30
18.11
0.3555



NAB1
17.02
17.12
0.3886



NR4A2
21.30
21.12
0.3937



PTEN
14.09
14.00
0.5885



CDKN2D
14.91
14.96
0.6209



S100A6
14.34
14.27
0.7017



CCND2
16.97
16.87
0.7679



























TABLE 5A

















Normal
Breast


N =
22
48


2-gene models and 1-gene models
Entropy R-sq
#normal Correct
#normal FALSE
#bc Correct
#bc FALSE
Correct Classification (Normal)
Correct Classification (Breast)
p-val 1
p-val 2
total used (excludes missing): # normals
total used (excludes missing): # disease






















EGR1
PLEK2
0.85
20
0
46
2
100.0%
95.8%
1.9E−15
4.1E−07
20
48


EGR1
SIAH2
0.78
19
1
46
2
95.0%
95.8%
4.0E−15
8.0E−06
20
48


EGR1
IGF2BP2
0.75
20
1
45
3
95.2%
93.8%
4.7E−15
3.4E−05
21
48


EGR1
NEDD4L
0.75
19
1
45
3
95.0%
93.8%
5.6E−15
3.2E−05
20
48


EGR1
NUDT4
0.73
19
2
44
4
90.5%
91.7%
2.7E−14
9.4E−05
21
48


EGR1
XK
0.71
20
1
45
3
95.2%
93.8%
1.5E−14
0.0002
21
48


DLC1
EGR1
0.69
20
1
45
3
95.2%
93.8%
0.0006
2.2E−14
21
48


BAX
EGR1
0.67
19
3
42
6
86.4%
87.5%
0.0018
7.2E−12
22
48


BCAM
EGR1
0.66
19
2
45
3
90.5%
93.8%
0.0022
1.4E−13
21
48


CDH1
EGR1
0.66
20
2
44
4
90.9%
91.7%
0.0028
3.3E−14
22
48


EGR1
SPARC
0.64
18
3
42
6
85.7%
87.5%
4.2E−13
0.0057
21
48


EGR1
IKBKE
0.62
18
3
42
6
85.7%
87.5%
2.7E−12
0.0122
21
48


CEACAM1
EGR1
0.62
19
2
42
6
90.5%
87.5%
0.0132
4.3E−13
21
48


EGR1
GADD45A
0.61
20
2
44
4
90.9%
91.7%
3.7E−13
0.0414
22
48


EGR1
SERPING1
0.60
20
2
43
5
90.9%
89.6%
7.9E−13
0.0445
22
48


ANLN
EGR1
0.60
19
3
42
6
86.4%
87.5%
0.0458
6.1E−13
22
48


EGR1
S100A4
0.60
20
2
43
5
90.9%
89.6%
2.0E−12
0.0491
22
48


EGR1

0.56
19
3
41
7
86.4%
85.4%
3.1E−12

22
48


CD97
TGFB1
0.39
16
4
39
9
80.0%
81.3%
0.0002
1.8E−08
20
48


BAX
MSH6
0.36
16
4
37
11
80.0%
77.1%
8.9E−08
9.5E−06
20
48


MME
TGFB1
0.35
16
5
37
11
76.2%
77.1%
0.0017
7.8E−08
21
48


CEACAM1
TGFB1
0.35
16
5
38
10
76.2%
79.2%
0.0019
6.1E−08
21
48


ING2
SRF
0.34
16
5
39
9
76.2%
81.3%
3.9E−05
1.8E−07
21
48


NCOA1
TGFB1
0.34
17
5
37
11
77.3%
77.1%
0.0021
4.2E−07
22
48


MAPK14
TGFB1
0.33
15
5
36
12
75.0%
75.0%
0.0031
2.4E−07
20
48


GNB1
TGFB1
0.33
16
5
37
11
76.2%
77.1%
0.0051
8.7E−07
21
48


CASP9
TGFB1
0.33
15
5
37
11
75.0%
77.1%
0.0043
4.4E−07
20
48


GSK3B
TGFB1
0.32
16
5
38
10
76.2%
79.2%
0.0065
1.6E−07
21
48


MSH6
NRAS
0.32
16
4
38
10
80.0%
79.2%
5.1E−05
5.6E−07
20
48


TLR2
TXNRD1
0.32
17
4
40
8
81.0%
83.3%
2.2E−07
0.0002
21
48


ING2
TGFB1
0.32
17
4
38
10
81.0%
79.2%
0.0089
5.4E−07
21
48


GSK3B
TNF
0.31
17
4
39
9
81.0%
81.3%
0.0028
2.5E−07
21
48


BAX
ING2
0.31
16
5
37
11
76.2%
77.1%
8.0E−07
0.0001
21
48


HSPA1A
TGFB1
0.31
17
5
36
12
77.3%
75.0%
0.0110
4.7E−07
22
48


DLC1
TNF
0.31
16
5
37
11
76.2%
77.1%
0.0044
3.7E−07
21
48


MNDA
TGFB1
0.30
16
4
37
11
80.0%
77.1%
0.0112
1.1E−06
20
48


SERPINA1
TGFB1
0.30
16
4
38
10
80.0%
79.2%
0.0121
2.2E−06
20
48


CEACAM1
TNF
0.30
16
5
37
11
76.2%
77.1%
0.0049
4.9E−07
21
48


PTEN
TLR2
0.30
17
4
38
10
81.0%
79.2%
0.0003
4.9E−07
21
48


CCL5
PLEK2
0.30
15
5
37
11
75.0%
77.1%
2.2E−05
0.0002
20
48


TEGT
TGFB1
0.30
17
5
36
12
77.3%
75.0%
0.0151
2.2E−06
22
48


TGFB1
TXNRD1
0.30
17
4
38
10
81.0%
79.2%
6.1E−07
0.0250
21
48


CCL3
FOS
0.29
16
4
37
11
80.0%
77.1%
4.7E−05
0.0162
20
48


IRF1
TGFB1
0.29
16
5
36
12
76.2%
75.0%
0.0334
4.3E−06
21
48


PLEK2
TNF
0.29
15
5
36
12
75.0%
75.0%
0.0073
3.8E−05
20
48


APC
TNF
0.29
16
5
39
9
76.2%
81.3%
0.0107
1.3E−06
21
48


PLEK2
TGFB1
0.29
16
4
36
12
80.0%
75.0%
0.0271
4.2E−05
20
48


SP1
TGFB1
0.28
16
5
36
12
76.2%
75.0%
0.0435
1.8E−05
21
48


ING2
TNF
0.28
18
3
39
9
85.7%
81.3%
0.0122
2.4E−06
21
48


MME
TNF
0.28
17
4
37
11
81.0%
77.1%
0.0126
1.6E−06
21
48


ING2
VIM
0.28
16
5
37
11
76.2%
77.1%
5.9E−05
2.5E−06
21
48


CTNNA1
TGFB1
0.28
17
5
37
11
77.3%
77.1%
0.0379
5.1E−06
22
48


CCL5
IGF2BP2
0.28
15
5
36
12
75.0%
75.0%
7.2E−06
0.0005
20
48


CCL5
NUDT4
0.27
16
4
38
10
80.0%
79.2%
1.8E−05
0.0006
20
48


MME
TLR2
0.27
17
4
37
11
81.0%
77.1%
0.0013
2.6E−06
21
48


CCL3
TLR2
0.27
16
5
37
11
76.2%
77.1%
0.0014
0.0046
21
48


HMOX1
ING2
0.27
16
5
37
11
76.2%
77.1%
4.2E−06
0.0027
21
48


SRF
TXNRD1
0.26
17
4
39
9
81.0%
81.3%
3.0E−06
0.0016
21
48


TNF
ZNF350
0.26
17
4
39
9
81.0%
81.3%
4.0E−06
0.0394
21
48


CCL3
TNF
0.26
16
5
37
11
76.2%
77.1%
0.0488
0.0097
21
48


GSK3B
SRF
0.25
16
5
37
11
76.2%
77.1%
0.0023
3.7E−06
21
48


G6PD
IQGAP1
0.25
17
5
37
11
77.3%
77.1%
6.8E−06
0.0014
22
48


HMOX1
MSH6
0.25
15
5
36
12
75.0%
75.0%
1.4E−05
0.0062
20
48


CCL3
UBE2C
0.24
17
4
38
10
81.0%
79.2%
0.0050
0.0202
21
48


CCL3
G6PD
0.24
16
5
37
11
76.2%
77.1%
0.0038
0.0212
21
48


TLR2
UBE2C
0.24
16
5
37
11
76.2%
77.1%
0.0059
0.0069
21
48


ADAM17
TLR2
0.23
15
5
36
12
75.0%
75.0%
0.0091
1.2E−05
20
48


AXIN2
BAX
0.23
16
5
38
10
76.2%
79.2%
0.0055
1.7E−05
21
48


CCL3
HMOX1
0.23
16
5
37
11
76.2%
77.1%
0.0221
0.0404
21
48


CNKSR2
MYC
0.22
16
5
36
12
76.2%
75.0%
0.0009
2.4E−05
21
48


BAX
MSH2
0.22
17
5
36
12
77.3%
75.0%
1.8E−05
0.0047
22
48


CNKSR2
SRF
0.22
16
5
37
11
76.2%
77.1%
0.0100
2.5E−05
21
48


CASP3
NRAS
0.22
15
5
36
12
75.0%
75.0%
0.0044
3.8E−05
20
48


APC
BAX
0.22
16
5
36
12
76.2%
75.0%
0.0080
2.5E−05
21
48


CCL3
PLEK2
0.22
15
5
38
10
75.0%
79.2%
0.0009
0.0372
20
48


MNDA
TLR2
0.22
15
5
37
11
75.0%
77.1%
0.0209
4.8E−05
20
48


BAX
CASP3
0.22
16
4
38
10
80.0%
79.2%
4.3E−05
0.0062
20
48


BAX
ZNF350
0.21
16
5
36
12
76.2%
75.0%
3.2E−05
0.0109
21
48


CCL5
NEDD4L
0.21
16
4
36
12
80.0%
75.0%
4.0E−05
0.0108
20
48


CCL5
XK
0.21
16
4
37
11
80.0%
77.1%
6.0E−05
0.0108
20
48


HMOX1
TLR2
0.21
16
5
36
12
76.2%
75.0%
0.0250
0.0497
21
48


MAPK14
MYD88
0.21
15
5
36
12
75.0%
75.0%
0.0048
5.2E−05
20
48


BCAM
CCL5
0.21
15
5
36
12
75.0%
75.0%
0.0124
7.3E−05
20
48


BAX
FOS
0.20
16
5
36
12
76.2%
75.0%
0.0022
0.0145
21
48


MME
NRAS
0.20
16
5
36
12
76.2%
75.0%
0.0091
5.8E−05
21
48


CNKSR2
NRAS
0.19
16
5
36
12
76.2%
75.0%
0.0136
8.4E−05
21
48


ADAM17
BAX
0.19
15
5
36
12
75.0%
75.0%
0.0234
9.3E−05
20
48


ING2
MYC
0.19
16
5
37
11
76.2%
77.1%
0.0048
0.0002
21
48


ESR1
MYC
0.18
16
5
37
11
76.2%
77.1%
0.0072
0.0001
21
48


ACPP
MSH6
0.18
15
5
36
12
75.0%
75.0%
0.0003
0.0153
20
48


IKBKE
MSH6
0.17
15
5
36
12
75.0%
75.0%
0.0004
0.0012
20
48


C1QA
FOS
0.16
15
5
36
12
75.0%
75.0%
0.0174
0.0177
20
48


C1QB
FOS
0.16
15
5
36
12
75.0%
75.0%
0.0198
0.0368
20
48


C1QB
IGF2BP2
0.15
17
4
38
10
81.0%
79.2%
0.0011
0.0123
21
48


NUDT4
RP51077B9.4
0.12
15
5
36
12
75.0%
75.0%
0.0103
0.0135
20
48


ELA2
SIAH2
0.09
15
5
36
12
75.0%
75.0%
0.0277
0.0249
20
48





















TABLE 5B










Breast
Normals
Sum







Group Size
68.6%
31.4%
100%



N =
48
22
70







Gene
Mean (Breast)
Mean (Normals)
p-val







EGR1
18.8
20.1
3.1E−12



TGFB1
12.4
12.9
6.9E−06



TNF
18.1
18.8
7.2E−05



CCL3
19.7
20.4
0.0001



HMOX1
15.7
16.3
0.0002



TLR2
15.7
16.2
0.0004



UBE2C
20.6
21.1
0.0004



SRF
16.0
16.5
0.0005



G6PD
15.5
16.0
0.0007



BAX
15.4
15.8
0.0007



CCL5
11.9
12.5
0.0010



NRAS
16.7
17.1
0.0023



TIMP1
14.5
14.9
0.0035



CTSD
12.9
13.4
0.0036



MTA1
19.3
19.7
0.0036



MYD88
14.3
14.7
0.0045



ACPP
17.7
18.2
0.0048



FOS
15.3
15.9
0.0051



VIM
11.2
11.6
0.0052



MYC
17.9
18.3
0.0054



IFI16
14.2
14.6
0.0079



MTF1
17.6
18.1
0.0081



HMGA1
15.5
15.9
0.0088



C1QA
19.8
20.6
0.0088



C1QB
20.2
21.0
0.0089



ST14
17.4
17.9
0.0091



PLEK2
18.6
18.0
0.0092



PLXDC2
16.5
16.9
0.0155



SP1
15.6
16.0
0.0163



XRCC1
18.3
18.6
0.0180



LARGE
21.8
22.3
0.0191



DAD1
15.2
15.4
0.0314



ZNF185
16.9
17.3
0.0363



ITGAL
14.5
14.8
0.0400



MEIS1
21.8
22.2
0.0417



NCOA1
16.1
16.4
0.0424



IKBKE
16.6
16.9
0.0425



DIABLO
18.4
18.6
0.0443



NUDT4
16.3
16.0
0.0448



PTPRC
12.2
12.5
0.0462



HOXA10
22.3
22.9
0.0518



ETS2
17.2
17.6
0.0521



TNFRSF1A
15.2
15.5
0.0530



CTNNA1
16.8
17.1
0.0532



GNB1
13.3
13.6
0.0542



TEGT
12.4
12.6
0.0546



RP51077B9.4
16.3
16.5
0.0561



MMP9
14.4
15.0
0.0576



NBEA
22.2
21.6
0.0601



CA4
18.6
19.0
0.0620



IRF1
12.7
12.9
0.0637



IL8
22.1
21.6
0.0674



S100A11
11.1
11.4
0.0699



S100A4
13.2
13.4
0.0832



SERPINE1
20.8
21.2
0.0871



USP7
15.2
15.4
0.0875



SIAH2
13.9
13.5
0.1109



SERPINA1
12.5
12.8
0.1111



IGF2BP2
16.0
15.7
0.1133



LTA
19.2
19.4
0.1249



PTGS2
17.3
17.5
0.1363



CXCL1
19.8
20.0
0.1574



PLAU
24.1
24.4
0.1716



SPARC
14.7
15.1
0.1767



ING2
19.7
19.6
0.1828



PTPRK
21.7
22.1
0.1863



IQGAP1
13.9
14.1
0.2302



BCAM
20.7
20.2
0.2343



MNDA
12.7
12.9
0.2436



MSH6
19.7
19.5
0.2443



CASP9
18.1
18.2
0.2445



SERPING1
18.0
18.4
0.2458



HSPA1A
14.6
14.8
0.2542



ELA2
21.0
21.4
0.2689



LGALS8
17.4
17.5
0.2782



XK
18.0
17.7
0.2950



CASP3
20.5
20.3
0.2952



RBM5
15.9
16.1
0.3072



MSH2
18.2
17.9
0.3114



MME
15.5
15.3
0.3138



CNKSR2
21.5
21.4
0.3152



CCR7
15.0
14.9
0.3166



IGFBP3
21.9
22.1
0.3349



VEGF
22.7
23.0
0.3520



CD59
17.7
17.8
0.3572



APC
18.2
18.0
0.3611



AXIN2
19.5
19.3
0.3746



ANLN
22.4
22.5
0.3748



MAPK14
15.3
15.4
0.3755



ZNF350
19.6
19.4
0.3954



E2F1
20.1
20.2
0.4227



POV1
18.1
18.3
0.4503



NEDD4L
18.5
18.4
0.4645



ESR1
22.1
22.0
0.4720



CD97
12.9
13.0
0.5122



CEACAM1
18.4
18.5
0.5495



PTEN
14.1
14.0
0.5885



TNFSF5
17.8
17.9
0.5957



ESR2
23.9
24.1
0.6225



ADAM17
18.4
18.4
0.6449



TXNRD1
16.9
17.0
0.6517



MLH1
18.0
17.9
0.6927



CAV1
23.7
23.7
0.8068



GSK3B
16.1
16.0
0.8446



DLC1
23.5
23.4
0.8808



CDH1
20.4
20.4
0.9634



GADD45A
19.2
19.2
0.9822
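As a rough illustration of how Table 5B can be read, the Python sketch below checks the group-size percentages against the subject counts and then lists genes whose p-value falls below a cutoff, together with the difference between the two group means. The 0.05 cutoff and the names `screen_genes` and `ROWS` are assumptions made for this example and are not taken from the application. Only four rows are copied from the table to keep the example short.

```python
# Group sizes from Table 5B: 48 breast cancer subjects and 22 normals.
N_BREAST, N_NORMAL = 48, 22
total = N_BREAST + N_NORMAL
print(f"Breast: {N_BREAST / total:.1%}, Normals: {N_NORMAL / total:.1%}")

# (gene, mean in breast cancer group, mean in normal group, p-value)
ROWS = [
    ("EGR1", 18.8, 20.1, 3.1e-12),
    ("PLEK2", 18.6, 18.0, 0.0092),
    ("HOXA10", 22.3, 22.9, 0.0518),
    ("GADD45A", 19.2, 19.2, 0.9822),
]


def screen_genes(rows, alpha=0.05):
    """Return (gene, breast mean minus normal mean) for rows with p < alpha.

    alpha is an illustrative cutoff, not a value stated in the application.
    """
    return [
        (gene, round(breast_mean - normal_mean, 1))
        for gene, breast_mean, normal_mean, p_value in rows
        if p_value < alpha
    ]


for gene, difference in screen_genes(ROWS):
    print(f"{gene}: mean difference (breast minus normal) = {difference:+.1f}")
```

The sign of the difference only records which group has the larger reported mean. How that maps to higher or lower expression depends on the measurement scale, which this table does not state.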























TABLE 5C

Patient ID | Group | EGR1 | PLEK2 | logit | odds | Predicted probability of breast cancer
BC-014:XS:200073044 | Breast Cancer | 15.38 | 19.13 | 61.68 | 6.1E+26 | 1.0000
BC-017:XS:200073047 | Breast Cancer | 15.58 | 18.39 | 55.64 | 1.5E+24 | 1.0000
BC-019:XS:200073049 | Breast Cancer | 16.41 | 18.54 | 45.38 | 5.1E+19 | 1.0000
BC-002:XS:200072710 | Breast Cancer | 16.89 | 17.40 | 33.62 | 4.0E+14 | 1.0000
BC-041:XS:200073061 | Breast Cancer | 17.74 | 19.37 | 31.63 | 5.4E+13 | 1.0000
BC-006:XS:200072714 | Breast Cancer | 16.80 | 16.70 | 31.40 | 4.3E+13 | 1.0000
BC-001:XS:200072709 | Breast Cancer | 18.31 | 19.55 | 25.02 | 7.4E+10 | 1.0000
BC-047:XS:200073067 | Breast Cancer | 18.41 | 19.32 | 22.56 | 6.3E+09 | 1.0000
BC-059:XS:200073079 | Breast Cancer | 18.30 | 18.49 | 20.09 | 5.3E+08 | 1.0000
BC-036:XS:200073056 | Breast Cancer | 18.41 | 18.52 | 18.85 | 1.5E+08 | 1.0000
BC-033:XS:200073053 | Breast Cancer | 19.11 | 20.30 | 17.96 | 6.3E+07 | 1.0000
BC-056:XS:200073076 | Breast Cancer | 18.83 | 19.44 | 17.64 | 4.6E+07 | 1.0000
BC-037:XS:200073057 | Breast Cancer | 18.41 | 18.19 | 17.18 | 2.9E+07 | 1.0000
BC-018:XS:200073048 | Breast Cancer | 19.01 | 19.84 | 17.10 | 2.7E+07 | 1.0000
BC-005:XS:200072713 | Breast Cancer | 18.66 | 18.77 | 16.62 | 1.6E+07 | 1.0000
BC-007:XS:200072715 | Breast Cancer | 18.72 | 18.68 | 15.46 | 5.2E+06 | 1.0000
BC-012:XS:200073042 | Breast Cancer | 18.89 | 19.00 | 14.74 | 2.5E+06 | 1.0000
BC-010:XS:200072718 | Breast Cancer | 19.02 | 19.22 | 14.08 | 1.3E+06 | 1.0000
BC-050:XS:200073070 | Breast Cancer | 19.05 | 19.26 | 13.87 | 1.1E+06 | 1.0000
BC-043:XS:200073063 | Breast Cancer | 19.05 | 19.24 | 13.73 | 9.2E+05 | 1.0000
BC-049:XS:200073069 | Breast Cancer | 19.25 | 19.38 | 11.72 | 1.2E+05 | 1.0000
BC-035:XS:200073055 | Breast Cancer | 19.32 | 19.35 | 10.64 | 41935.36 | 1.0000
BC-055:XS:200073075 | Breast Cancer | 19.13 | 18.82 | 10.63 | 41438.02 | 1.0000
BC-003:XS:200072719 | Breast Cancer | 19.12 | 18.56 | 9.59 | 14614.01 | 0.9999
BC-008:XS:200072716 | Breast Cancer | 19.41 | 19.22 | 8.93 | 7526.80 | 0.9999
BC-034:XS:200073054 | Breast Cancer | 19.54 | 19.52 | 8.54 | 5121.02 | 0.9998
BC-058:XS:200073078 | Breast Cancer | 19.00 | 17.93 | 8.30 | 4007.06 | 0.9998
BC-052:XS:200073072 | Breast Cancer | 19.21 | 18.48 | 8.04 | 3088.62 | 0.9997
BC-040:XS:200073060 | Breast Cancer | 19.27 | 18.62 | 7.86 | 2596.35 | 0.9996
BC-057:XS:200073077 | Breast Cancer | 18.95 | 17.70 | 7.77 | 2371.72 | 0.9996
BC-044:XS:200073064 | Breast Cancer | 18.95 | 17.65 | 7.53 | 1864.74 | 0.9995
BC-053:XS:200073073 | Breast Cancer | 19.63 | 19.55 | 7.51 | 1817.97 | 0.9995
BC-011:XS:200073041 | Breast Cancer | 19.26 | 18.47 | 7.36 | 1578.03 | 0.9994
BC-015:XS:200073045 | Breast Cancer | 19.03 | 17.64 | 6.40 | 603.19 | 0.9983
BC-009:XS:200072717 | Breast Cancer | 19.44 | 18.55 | 5.31 | 203.13 | 0.9951
BC-004:XS:200072712 | Breast Cancer | 19.06 | 17.43 | 5.04 | 154.64 | 0.9936
BC-046:XS:200073066 | Breast Cancer | 19.31 | 17.92 | 4.05 | 57.40 | 0.9829
BC-048:XS:200073068 | Breast Cancer | 19.36 | 18.04 | 3.98 | 53.35 | 0.9816
BC-031:XS:200073051 | Breast Cancer | 19.28 | 17.80 | 3.90 | 49.50 | 0.9802
BC-038:XS:200073058 | Breast Cancer | 19.50 | 18.27 | 3.20 | 24.48 | 0.9608
BC-032:XS:200073052 | Breast Cancer | 19.34 | 17.83 | 3.19 | 24.30 | 0.9605
BC-042:XS:200073062 | Breast Cancer | 19.68 | 18.73 | 3.06 | 21.43 | 0.9554
BC-039:XS:200073059 | Breast Cancer | 19.55 | 18.25 | 2.47 | 11.78 | 0.9217
BC-045:XS:200073065 | Breast Cancer | 19.65 | 18.48 | 2.27 | 9.70 | 0.9065
BC-051:XS:200073071 | Breast Cancer | 20.29 | 20.24 | 2.05 | 7.79 | 0.8863
BC-013:XS:200073043 | Breast Cancer | 19.82 | 18.83 | 1.63 | 5.10 | 0.8360
HN-041-XS:200073106 | Normal | 19.60 | 18.18 | 1.49 | 4.42 | 0.8154
HN-004-XS:200072925 | Normal | 19.39 | 17.40 | 0.57 | 1.76 | 0.6382
BC-060:XS:200073080 | Breast Cancer | 19.28 | 17.02 | 0.27 | 1.32 | 0.5683
BC-016:XS:200073046 | Breast Cancer | 19.74 | 17.91 | −1.61 | 0.20 | 0.1666
HN-125-XS:200073136 | Normal | 20.17 | 19.12 | −1.65 | 0.19 | 0.1611
HN-110-XS:200073123 | Normal | 20.16 | 18.97 | −2.22 | 0.11 | 0.0983
HN-111-XS:200073124 | Normal | 19.95 | 18.28 | −2.71 | 0.07 | 0.0621
HN-050-XS:200073113 | Normal | 19.41 | 16.66 | −3.13 | 0.04 | 0.0417
HN-022-XS:200072948 | Normal | 20.04 | 18.28 | −3.93 | 0.02 | 0.0193
HN-001-XS:200072922 | Normal | 19.31 | 16.18 | −4.06 | 0.02 | 0.0169
HN-002-XS:200072923 | Normal | 19.68 | 17.21 | −4.10 | 0.02 | 0.0163
HN-042-XS:200073107 | Normal | 19.82 | 17.59 | −4.15 | 0.02 | 0.0156
HN-103-XS:200073116 | Normal | 20.53 | 19.55 | −4.32 | 0.01 | 0.0131
HN-034-XS:200073099 | Normal | 20.10 | 18.13 | −5.41 | 0.00 | 0.0045
HN-118-XS:200073131 | Normal | 20.65 | 19.62 | −5.58 | 0.00 | 0.0037
HN-120-XS:200073133 | Normal | 20.27 | 18.50 | −5.91 | 0.00 | 0.0027
HN-028-XS:200073094 | Normal | 20.61 | 19.24 | −6.92 | 0.00 | 0.0010
HN-133-XS:200073137 | Normal | 20.36 | 17.86 | −10.03 | 0.00 | 0.0000
HN-104-XS:200073117 | Normal | 20.17 | 17.33 | −10.07 | 0.00 | 0.0000
HN-109-XS:200073122 | Normal | 20.33 | 17.75 | −10.18 | 0.00 | 0.0000
HN-150-XS:200073139 | Normal | 19.74 | 16.03 | −10.56 | 0.00 | 0.0000
HN-033-XS:200073098 | Normal | 20.53 | 18.04 | −11.50 | 0.00 | 0.0000
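The last three columns of Table 5C are related by the standard logistic transformation: odds = exp(logit) and probability = odds / (1 + odds). The Python sketch below applies that relationship to three logit values copied from the table. The function names are hypothetical, and the model coefficients that turn the EGR1 and PLEK2 values into a logit are not given in this table, so they are not reproduced here. Because the tabulated logits are rounded to two decimals, recomputed values can differ from the table in the last digits.

```python
import math


def logit_to_odds(logit: float) -> float:
    """Odds of breast cancer implied by a logit: odds = exp(logit)."""
    return math.exp(logit)


def logit_to_probability(logit: float) -> float:
    """Probability implied by a logit, equal to odds / (1 + odds).

    Written in two branches so that large positive logits do not overflow.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    odds = math.exp(logit)
    return odds / (1.0 + odds)


# (patient ID, logit) copied from Table 5C
EXAMPLES = [
    ("BC-013:XS:200073043", 1.63),
    ("BC-016:XS:200073046", -1.61),
    ("HN-033-XS:200073098", -11.50),
]

for patient_id, logit in EXAMPLES:
    odds = logit_to_odds(logit)
    probability = logit_to_probability(logit)
    print(f"{patient_id}: logit={logit:+.2f} odds={odds:.2f} p={probability:.4f}")
```

Rows with a very large logit, such as 61.68, give odds on the order of 1E+26 and a probability that rounds to 1.0000, which matches the top of the table.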








Claims
  • 1. A method for evaluating the presence of breast cancer in a subject based on a sample from the subject, the sample providing a source of RNAs, comprising: a) determining a quantitative measure of the amount of at least one constituent of any constituent of any one table selected from the group consisting of Tables 1, 2, 3, 4, and 5 as a distinct RNA constituent in the subject sample, wherein such measure is obtained under measurement conditions that are substantially repeatable and the constituent is selected so that measurement of the constituent distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy; and b) comparing the quantitative measure of the constituent in the subject sample to a reference value.
  • 2. A method for assessing or monitoring the response to therapy in a subject having breast cancer based on a sample from the subject, the sample providing a source of RNAs, comprising: a) determining a quantitative measure of the amount of at least one constituent of any constituent of Tables 1, 2, 3, 4, and 5 as a distinct RNA constituent, wherein such measure is obtained under measurement conditions that are substantially repeatable to produce a subject data set; and b) comparing the subject data set to a baseline data set.
  • 3. A method for monitoring the progression of breast cancer in a subject, based on a sample from the subject, the sample providing a source of RNAs, comprising: a) determining a quantitative measure of the amount of at least one constituent of any constituent of Tables 1, 2, 3, 4, and 5 as a distinct RNA constituent in a sample obtained at a first period of time, wherein such measure is obtained under measurement conditions that are substantially repeatable to produce a first subject data set; b) determining a quantitative measure of the amount of at least one constituent of any constituent of Tables 1, 2, 3, 4, and 5 as a distinct RNA constituent in a sample obtained at a second period of time, wherein such measure is obtained under measurement conditions that are substantially repeatable to produce a second subject data set; and c) comparing the first subject data set and the second subject data set.
  • 4. A method for determining a breast cancer profile based on a sample from a subject known to have breast cancer, the sample providing a source of RNAs, the method comprising: a) using amplification for measuring the amount of RNA in a panel of constituents including at least 1 constituent from Tables 1, 2, 3, 4, and 5; and b) arriving at a measure of each constituent, wherein the profile data set comprises the measure of each constituent of the panel and wherein amplification is performed under measurement conditions that are substantially repeatable.
  • 5. The method of claim 1, wherein said constituent is selected from the group consisting of EGR1, IL18BP and SOCS1.
  • 6. The method of claim 1, comprising measuring at least two constituents from a) Table 1, wherein the first constituent is selected from the group consisting of ABCB1, ATM, BAX, BCL2, BRCA1, BRCA2, CASP8, CCND1, CDH1, CDK4, CDKN1B, CRABP2, CTNNB1, CTSD, EGR1, HPGD, ITGA6, MTA1, TGFB1, and TP53; and the second constituent is selected from the group consisting of any other constituents selected from Table 1, wherein the constituent is selected so that measurement of the constituent distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy; b) Table 2, wherein the first constituent is selected from the group consisting of ADAM17, C1QA, CCR3, CCR5, CD19, CD86, CXCL1, DPP4, EGR1, HSPA1A, IL10, IL18BP, IL1R1, IL8, IRF1, and TLR2 and the second constituent is selected from the group consisting of any other constituents selected from Table 2, wherein the constituent is selected so that measurement of the constituent distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy; c) Table 3, wherein the first constituent is selected from the group consisting of ABL1, ABL2, AKT1, ATM, BAD, BAX, BCL2, BRAF, CASP8, CCNE1, CDK2, CDK5, CDKN1A, CDKN2A, EGR1, ERBB2, FOS, GZMA, NOTCH2, NRAS, PLAUR, SKIL, SMAD4, and TGFB1 and the second constituent is selected from the group consisting of any other constituents selected from Table 3, wherein the constituent is selected so that measurement of the constituent distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy; d) Table 4, wherein the first constituent is selected from the group consisting of CDKN2D, CREBBP, EGR1, EP300, MAPK1, NR4A2, S100A6, and TGFB1 and the second constituent is TGFB1 or TOPBP1, wherein the constituent is selected so that measurement of the constituent distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy; and e) Table 5, wherein the first constituent is selected from the group consisting of ACPP, ADAM17, ANLN, APC, AXIN2, BAX, BCAM, C1QA, C1QB, CASP3, CASP9, CCL3, CCL5, CD97, CDH1, CEACAM1, CNKSR2, CTNNA1, DLC1, EGR1, ELA2, ESR1, G6PD, GNB1, GSK3B, HMOX1, HSPA1A, IKBKE, ING2, IRF1, MAPK14, MME, MNDA, MSH6, NCOA1, NUDT4, PLEK2, PTEN, SERPINA1, SP1, SRF, TEGT, TGFB1, TLR2, and TNF and the second constituent is selected from the group consisting of any other constituents selected from Table 1, wherein the constituent is selected so that measurement of the constituent distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy.
  • 7. The method of claim 1, comprising measuring at least three constituents from a) Table 1, wherein i) the first constituent is selected from the group consisting of ABCB1, ATBF1, ATM, BAX, BCL2, BRCA1, BRCA2, C3, CASP8, CASP9, CCND1, CCNE1, CDK4, CDKN1A, CDKN1B, CRABP2, CTNNB1, CTSB, CTSD, DLC1, EGR1, EIF4E, ERBB2, FOS, GADD45A, GNB2L1, HPGD, ICAM1, IFITM3, ILF2, ING1, ITGA6, ITGB3, MCM7, MDM2, MGMT, MTA1, MUC1, MYC, MYCBP, NFKB1, PI3, PTGS2, RB1, RP51077B9.4, RPS3, TGFB1, and TNF; ii) the second constituent is selected from the group consisting of BAX, C3, CASP9, CCND1, CDK4, CDKN1B, CRABP2, CTSB, CTSD, DLC1, EGR1, EIF4E, ERBB2, FOS, GADD45A, GNB2L1, HPGD, ICAM1, IFITM3, IGF2, IL8, ILF2, ING1, ITGA6, LAMB2, MCM7, MDM2, MGMT, MMP9, MTA1, MUC1, MYBL2, MYC, MYCBP, NCOA1, NFKB1, NME1, PCNA, PI3, PITRM1, PSMB5, PSMD1, PTGS2, RB1, RP51077B9.4, RPL13A, RPS3, SLPI, TGFB1, TGFBR1, THBS1, TIMP1, TNF, TP53, USP10, and VEZF1; and iii) the third constituent is any other constituent selected from Table 1, wherein each constituent is selected so that measurement of the constituents distinguishes between a normal subject and a breast cancer-diagnosed subject in a reference population with at least 75% accuracy.
  • 8. The method of claim 1, wherein the combination of constituents is selected according to any of the models enumerated in Tables 1A, 2A, 3A, 4A, or 5A.
  • 9. The method of claim 1 wherein said reference value is an index value.
  • 10. The method of claim 2, wherein said therapy is immunotherapy.
  • 11. The method of claim 10, wherein said constituent is selected from Table 6.
  • 12. The method of claim 2, wherein, when the baseline data set is derived from a normal subject, a similarity in the subject data set and the baseline data set indicates that said therapy is efficacious.
  • 13. The method of claim 2, wherein, when the baseline data set is derived from a subject known to have breast cancer, a similarity in the subject data set and the baseline data set indicates that said therapy is not efficacious.
  • 14. The method of claim 1, wherein expression of said constituent in said subject is increased compared to expression of said constituent in a normal reference sample.
  • 15. The method of claim 1, wherein expression of said constituent in said subject is decreased compared to expression of said constituent in a normal reference sample.
  • 16. The method of claim 1, wherein the sample is selected from the group consisting of blood, a blood fraction, a body fluid, a cell, and a tissue.
  • 17. The method of claim 1, wherein the measurement conditions that are substantially repeatable are within a degree of repeatability of better than ten percent.
  • 18. The method of claim 1, wherein the measurement conditions that are substantially repeatable are within a degree of repeatability of better than five percent.
  • 19. The method of claim 1, wherein the measurement conditions that are substantially repeatable are within a degree of repeatability of better than three percent.
  • 20. The method of claim 1, wherein efficiencies of amplification for all constituents are substantially similar.
  • 21. The method of claim 1, wherein the efficiency of amplification for all constituents is within ten percent.
  • 22. The method of claim 1, wherein the efficiency of amplification for all constituents is within five percent.
  • 23. The method of claim 1, wherein the efficiency of amplification for all constituents is within three percent.
  • 24. A kit for detecting breast cancer in a subject, comprising at least one reagent for the detection or quantification of any constituent measured according to claim 1 and instructions for using the kit.
REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/922,341 filed Apr. 5, 2007 and U.S. Provisional Application No. 60/962,659 filed Jul. 30, 2007, the contents of which are incorporated by reference in their entirety.

PCT Information
Filing Document | Filing Date | Country | Kind | 371c Date
PCT/US07/23385 | 11/6/2007 | WO | 00 | 4/12/2010
Provisional Applications (2)
Number | Date | Country
60922341 | Apr 2007 | US
60962659 | Jul 2007 | US