GENE EXPRESSION PANEL FOR BREAST CANCER PROGNOSIS

Abstract
The invention described in the application relates to a panel of gene expression markers for node-negative, ER-positive, HER2-negative breast cancer patients. The invention thus provides methods and compositions, e.g., kits and/or microarrays, for evaluating gene expression levels of the markers and methods of using such gene expression levels to evaluate the likelihood of relapse of a node-negative, ER-positive, HER2-negative breast cancer patient. Such information can be used in determining treatment options for patients.
Description
REFERENCE TO A SEQUENCE LISTING SUBMITTED AS AN ASCII TEXT FILE

The Sequence Listing written in file SEQTXT 77429-871826-010220US.txt, created on Apr. 4, 2013, 332,697 bytes, machine format IBM-PC, MS-Windows operating system, is hereby incorporated by reference in its entirety for all purposes.


BACKGROUND OF THE INVENTION

Large randomized trials have shown that chemotherapy administered in the perioperative setting (e.g., adjuvant chemotherapy) can cure patients otherwise destined to recur with systemic, incurable cancer (1). Once this recurrence has happened, the same chemotherapy is not curative. Therefore, the adjuvant window is a privileged period of time, when the decision to administer additional therapy or not, as well as the type, duration and intensity of such therapy takes center stage. Node-negative, estrogen receptor (ER)-positive, HER2-negative patients generally show a favorable prognosis when treated with adjuvant hormonal therapy only. However, because an unknown subset of these patients develops recurrences, most are currently treated not only with hormonal therapy but also cytotoxic chemotherapy, even though it is probably unnecessary for most. Our goal was to stratify these patients into those that are most or least likely to develop a recurrence within 10 years after surgery. Our approach was to develop a multi-gene transcription-level-based classifier of 10-year-relapse (disease recurrence within 10 years) using a large database of existing, publicly available microarray datasets. The probability of relapse and relapse risk score group reported by our method can be used to assign systemic chemotherapy to only those patients most likely to benefit from it.


BRIEF SUMMARY OF THE INVENTION

The present invention is based, in part, on the identification of a panel of gene expression markers for node-negative, ER-positive, HER2-negative breast cancer patients. The probability of relapse and relapse risk score group using the panel of gene expression markers of the invention can be used to assign systemic chemotherapy to only those patients most likely to benefit from it.


The invention can be used on tissue from LN−, ER+, HER2− breast cancer patients by any assay where transcript levels (or their expression products) of primary genes (or their alternate genes) in the Random Forest Relapse Score (RFRS) signature are measured. These measurements can be used to assign an RFRS value and to determine the likelihood of breast cancer relapse. Those breast cancer patients with tumors at high risk of relapse can be treated more aggressively whereas those at low risk of relapse can more safely avoid the risks and side effects of systemic chemotherapy. Thus, this method can provide rapid and useful information for clinical decision making.


Thus, in one embodiment, the invention relates to a method of evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer, the method comprising: providing a sample comprising breast tumor tissue from the patient; determining the levels of expression of the 17 genes, or one or more corresponding alternates thereof, identified in Table 1; or of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2; in the sample; and correlating the levels of expression with the likelihood of a relapse. In some embodiments, the method further comprises detecting the level of expression of one or more reference genes, e.g., one or more reference genes selected from the genes identified in Table 3, or one or more reference genes selected from the genes ARPC2, ATF4, ATP5B, B2M, CDH4, CELF1, CLTA, CLTC, COPB1, CTBP1, CYC1, CYFIP1, DAZAP2, DHX15, DIMT1, EEF1A1, FLOT2, GAPDH, GUSB, HADHA, HDLBP, HMBS, HNRNPC, HPRT1, HSP90AB1, MTCH1, MYL12B, NACA, NDUFB8, PGK1, PPIA, PPIB, PTBP1, RPL13A, RPLP0, RPS13, RPS23, RPS3, S100A6, SDHA, SEC31A, SET, SF3B1, SFRS3, SNRNP200, STARD7, SUMO1, TBP, TFRC, TMBIM6, TPT1, TRA2B, TUBA1C, UBB, UBC, UBE2D2, UBE2D3, VAMP3, XPO1, YTHDC1, YWHAZ, and 18S rRNA. In some embodiments, the step of determining the levels of expression of the genes comprises detecting the level of expression of RNA. In some embodiments, the determining step comprises detecting the level of expression of protein. The RNA may be detected using any known method, e.g., a method comprising a quantitative PCR reaction.
In some embodiments, detecting the level of expression of the RNA comprises hybridizing a nucleic acid obtained from the sample to an array that comprises probes to the 17 genes set forth in Table 1, and/or one or more corresponding alternates thereof; or hybridizing a nucleic acid obtained from the sample to an array that comprises probes to the 8 genes set forth in Table 2, and/or one or more corresponding alternates thereof.


In a further aspect, the invention provides a kit for detecting RNA expression comprising primers and/or probes for detecting the level of expression of the 17 genes set forth in Table 1, and/or one or more corresponding alternates thereof; or for detecting the level of expression of the 8 genes set forth in Table 2, and/or one or more alternates thereof. In some embodiments, the kit further comprises primers and/or probes for detecting the level of RNA expression of one or more reference genes, e.g., one or more reference genes selected from the genes identified in Table 3, or one or more reference genes selected from the genes ARPC2, ATF4, ATP5B, B2M, CDH4, CELF1, CLTA, CLTC, COPB1, CTBP1, CYC1, CYFIP1, DAZAP2, DHX15, DIMT1, EEF1A1, FLOT2, GAPDH, GUSB, HADHA, HDLBP, HMBS, HNRNPC, HPRT1, HSP90AB1, MTCH1, MYL12B, NACA, NDUFB8, PGK1, PPIA, PPIB, PTBP1, RPL13A, RPLP0, RPS13, RPS23, RPS3, S100A6, SDHA, SEC31A, SET, SF3B1, SFRS3, SNRNP200, STARD7, SUMO1, TBP, TFRC, TMBIM6, TPT1, TRA2B, TUBA1C, UBB, UBC, UBE2D2, UBE2D3, VAMP3, XPO1, YTHDC1, YWHAZ, and 18S rRNA.


In a further aspect, the invention relates to a microarray comprising probes for detecting the level of expression of the 17 genes set forth in Table 1, and/or one or more corresponding alternates thereof; or for detecting the level of expression of the 8 genes set forth in Table 2, and/or one or more alternates thereof. In some embodiments, the microarray further comprises probes for detecting the level of expression of one or more reference genes, e.g., one or more reference genes selected from the genes identified in Table 3, or one or more reference genes selected from the genes ARPC2, ATF4, ATP5B, B2M, CDH4, CELF1, CLTA, CLTC, COPB1, CTBP1, CYC1, CYFIP1, DAZAP2, DHX15, DIMT1, EEF1A1, FLOT2, GAPDH, GUSB, HADHA, HDLBP, HMBS, HNRNPC, HPRT1, HSP90AB1, MTCH1, MYL12B, NACA, NDUFB8, PGK1, PPIA, PPIB, PTBP1, RPL13A, RPLP0, RPS13, RPS23, RPS3, S100A6, SDHA, SEC31A, SET, SF3B1, SFRS3, SNRNP200, STARD7, SUMO1, TBP, TFRC, TMBIM6, TPT1, TRA2B, TUBA1C, UBB, UBC, UBE2D2, UBE2D3, VAMP3, XPO1, YTHDC1, YWHAZ, and 18S rRNA.


In an additional aspect, the invention relates to a computer-implemented method for evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer, the method comprising: receiving, at one or more computer systems, information describing the level of expression of the 17 genes set forth in Table 1, or one or more corresponding alternates thereof; or information describing the level of expression of the 8 genes set forth in Table 2, or one or more corresponding alternates thereof; in a breast tumor tissue sample obtained from the patient; performing, with one or more processors associated with the one or more computer systems, a random forest analysis in which the level of expression of each gene in the analysis is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or “no relapse”; and generating, with the one or more processors associated with the one or more computer systems, a random forest relapse score (RFRS). In some embodiments in which the level of expression of the 17 genes, or at least one alternate, set forth in Table 1 is determined, if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group and if less than 0.333 the patient is assigned to a low risk group. In some embodiments in which the level of expression of the 8 genes, or at least one alternate, set forth in Table 2 is determined, if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group and if less than 0.333 the patient is assigned to a low risk group.
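The risk-group assignment described above can be illustrated with a minimal Python sketch. The thresholds 0.333 and 0.606 come from the text; the function name, the returned labels, and the range check (which assumes RFRS lies between 0 and 1, as the 0-to-1 intervals of FIG. 7 suggest) are illustrative assumptions only.

```python
# Thresholds stated in the text for both the 17-gene and 8-gene panels.
LOW_THRESHOLD = 0.333
HIGH_THRESHOLD = 0.606


def rfrs_risk_group(rfrs: float) -> str:
    """Map an RFRS value to a risk group label (hypothetical helper)."""
    # Assumption: RFRS is a score on the interval [0, 1].
    if not 0.0 <= rfrs <= 1.0:
        raise ValueError("RFRS is expected to lie between 0 and 1")
    if rfrs >= HIGH_THRESHOLD:
        return "high"
    if rfrs >= LOW_THRESHOLD:
        return "intermediate"
    return "low"
```

Note that both boundaries are inclusive on the upper group: a score of exactly 0.333 is intermediate risk and a score of exactly 0.606 is high risk.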


In some embodiments, the computer-implemented method further comprises generating, with the one or more processors associated with the one or more computer systems, a likelihood of relapse by comparison of the RFRS score for the patient to a loess fit of RFRS versus likelihood of relapse for a training dataset.


In another aspect, the invention relates to a non-transitory computer-readable medium storing program code for evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer, the computer-readable medium comprising:


code for receiving information describing the level of expression of the 17 genes identified in Table 1, or one or more corresponding alternates thereof; or information describing the level of expression of the 8 genes identified in Table 2, or one or more corresponding alternates thereof; in a breast tumor tissue sample obtained from the patient;


code for performing a random forest analysis in which the level of expression of each gene in the analysis is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or “no relapse”; and


code for generating a random forest relapse score (RFRS). In some embodiments in which the level of expression of the 17 genes, or one or more designated alternates, identified in Table 1 is determined, if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group and if less than 0.333 the patient is assigned to a low risk group. In some embodiments in which the level of expression of the 8 genes, or one or more designated alternates, identified in Table 2, is determined, if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group and if less than 0.333 the patient is assigned to a low risk group. In some embodiments, the non-transitory computer-readable medium storing program code further comprises code for generating a likelihood of relapse by comparison of the RFRS score for the patient to a loess fit of RFRS versus likelihood of relapse for a training dataset.
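The voting step can be sketched in Python as follows. This passage does not state how the per-tree votes are combined; the sketch assumes, as one plausible reading, that the RFRS is the fraction of trees voting “relapse”. The tree representation, the cutoff values, and the direction of each split are hypothetical and serve only to show the mechanics; the gene symbols are taken from the text.

```python
# A tree is either a leaf (the vote, a string) or an internal node given as
# (gene, cutoff, subtree_if_below_cutoff, subtree_if_at_or_above_cutoff).

def tree_vote(tree, expression):
    """Walk one decision tree down to its terminal leaf and return the vote."""
    while not isinstance(tree, str):
        gene, cutoff, low_branch, high_branch = tree
        tree = low_branch if expression[gene] < cutoff else high_branch
    return tree


def random_forest_relapse_score(forest, expression):
    """Assumed definition: fraction of trees that vote 'relapse'."""
    if not forest:
        raise ValueError("the forest must contain at least one tree")
    votes = [tree_vote(tree, expression) for tree in forest]
    return votes.count("relapse") / len(votes)


# Hypothetical three-tree forest with made-up cutoffs.
TOY_FOREST = [
    ("CCNB2", 8.0, "no relapse", "relapse"),
    ("MELK", 7.5, "no relapse", ("CCNB2", 9.0, "no relapse", "relapse")),
    ("TOP2A", 6.0, "no relapse", "relapse"),
]
sample = {"CCNB2": 8.5, "MELK": 8.0, "TOP2A": 5.0}
score = random_forest_relapse_score(TOY_FOREST, sample)
```

In this toy case only the first tree votes “relapse”, so the score is one third. A real model would contain many more trees trained on the expression data.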





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 shows an analysis of the studies employed in Example 1 to identify duplicates. The diagram shows the approximate overlap between GEO datasets used. Three studies show zero overlap while the other six show significant overlap.



FIG. 2 shows estrogen receptor and HER2 status for 998 samples employed in Example 1. Expression status was determined using the “205225_at” probe set for ER and the rank sum of the 216835_s_at (ERBB2), 210761_s_at (GRB7), 202991_at (STARD3) and 55616_at (PGAP3) probe sets for HER2. Threshold values were chosen by mixed model clustering. A total of 68 samples were determined to be ER-negative and 89 samples were determined to be HER2-positive. In total, 140 samples were either HER2-positive or ER-negative (17 were both) and were filtered from further analysis.
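The rank-sum step used for HER2 status can be sketched in Python as follows. The four probe set identifiers are those named above; the tie handling (average ranks), the function names, and the input layout are assumptions. The thresholds themselves were chosen by mixed model clustering, which is not reproduced here.

```python
# Probe sets named in the text: ERBB2, GRB7, STARD3 and PGAP3.
HER2_AMPLICON_PROBES = ["216835_s_at", "210761_s_at", "202991_at", "55616_at"]


def average_ranks(values):
    """Rank values from 1 (lowest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def her2_rank_sums(expression_by_probe):
    """Sum, per sample, the ranks obtained within each of the four probe sets.

    expression_by_probe maps a probe set ID to a list of expression values,
    one per sample, in the same sample order for every probe set.
    """
    n_samples = len(expression_by_probe[HER2_AMPLICON_PROBES[0]])
    totals = [0.0] * n_samples
    for probe in HER2_AMPLICON_PROBES:
        for i, rank in enumerate(average_ranks(expression_by_probe[probe])):
            totals[i] += rank
    return totals
```

Samples with a high rank sum express all four probe sets strongly relative to the cohort, which is the pattern a HER2-positive call would rely on.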



FIG. 3 illustrates the breakdown of samples for analysis. A total of 858 samples passed all filtering steps including 487 samples with 10-year follow-up data (213 relapse; 274 no relapse). The remaining 371 samples had insufficient follow-up for 10-year classification analysis but were retained for use in survival analysis. The 858 samples were broken into two-thirds training and one-third testing sets resulting in: a training set of 572 samples for use in survival analysis and 325 samples with 10-year follow-up (143 relapse; 182 no relapse) for classification analysis; and a testing set of 286 samples for use in survival analysis and 162 samples with 10-year follow-up (70 relapse; 92 no relapse) for classification analysis.



FIG. 4 illustrates risk group threshold determination. The distribution of RFRS scores was determined for patients in the training dataset (N=325) comparing those with a known relapse (right side) versus those with no known relapse (left side). As expected, patients with a known relapse tend to have a higher predicted likelihood of relapse (by RFRS) and vice versa. Mixed model clustering was used to identify thresholds (0.333 and 0.606) for defining low, intermediate, and high-risk groups as indicated.



FIGS. 5A-C provide data illustrating likelihood of relapse according to RFRS group. The survival plot shows relapse-free survival comparing (from top to bottom) low-risk, intermediate-risk, and high-risk groups as determined by RFRS for: (A) the full-gene-set model on training data; (B) the 8-gene-set model on independent test data; (C) the 8-gene-set model on the independent NKI data set. Significance between risk groups was determined by Kaplan-Meier logrank test (with test for linear trend).



FIG. 6 illustrates likelihood of relapse according to RFRS group with breakdown into additional risk groups. The survival plot shows relapse-free survival comparing (from top to bottom) very-low-risk, low-risk, intermediate-risk, high-risk, and very-high-risk groups as determined by RFRS. Significance between risk groups was determined by Kaplan-Meier logrank test (with test for linear trend).



FIG. 7 illustrates estimated likelihood of relapse at 10 years for any RFRS value. The likelihood of relapse was calculated in the training data set (N=505) for 50 RFRS intervals (from 0 to 1). A smooth curve was fitted using a loess function and 95% confidence intervals plotted to represent the error in the fit. Short vertical marks just above the x-axis, one for each patient, represent the distribution of RFRS values observed in the training data. Thresholds for risk groups are indicated. The plot shows a linear relationship between RFRS and likelihood of relapse at 10 years with the likelihood ranging from approximately 0 to 40%.
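The interval-and-smoothing procedure can be sketched in pure Python as follows. The 50 equal-width intervals follow the text; the loess variant shown (local linear fit with tricube weights), the span of 0.75, and all function names are assumptions, and the confidence intervals mentioned above are not reproduced.

```python
import math


def binned_relapse_rates(scores, relapsed, n_bins=50):
    """Observed relapse fraction in equal-width RFRS intervals on [0, 1]."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for score, event in zip(scores, relapsed):
        b = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 joins the last bin
        counts[b] += 1
        events[b] += 1 if event else 0
    centers = [(b + 0.5) / n_bins for b in range(n_bins) if counts[b]]
    rates = [events[b] / counts[b] for b in range(n_bins) if counts[b]]
    return centers, rates


def loess_at(x0, xs, ys, span=0.75):
    """Local linear fit at x0 with tricube weights over the nearest span*n points."""
    n = len(xs)
    if n == 0:
        raise ValueError("at least one point is required")
    k = min(n, max(2, math.ceil(span * n)))
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1]
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        d = abs(x - x0)
        if bandwidth > 0:
            u = d / bandwidth
        else:
            u = 0.0 if d == 0 else 1.0
        w = (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        return sum(ys) / n  # degenerate neighbourhood: fall back to the mean
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # all weighted points share one x: weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0
```

Under these assumptions, a patient's estimated likelihood of relapse would be read off the smoothed curve as `loess_at(patient_rfrs, centers, rates)`, where `centers` and `rates` come from the training data.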



FIG. 8 shows a gene ontology analysis of the genes identified for the 17-gene signature panel. A Gene Ontology (GO) analysis was performed using DAVID to identify the associated GO biological processes for the 17-gene model. The diagram represents the approximate overlap between GO terms. To simplify, redundant terms were grouped together. Genes in the 17-gene list are involved in a wide range of biological processes known to be involved in breast cancer biology including cell cycle, hormone response, cell death, DNA repair, transcription regulation, wound healing and others. Since the 8-gene set is entirely contained in the 17-gene set it would be involved in many of the same processes.



FIG. 9 provides a sample patient report of risk of relapse generated in accordance with the invention. Using the RFRS algorithm, a patient would be assigned an RFRS value. If the RFRS is greater than or equal to 0.606 the patient is assigned to the “high-risk” group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to the “intermediate-risk” group and if less than 0.333 the patient is assigned to the “low-risk” group. The patient's RFRS value is also used to determine a likelihood of relapse by comparison to a pre-calculated loess fit of RFRS versus likelihood of relapse for the training dataset. The patient's estimated likelihood of relapse is determined, added to the summary plot, and output as a new report.



FIG. 10 is a flowchart of a method for identifying LN−ER+HER2− breast cancer patients that are candidates for additional treatment in one embodiment.



FIG. 11 is a flowchart of a method for generating an RF model for identifying LN−ER+HER2− breast cancer patients that are candidates for additional treatment in one embodiment.



FIG. 12 is a block diagram of computer system 1200 that may incorporate an embodiment, be incorporated into an embodiment, or be used to practice any of the innovations, embodiments, and/or examples found within this disclosure.



FIGS. 13A and B illustrate likelihood of relapse according to RFRS group stratified by treatment status. The survival plot shows relapse-free survival comparing (from top to bottom) low-risk, intermediate-risk, and high-risk groups as determined by RFRS for: (A) hormone-therapy-treated and (B) untreated. Significance between risk groups was determined by Kaplan-Meier logrank test (with test for linear trend).





DETAILED DESCRIPTION OF THE INVENTION

An “estrogen receptor positive, lymph node-negative, HER2-negative” or “ER+LN−HER2−” patient as used herein refers to a patient that has no discernible breast cancer in the lymph nodes; and has breast tumor cells that express estrogen receptor and do not show evidence of HER2 genomic (DNA) amplification or HER2 over-expression. LN− status is typically determined when the sentinel node is surgically removed and examined by microscopy for cytological evidence of disease. Patients are considered LN− (N0) if zero positive nodes were observed. Patients are considered LN+ if one or more lymph nodes were considered positive for disease (1-3 positive=N1; 4-9 positive=N2, etc.). ER+ status is typically assessed by immunohistochemistry (IHC) where a positive determination is made when greater than a small percentage (typically greater than 3%, 5% or 10%) of cells stain positive. ER status can also be tested by quantitative PCR or biochemical assays. HER2 status is generally determined by either IHC, fluorescence in situ hybridization (FISH) or some combination of the two methods. Typically, a patient is first tested by IHC and scored on a scale from 0 to 3 where a “3+” score indicates strong complete membrane staining on >5-10% of tumor cells and is considered positive. No staining (score of “0”) or a “1+” score, indicating faint partial membrane staining in greater than 5-10% of cells, is considered negative. An intermediate score of “2+”, indicating weak to moderate complete membrane staining in more than 5-10% of cells, may prompt further testing by FISH. A typical HER2 FISH scheme would consider a patient HER2+ if the ratio of a HER2 probe to a centromeric (reference) probe is more than 2:1 in ˜5% or more of cells after examining 20 or more metaphase spreads. Otherwise the patient is considered HER2−. Quantitative PCR, array-based hybridization, and other methods may also be used to determine HER2 status.
The specific methods and cutoff points for determining LN, ER and HER2 status may vary from hospital to hospital. For the purpose of this invention, a patient will be considered “ER+LN−HER2−” if reported as such by their health care provider or if determined by any accepted and approved methods, including but not limited to those detailed above.
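The two-stage HER2 determination described above (IHC first, FISH for intermediate scores) can be sketched as a small Python decision function. The function name and return labels are hypothetical. Because cutoff points vary between hospitals, the FISH ratio cutoff is deliberately a required argument rather than a built-in constant, and the percentage-of-cells criterion is assumed to have been applied upstream when the ratio was reported.

```python
from typing import Optional


def her2_status(ihc_score: int,
                fish_ratio: Optional[float] = None,
                *,
                ratio_cutoff: float) -> str:
    """Combine an IHC score (0 to 3) with an optional FISH ratio.

    ratio_cutoff is the HER2-to-centromere probe ratio above which the
    laboratory's own protocol calls a sample HER2-positive.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2 or 3")
    if ihc_score == 3:
        return "positive"
    if ihc_score in (0, 1):
        return "negative"
    # Score of 2+: intermediate, so the FISH result decides if one is available.
    if fish_ratio is None:
        return "equivocal: FISH recommended"
    return "positive" if fish_ratio > ratio_cutoff else "negative"
```

Keeping the cutoff explicit makes the laboratory-specific choice visible at every call site instead of hiding it inside the function.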


In the current invention, a “gene set forth in” a table or a “gene identified in” a table are used interchangeably to refer to the gene that is listed in that table. For example, a gene “identified in” Table 4 refers to the gene that corresponds to the gene listed in Table 4. As understood in the art, there are naturally occurring polymorphisms for many gene sequences. Genes that are naturally occurring allelic variations for the purposes of this invention are those genes encoded by the same genetic locus. The proteins encoded by allelic variations of a gene set forth in Table 4 (or in any of Tables 1-3 or Table 4) typically have at least 95% amino acid sequence identity to one another, i.e., an allelic variant of a gene indicated in Table 4 typically encodes a protein product that has at least 95% identity, often at least 96%, at least 97%, at least 98%, or at least 99%, or greater, identity to the amino acid sequence encoded by the nucleotide sequence denoted by the Entrez Gene ID number (Apr. 1, 2012) shown in Table 4 for that gene. For example, an allelic variant of a gene encoding CCNB2 (gene: cyclin B2) typically has at least 95% identity, often at least 96%, at least 97%, at least 98%, or at least 99%, or greater, to the CCNB2 protein sequence encoded by the nucleic acid sequence available under Entrez Gene ID no. 9133. A “gene identified in” a table, such as Table 4, also refers to a gene that can be unambiguously mapped to the same genetic locus as that of a gene assigned to a genetic locus using the probes for the gene that are listed in Appendix 3.
Similarly, a “gene identified in Table 1” or a “gene identified in Table 2” refers to a gene that can be unambiguously mapped to a genetic locus using the probes for that gene that are listed in Appendix 4 (panel of 17 genes from Table 1, which includes the genes for the 8 gene panel identified in Table 2); and a “gene identified in Table 3” refers to a gene that can be unambiguously mapped to a genetic locus using the probes for that gene that are listed in Appendix 5.


The terms “identical” or “100% identity,” in the context of two or more nucleic acids or proteins refer to two or more sequences or subsequences that are the same sequences. Two sequences are “substantially identical” or a certain percent identity if two sequences have a specified percentage of amino acid residues or nucleotides that are the same (i.e., 70% identity, optionally 75%, 80%, 85%, 90%, or 95% identity, over a specified region, or, when not specified, over the entire sequence), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using known sequence comparison algorithms, e.g., BLAST using the default parameters, or by manual alignment and visual inspection.
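As an illustration of percent identity, the following Python sketch scores two sequences that have already been aligned (for example, by BLAST or by hand). The function name and the gap character are assumptions, and the denominator used here (all alignment columns except those where both sequences have a gap) is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared
```

Under this convention a column with a gap in only one sequence counts as a mismatch, so two 10-residue proteins differing at a single position score 90% identity.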


A “gene product” or “gene expression product” in the context of this invention refers to an RNA or protein encoded by the gene.


The term “evaluating a biomarker” in an LN−ER+HER2− patient refers to determining the level of expression of a gene product encoded by a gene, or allelic variant of the gene, listed in Table 4. Preferably, the gene is listed in Table 1 or Table 2 as either a primary or alternate gene. Typically, the RNA expression level is determined.


INTRODUCTION

The invention is based, in part, on the identification of a panel of at least eight genes whose gene expression level correlates with breast cancer prognosis. In some embodiments, the panel of at least eight genes comprises at least eight genes, or at least 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, or 50, or more genes, identified in Table 4 with the proviso that the gene is one of those also listed in Table 5. In some embodiments, the panel of genes comprises at least 8 primary genes, or at least 9, 10, 11, 12, 13, 14, 15, 16, or all 17 primary genes identified in Table 1; or the 8 primary genes set forth in Table 2. Table 1 also shows alternate genes for each of the seventeen that can replace the specific primary gene in the analysis. At least one alternate gene can be evaluated in place of the corresponding primary gene listed in Table 1, or can be evaluated in addition to the corresponding primary gene listed in Table 1. Similarly, Table 2 shows alternate genes for each of the eight that can replace, or be assayed in addition to, the specific primary gene in the analysis. The results of the expression analysis are then evaluated using an algorithm to determine breast cancer patients that are likely to have a recurrence, and accordingly, are candidates for treatment with more aggressive therapy, such as chemotherapy.


The invention therefore relates to measurement of expression levels of a biomarker panel, e.g., a 17-gene expression panel, or an 8-gene expression panel, in a breast cancer patient prior to the patient undergoing chemotherapy. In some embodiments, probes to detect such transcripts may be applied in the form of a diagnostic device to predict which LN−ER+HER2− breast cancer patients have a greater risk for relapse.


Typically, the methods of the invention comprise determining the expression levels of all seventeen primary genes, and/or at least one corresponding alternate gene shown in Table 1. However, in some embodiments, the expression level of fewer genes, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 genes, may be evaluated. In some embodiments, the methods of the invention comprise determining the expression level of all eight genes and/or at least one corresponding alternate gene shown in Table 2. Gene expression levels may be measured using any number of methods known in the art. In typical embodiments, the method involves measuring the level of RNA. RNA expression can be quantified using any method, e.g., employing a quantitative amplification method such as qPCR. In other embodiments, the methods employ array-based assays. In still other embodiments, protein products may be detected. The gene expression patterns are determined using a sample obtained from a breast tumor.


In the context of this invention, an “alternate gene” refers to a gene that can be evaluated for expression levels instead of, or in addition to, the gene for which the “alternate gene” is the designated alternate in Table 1. For example, one of the genes in Table 1 is CCNB2. MELK and GINS1 are both alternatives that can be evaluated for expression instead of CCNB2 or in addition to CCNB2, when evaluating the gene expression levels of the 17 genes set forth in Table 1. With respect to Table 2, an “alternate gene” refers to a gene that can be evaluated for expression levels instead of, or in addition to, the gene for which the “alternate gene” is the designated alternate in Table 2. For example, one of the genes in Table 2 is CCNB2. MELK and TOP2A are both alternatives that can be evaluated for expression instead of CCNB2 or in addition to CCNB2 when evaluating the gene expression levels of the 8 genes set forth in Table 2.


Methods for Quantifying RNA

The quantity of RNA encoded by a gene set forth in Table 1 or Table 2 and optionally, a gene set forth in Table 3 or an alternative reference gene, can be readily determined according to any method known in the art for quantifying RNA. Various methods involving amplification reactions and/or reactions in which probes are linked to a solid support and used to quantify RNA may be used. Alternatively, the RNA may be linked to a solid support and quantified using a probe to the sequence of interest.


An “RNA nucleic acid sample” analyzed in the invention is obtained from a breast tumor sample obtained from the patient. An “RNA nucleic acid sample” comprises RNA, but need not be purely RNA, e.g., DNA may also be present in the sample. Techniques for obtaining an RNA sample from tumors are well known in the art.


In some embodiments, the target RNA is first reverse transcribed and the resulting cDNA is quantified. In some embodiments, RT-PCR or other quantitative amplification techniques are used to quantify the target RNA. Amplification of cDNA using PCR is well known (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS (Innis et al., eds, 1990)). Methods of quantitative amplification are disclosed in, e.g., U.S. Pat. Nos. 6,180,349; 6,033,854; and 5,972,602, as well as in, e.g., Gibson et al., Genome Research 6:995-1001 (1996); DeGraves, et al., Biotechniques 34(1):106-10, 112-5 (2003); Deiman B, et al., Mol Biotechnol. 20(2):163-79 (2002). Alternative methods for determining the level of a mRNA of interest in a sample may involve other nucleic acid amplification methods such as ligase chain reaction (Barany (1991) Proc. Natl. Acad. Sci. USA 88:189-193), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al. (1988) Bio/Technology 6:1197), rolling circle replication (U.S. Pat. No. 5,854,033) or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art.


In general, quantitative amplification is based on the monitoring of the signal (e.g., fluorescence of a probe) representing copies of the template in cycles of an amplification (e.g., PCR) reaction. One method for detection of amplification products is the 5′-3′ exonuclease “hydrolysis” PCR assay (also referred to as the TaqMan™ assay) (U.S. Pat. Nos. 5,210,015 and 5,487,972; Holland et al., PNAS USA 88: 7276-7280 (1991); Lee et al., Nucleic Acids Res. 21: 3761-3766 (1993)). This assay detects the accumulation of a specific PCR product by hybridization and cleavage of a doubly labeled fluorogenic probe (the “TaqMan™” probe) during the amplification reaction. The fluorogenic probe consists of an oligonucleotide labeled with both a fluorescent reporter dye and a quencher dye. During PCR, this probe is cleaved by the 5′-exonuclease activity of DNA polymerase if, and only if, it hybridizes to the segment being amplified. Cleavage of the probe generates an increase in the fluorescence intensity of the reporter dye.
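To make the idea of monitoring signal across amplification cycles concrete, the following Python sketch computes a threshold cycle, i.e., the fractional cycle at which fluorescence first reaches a chosen threshold. The threshold-cycle summary is a common convention in quantitative PCR and is not described in this text; the function name, the linear interpolation between cycles, and the 1-based cycle numbering are assumptions.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Return the fractional cycle at which the signal first reaches the threshold.

    fluorescence[0] is the reading after cycle 1, fluorescence[1] after cycle 2,
    and so on. Returns None if the threshold is never reached.
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, cur = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= cur:
            # Reading i-1 belongs to cycle i; interpolate linearly toward cycle i+1.
            return i + (threshold - prev) / (cur - prev)
    return None
```

A sample with more starting template reaches the threshold in fewer cycles, which is why an earlier threshold cycle indicates a higher initial amount of target.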


Another method of detecting amplification products that relies on the use of energy transfer is the “beacon probe” method described by Tyagi and Kramer, Nature Biotech. 14:303-308 (1996), which is also the subject of U.S. Pat. Nos. 5,119,801 and 5,312,728. This method employs oligonucleotide hybridization probes that can form hairpin structures. On one end of the hybridization probe (either the 5′ or 3′ end), there is a donor fluorophore, and on the other end, an acceptor moiety. In the case of the Tyagi and Kramer method, this acceptor moiety is a quencher, that is, the acceptor absorbs energy released by the donor, but then does not itself fluoresce. Thus, when the beacon is in the open conformation, the fluorescence of the donor fluorophore is detectable, whereas when the beacon is in hairpin (closed) conformation, the fluorescence of the donor fluorophore is quenched. When employed in PCR, the molecular beacon probe, which hybridizes to one of the strands of the PCR product, is in “open conformation,” and fluorescence is detected, while those that remain unhybridized will not fluoresce (Tyagi and Kramer, Nature Biotechnol. 14: 303-308 (1996)). As a result, the amount of fluorescence will increase as the amount of PCR product increases, and thus may be used as a measure of the progress of the PCR. Those of skill in the art will recognize that other methods of quantitative amplification are also available.


Various other techniques for performing quantitative amplification of nucleic acids are also known. For example, some methodologies employ one or more probe oligonucleotides that are structured such that a change in fluorescence is generated when the oligonucleotide(s) is hybridized to a target nucleic acid. For example, one such method involves a dual fluorophore approach that exploits fluorescence resonance energy transfer (FRET), e.g., LightCycler™ hybridization probes, where two oligo probes anneal to the amplicon. The oligonucleotides are designed to hybridize in a head-to-tail orientation with the fluorophores separated at a distance that is compatible with efficient energy transfer. Other examples of labeled oligonucleotides that are structured to emit a signal when bound to a nucleic acid or incorporated into an extension product include: Scorpions™ probes (e.g., Whitcombe et al., Nature Biotechnology 17:804-807, 1999, and U.S. Pat. No. 6,326,145), Sunrise™ (or Amplifluor™) probes (e.g., Nazarenko et al., Nuc. Acids Res. 25:2516-2521, 1997, and U.S. Pat. No. 6,117,635), and probes that form a secondary structure that results in reduced signal without a quencher and that emits increased signal when hybridized to a target (e.g., Lux Probes™).


In other embodiments, intercalating agents that produce a signal when intercalated in double stranded DNA may be used. Exemplary agents include SYBR GREEN™ and SYBR GOLD™. Since these agents are not template-specific, it is assumed that the signal is generated based on template-specific amplification. This can be confirmed by monitoring signal as a function of temperature because the melting point of template sequences will generally be much higher than that of, for example, primer-dimers, etc.
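The temperature-based confirmation described above can be illustrated with a simple melt-curve calculation. The following is a minimal sketch, not a method taken from this disclosure: it assumes fluorescence readings collected at increasing temperatures and reports the temperature at which the negative slope of the signal is steepest. The function name `melt_peak` and the readings are hypothetical and made up for illustration.

```python
def melt_peak(temps, fluorescence):
    """Return the midpoint temperature of the interval where -dF/dT is largest."""
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps)):
        dt = temps[i] - temps[i - 1]
        # Negative derivative: a sharp loss of signal gives a large positive value.
        slope = -(fluorescence[i] - fluorescence[i - 1]) / dt
        if slope > best_slope:
            best_slope = slope
            best_temp = (temps[i] + temps[i - 1]) / 2.0
    return best_temp


# Made-up readings: the signal drops sharply between 84 and 86 degrees C.
temps = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90]
signal = [100, 99, 98, 97, 96, 95, 93, 88, 40, 12, 10]
tm = melt_peak(temps, signal)
```

A peak at a markedly lower temperature than expected for the amplicon would suggest a non-specific product such as a primer-dimer.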


In other embodiments, the mRNA is immobilized on a solid surface and contacted with a probe, e.g., in a dot blot or Northern format. In an alternative embodiment, the probe(s) are immobilized on a solid surface and the mRNA is contacted with the probe(s), for example, in a gene chip array. A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoding the biomarkers or other proteins of interest.


In some embodiments, microarrays are employed. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See, U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860, and 6,344,316. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.


Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261. Although a planar array surface is often employed, the array may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate, see U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992. Arrays may be packaged in such a manner as to allow for diagnostics or other manipulation of an all-inclusive device.


Primers and probes for use in amplifying and detecting the target sequence of interest can be selected using well-known techniques.


In some embodiments, the methods of the invention further comprise detecting the level of expression of one or more reference genes that can be used as controls to determine expression levels. Such genes are typically expressed constitutively at a high level and can act as a reference for determining accurate gene expression level estimates. Examples of control genes are provided in Table 3 and the following list: ARPC2, ATF4, ATP5B, B2M, CDH4, CELF1, CLTA, CLTC, COPB1, CTBP1, CYC1, CYFIP1, DAZAP2, DHX15, DIMT1, EEF1A1, FLOT2, GAPDH, GUSB, HADHA, HDLBP, HMBS, HNRNPC, HPRT1, HSP90AB1, MTCH1, MYL12B, NACA, NDUFB8, PGK1, PPIA, PPIB, PTBP1, RPL13A, RPLP0, RPS13, RPS23, RPS3, S100A6, SDHA, SEC31A, SET, SF3B1, SFRS3, SNRNP200, STARD7, SUMO1, TBP, TFRC, TMBIM6, TPT1, TRA2B, TUBA1C, UBB, UBC, UBE2D2, UBE2D3, VAMP3, XPO1, YTHDC1, YWHAZ, and 18S rRNA genes. Accordingly, a determination of RNA expression levels of the genes of interest, e.g., the gene expression levels of the panel of genes identified in Table 1, and/or an alternate; or the gene expression levels of the panel of genes identified in Table 2, and/or an alternate; may also comprise determining expression levels of one or more reference genes set forth in Table 3 or one or more reference genes selected from the genes ARPC2, ATF4, ATP5B, B2M, CDH4, CELF1, CLTA, CLTC, COPB1, CTBP1, CYC1, CYFIP1, DAZAP2, DHX15, DIMT1, EEF1A1, FLOT2, GAPDH, GUSB, HADHA, HDLBP, HMBS, HNRNPC, HPRT1, HSP90AB1, MTCH1, MYL12B, NACA, NDUFB8, PGK1, PPIA, PPIB, PTBP1, RPL13A, RPLP0, RPS13, RPS23, RPS3, S100A6, SDHA, SEC31A, SET, SF3B1, SFRS3, SNRNP200, STARD7, SUMO1, TBP, TFRC, TMBIM6, TPT1, TRA2B, TUBA1C, UBB, UBC, UBE2D2, UBE2D3, VAMP3, XPO1, YTHDC1, YWHAZ, and 18S rRNA.
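One way reference genes may be used as controls can be sketched as follows. This is an illustrative assumption rather than a normalization prescribed by this disclosure: each target gene's log2 expression value is expressed relative to the mean log2 value of the reference genes measured in the same sample. The function name and the target labels `GENE_A` and `GENE_B` are hypothetical placeholders; B2M, GUSB and TBP are taken from the list above, and all values are made up.

```python
from statistics import mean


def normalize_to_references(log2_expr, reference_genes):
    """Subtract the mean log2 level of the reference genes from each target gene."""
    refs = [log2_expr[g] for g in reference_genes if g in log2_expr]
    if not refs:
        raise ValueError("no reference gene measured in this sample")
    baseline = mean(refs)
    # Only target genes are returned; the reference genes define the baseline.
    return {g: v - baseline for g, v in log2_expr.items() if g not in reference_genes}


# Made-up log2 values for one sample.
sample = {"GENE_A": 9.5, "GENE_B": 6.0, "B2M": 12.0, "GUSB": 8.0, "TBP": 7.0}
normalized = normalize_to_references(sample, ["B2M", "GUSB", "TBP"])
```

Averaging several reference genes, rather than relying on a single one, reduces the influence of any one reference gene that happens to vary between samples.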


In the context of this invention, “determining the levels of expression” of an RNA of interest encompasses any method known in the art for quantifying an RNA of interest.


Detection of Protein Levels

In some embodiments, the expression level of a protein encoded by a biomarker gene set forth in Table 1 or Table 2 is measured. Often, such measurements may be performed using immunoassays. Protein expression level is determined using a breast tumor sample obtained from the patient.


A general overview of the applicable technology can be found in Harlow & Lane, Antibodies: A Laboratory Manual (1988) and Harlow & Lane, Using Antibodies (1999). Methods of producing polyclonal and monoclonal antibodies that react specifically with an allelic variant are known to those of skill in the art (see, e.g., Coligan, Current Protocols in Immunology (1991); Harlow & Lane, supra; Goding, Monoclonal Antibodies: Principles and Practice (2d ed. 1986); and Kohler & Milstein, Nature 256:495-497 (1975)). Such techniques include antibody preparation by selection of antibodies from libraries of recombinant antibodies in phage or similar vectors, as well as preparation of polyclonal and monoclonal antibodies by immunizing rabbits or mice (see, e.g., Huse et al., Science 246:1275-1281 (1989); Ward et al., Nature 341:544-546 (1989)).


Polymorphic alleles can be detected by a variety of immunoassay methods. For a review of immunological and immunoassay procedures, see Basic and Clinical Immunology (Stites & Terr eds., 7th ed. 1991). Moreover, the immunoassays can be performed in any of several configurations, which are reviewed extensively in Enzyme Immunoassay (Maggio, ed., 1980); and Harlow & Lane, supra. For a review of the general immunoassays, see also Methods in Cell Biology: Antibodies in Cell Biology, volume 37 (Asai, ed. 1993); Basic and Clinical Immunology (Stites & Terr, eds., 7th ed. 1991).


Commonly used assays include noncompetitive assays, e.g., sandwich assays, and competitive assays. Typically, an assay such as an ELISA assay can be used. The amount of the polypeptide variant can be determined by performing quantitative analyses.


Other detection techniques, e.g., MALDI, may be used to directly detect the presence of proteins correlated with treatment outcomes.


As indicated above, evaluation of protein expression levels may additionally include determining the levels of protein expression of control genes, e.g., of one or more genes identified in Table 3.


Devices and Kits

In a further aspect, the invention provides diagnostic devices and kits for identifying gene expression products of a panel of genes that is associated with prognosis for a LN−ER+HER2− breast cancer patient.


In some embodiments, a diagnostic device comprises probes to detect at least 8, 9, 10, 11, 12, 13, 14, 15, 16, or all 17 gene expression products set forth in Table 1, and/or alternates. In some embodiments, a diagnostic device comprises probes to detect the expression products of the 8 genes set forth in Table 2, and/or alternates. In some embodiments, the present invention provides oligonucleotide probes attached to a solid support, such as an array slide or chip, e.g., as described in DNA Microarrays: A Molecular Cloning Manual, 2003, Eds. Bowtell and Sambrook, Cold Spring Harbor Laboratory Press. Construction of such devices is well known in the art, for example as described in US Patents and Patent Publications U.S. Pat. No. 5,837,832; PCT application WO95/11995; U.S. Pat. No. 5,807,522; U.S. Pat. Nos. 7,157,229, 7,083,975, 6,444,175, 6,375,903, 6,315,958, 6,295,153, and 5,143,854, 2007/0037274, 2007/0140906, 2004/0126757, 2004/0110212, 2004/0110211, 2003/0143550, 2003/0003032, and 2002/0041420. Nucleic acid arrays are also reviewed in the following references: Biotechnol Annu Rev 8:85-101 (2002); Sosnowski et al, Psychiatr Genet 12(4):181-92 (December 2002); Heller, Annu Rev Biomed Eng 4: 129-53 (2002); Kolchinsky et al, Hum. Mutat 19(4):343-60 (April 2002); and McGall et al, Adv Biochem Eng Biotechnol 77:21-42 (2002).


An array can be composed of a large number of unique, single-stranded polynucleotides, usually either synthetic antisense polynucleotides or fragments of cDNAs, fixed to a solid support. Typical polynucleotides are preferably about 6-60 nucleotides in length, more preferably about 15-30 nucleotides in length, and most preferably about 18-25 nucleotides in length. For certain types of arrays or other detection kits/systems, it may be preferable to use oligonucleotides that are only about 7-20 nucleotides in length. In other types of arrays, such as arrays used in conjunction with chemiluminescent detection technology, preferred probe lengths can be, for example, about 15-80 nucleotides in length, preferably about 50-70 nucleotides in length, more preferably about 55-65 nucleotides in length, and most preferably about 60 nucleotides in length.


A person skilled in the art will recognize that, based on the known sequence information, detection reagents can be developed and used to assay any gene expression product set forth in Table 1 or Table 2 (or in some embodiments Table 3 or another reference gene described herein) and that such detection reagents can be incorporated into a kit. The term “kit” as used herein in the context of detection reagents is intended to refer to such things as combinations of multiple gene expression detection reagents, or one or more gene expression detection reagents in combination with one or more other types of elements or components (e.g., other types of biochemical reagents, containers, packages such as packaging intended for commercial sale, substrates to which gene expression detection reagents are attached, electronic hardware components, etc.). Accordingly, the present invention further provides gene expression detection kits and systems, including but not limited to, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays of nucleic acid molecules where the arrays/microarrays comprise probes to detect the level of RNA transcript, and beads that contain one or more probes, primers, or other detection reagents for detecting one or more RNA transcripts encoded by a gene in a gene expression panel of the present invention. The kits can optionally include various electronic hardware components; for example, arrays (“DNA chips”) and microfluidic systems (“lab-on-a-chip” systems) provided by various manufacturers typically comprise hardware components. Other kits (e.g., probe/primer sets) may not include electronic hardware components, but may be comprised of, for example, one or more biomarker detection reagents (along with, optionally, other biochemical reagents) packaged in one or more containers.


In some embodiments, a detection kit typically contains one or more detection reagents and other components (e.g. a buffer, enzymes such as DNA polymerases) necessary to carry out an assay or reaction, such as amplification for detecting the level of transcript. A kit may further contain means for determining the amount of a target nucleic acid, and means for comparing the amount with a standard, and can comprise instructions for using the kit to detect the nucleic acid molecule of interest. In one embodiment of the present invention, kits are provided which contain the necessary reagents to carry out one or more assays to detect one or more RNA transcripts of a gene disclosed herein. In one embodiment of the present invention, biomarker detection kits/systems are in the form of nucleic acid arrays, or compartmentalized kits, including microfluidic/lab-on-a-chip systems.


Detection kits/systems for detecting expression of a panel of genes in accordance with the invention may contain, for example, one or more probes, or pairs or sets of probes, that hybridize to a nucleic acid molecule encoded by a gene set forth in Table 1 or Table 2. In some embodiments, the presence of more than one biomarker can be simultaneously evaluated in an assay. For example, in some embodiments probes or probe sets to different biomarkers are immobilized as arrays or on beads. For example, the same substrate can comprise probes for detecting expression of at least 8, 9, 10, 11, 12, 13, 14, 15, 16, or 17 or more of the genes set forth in Table 1, and/or alternates to the genes. In some embodiments, the same substrate can comprise probes for detecting expression of 8 or more genes set forth in Table 2, and/or alternates to the genes.


Using such arrays or other kits/systems, the present invention provides methods of identifying the levels of expression of a gene described herein in a test sample. Such methods typically involve incubating a test sample of nucleic acids obtained from a breast tumor from a LN−ER+HER2− patient with an array comprising one or more probes that selectively hybridize to a nucleic acid encoded by a gene identified in Table 1 or Table 2. Such an array may additionally comprise probes to one or more reference genes identified in Table 3, or one or more reference genes selected from the genes ARPC2, ATF4, ATP5B, B2M, CDH4, CELF1, CLTA, CLTC, COPB1, CTBP1, CYC1, CYFIP1, DAZAP2, DHX15, DIMT1, EEF1A1, FLOT2, GAPDH, GUSB, HADHA, HDLBP, HMBS, HNRNPC, HPRT1, HSP90AB1, MTCH1, MYL12B, NACA, NDUFB8, PGK1, PPIA, PPIB, PTBP1, RPL13A, RPLP0, RPS13, RPS23, RPS3, S100A6, SDHA, SEC31A, SET, SF3B1, SFRS3, SNRNP200, STARD7, SUMO1, TBP, TFRC, TMBIM6, TPT1, TRA2B, TUBA1C, UBB, UBC, UBE2D2, UBE2D3, VAMP3, XPO1, YTHDC1, YWHAZ, and 18S rRNA. In some embodiments, the array comprises probes to all 17 genes identified in Table 1, and/or alternates; or all 8 genes identified in Table 2, and/or alternates. Conditions for incubating a gene detection reagent (or a kit/system that employs one or more such biomarker detection reagents) with a test sample vary. Incubation conditions depend on such factors as the format employed in the assay, the detection methods employed, and the type and nature of the detection reagents used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification and array assay formats can readily be adapted to detect a gene set forth in Table 1 or Table 2.


A gene expression detection kit of the present invention may include components that are used to prepare nucleic acids from a test sample for the subsequent amplification and/or detection of a gene transcript.


In some embodiments, a gene expression kit comprises one or more reagents, e.g., antibodies, for detecting protein products of a gene identified in Table 1 or Table 2 and optionally Table 3.


Correlating Gene Expression Levels with Prognostic Outcomes


The present invention provides methods of determining the levels of a gene expression product to evaluate the likelihood that a LN−ER+HER2− breast cancer patient will have a relapse. Accordingly, the method provides a way of identifying LN−ER+HER2− breast cancer patients that are candidates for additional treatment, e.g., chemotherapy.



FIG. 10 is a flowchart of a method for identifying LN−ER+HER2− breast cancer patients that are candidates for additional treatment in one embodiment. Implementations of or processing in method 1000 depicted in FIG. 10 may be performed by software (e.g., instructions or code modules) when executed by a central processing unit (CPU or processor) of a logic machine, such as a computer system or information processing device, by hardware components of an electronic device or application-specific integrated circuits, or by combinations of software and hardware elements. Method 1000 depicted in FIG. 10 begins in step 1010.


In step 1020, information is received describing one or more levels of expression of one or more predetermined genes in a sample obtained from a subject. For example, the level of a gene expression product associated with a prognostic outcome for a LN−ER+HER2− breast cancer patient may be recorded. In one embodiment, input data includes a text file (e.g., a tab-delimited text file) of normalized expression values for 17 transcripts from primary genes (or an indicated alternative) from Table 1. In one embodiment, input data includes a text file (e.g., a tab-delimited text file) of normalized expression values for 8 transcripts from the primary genes (or an indicated alternative) from Table 2. For example, the text file may have the gene expression values for the 17 transcripts/genes as columns and patient(s) as rows. An illustrative patient data file (patient_data.txt) is presented in Appendix 1.
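Reading input data of the kind described in step 1020 can be sketched as follows. This is a minimal illustration that assumes a tab-delimited layout with a header row, one row per patient and one column per transcript. The identifier column name `patient_id`, the gene labels `GENE_A` and `GENE_B`, and the values are hypothetical and are not taken from Appendix 1.

```python
import csv
import io


def read_expression_table(handle, id_column="patient_id"):
    """Parse tab-delimited text into {patient: {gene: normalized value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        patient = row.pop(id_column)
        # Every remaining column is treated as a normalized expression value.
        table[patient] = {gene: float(value) for gene, value in row.items()}
    return table


# In-memory stand-in for a file; real input would be opened with open(path, newline="").
example = io.StringIO(
    "patient_id\tGENE_A\tGENE_B\n"
    "P001\t7.25\t10.5\n"
    "P002\t6.0\t11.75\n"
)
patients = read_expression_table(example)
```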


In step 1030, a random forest analysis is performed on the information describing the one or more levels of expression of the one or more predetermined genes in the sample obtained from the subject. A Random Forest (RF) algorithm is used to determine a Relapse Score (RS) when applied to independent patient data. A sample R program for running the RF algorithm is presented in Appendix 2. A Random Forest Relapse Score (RFRS) algorithm as used herein typically consists of a predetermined number of decision trees suitably adapted to ensure at least a fully deterministic model. Each node (branch) in each tree represents a binary decision based on transcript levels for transcripts described herein. Based on these decisions, the subject is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or no “relapse”. The fraction of votes for “relapse” to votes for “no relapse” represents the RFRS—a measure of the probability of relapse. In some embodiments, if a subject's RFRS is greater than or equal to 0.606, the subject is assigned to one or more “high risk” groups. If an RFRS is greater than or equal to 0.333 and less than 0.606, the subject is assigned to one or more “intermediate risk” groups. If an RFRS is less than 0.333, the subject is assigned to one or more “low risk” groups. In further embodiments, a subject's RFRS value is also used to determine a likelihood of relapse by comparison to a loess fit of RFRS versus likelihood of relapse for a training dataset. A subject's estimated likelihood of relapse is determined, added to a summary plot, and output as a new report.
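The scoring and risk-group assignment of step 1030 can be sketched as follows. This is an illustration only and is not the R program of Appendix 2. It assumes that the RFRS is computed as the proportion of all trees voting “relapse”, which is one reading of the description above, and it applies the 0.333 and 0.606 cutoffs stated in the text. The function names and the example votes are hypothetical.

```python
def rfrs_from_votes(votes):
    """Proportion of trees voting 'relapse' (votes are booleans, True = relapse)."""
    if not votes:
        raise ValueError("at least one tree vote is required")
    return sum(1 for v in votes if v) / len(votes)


def risk_group(rfrs, low_cutoff=0.333, high_cutoff=0.606):
    """Map an RFRS to a risk group using the cutoffs given in the description."""
    if rfrs >= high_cutoff:
        return "high"
    if rfrs >= low_cutoff:
        return "intermediate"
    return "low"


# Made-up votes from 11 hypothetical trees: 7 vote relapse, 4 vote no relapse.
votes = [True] * 7 + [False] * 4
score = rfrs_from_votes(votes)
group = risk_group(score)
```

Both cutoffs are inclusive at their lower bound, so a score of exactly 0.606 falls in the high-risk group and a score of exactly 0.333 falls in the intermediate-risk group.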


In step 1040, information indicative of either “relapse” or no “relapse” is generated based on the random forest analysis. In some embodiments, information indicative of either “relapse” or no “relapse” is generated to include one or more summary statistics. For example, information indicative of either “relapse” or no “relapse” may be representative of how assignments to a terminal leaf of each decision tree, representing a vote for either “relapse” or no “relapse”, are made. In a further embodiment, information indicative of either “relapse” or no “relapse” is generated for the fraction of votes for “relapse” to votes for “no relapse” as discussed above to represent the RFRS.


In step 1050, information indicative of one or more additional therapies is generated based on the information indicative of “relapse”. For example, if an RFRS is greater than or equal to 0.606, the subject is assigned to a “high risk” group from which the one or more additional therapies may be selected. If an RFRS is greater than or equal to 0.333 and less than 0.606, the subject is assigned to an “intermediate risk” group from which all or none of the one or more additional therapies may be selected. If an RFRS is less than 0.333, the subject is assigned to a “low risk” group. In various embodiments, a subject's RFRS value is also used to determine a likelihood of relapse by comparison to a loess fit of RFRS versus likelihood of relapse for a training dataset described in FIG. 11 and in the Examples section. FIG. 10 ends in step 1060.



FIG. 11 is a flowchart of a method for generating an RF model for identifying LN−ER+HER2− breast cancer patients that are candidates for additional treatment in one embodiment. Implementations of or processing in method 1100 depicted in FIG. 11 may be performed by software (e.g., instructions or code modules) when executed by a central processing unit (CPU or processor) of a logic machine, such as a computer system or information processing device, by hardware components of an electronic device or application-specific integrated circuits, or by combinations of software and hardware elements. Method 1100 depicted in FIG. 11 begins in step 1110.


In step 1120, training data is received. For example, training data was generated as discussed below in the Examples section. In step 1130, variables on which to base decisions at tree nodes and classifier data are received. In one embodiment, classification was performed on training samples with either a relapse or no relapse after 10-year follow-up. In one example, a binary classification (e.g., relapse versus no relapse) is specified. However, additional classifier data may be included, such as a probability (proportion of “votes”) for relapse which is termed the Random Forests Relapse Score (RFRS). Risk group thresholds can be determined from the distribution of relapse probabilities using mixed model clustering to set cutoffs for low, intermediate and high risk groups.


In step 1140, a random forest model is generated. For example, a random forest model may be generated with at least 100,001 trees (i.e., using an odd number to ensure a substantially fully deterministic model). FIG. 11 ends in step 1150.
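The reason for choosing an odd number of trees can be illustrated with a small check. This is a sketch under the assumption that each tree casts exactly one binary vote: with an odd number of voters the two counts can never be equal, so a majority always exists. The function names are hypothetical, and the exhaustive check is only practical for small numbers of trees.

```python
from itertools import product


def majority_vote(votes):
    """Return the winning class, or None if the binary votes are tied."""
    relapse = sum(1 for v in votes if v)
    no_relapse = len(votes) - relapse
    if relapse == no_relapse:
        return None
    return "relapse" if relapse > no_relapse else "no relapse"


def can_tie(n_trees):
    """Check every possible vote pattern for a tie (feasible for small n only)."""
    return any(
        majority_vote(pattern) is None
        for pattern in product([True, False], repeat=n_trees)
    )


even_forest_can_tie = can_tie(4)
odd_forest_can_tie = can_tie(5)
# The closest possible split of 100,001 votes still has a winner.
closest_split = majority_vote([True] * 50000 + [False] * 50001)
```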


Hardware Description

The invention thus includes a computer system to implement the algorithm. Such a computer system can comprise code for interpreting the results of an expression analysis evaluating the level of expression of the 17 genes (or a designated alternate gene) identified in Table 1; or code for interpreting the results of an expression analysis evaluating the level of expression of the 8 genes (or a designated alternate gene) identified in Table 2. Thus in an exemplary embodiment, the expression analysis results are provided to a computer where a central processor executes a computer program for determining the propensity for relapse for a LN−ER+HER2− breast cancer patient.


The invention also provides the use of a computer system, such as that described above, which comprises: (1) a computer; (2) a stored bit pattern encoding the expression results obtained by the methods of the invention, which may be stored in the computer; and, optionally, (3) a program for determining the likelihood of relapse.


The invention further provides methods of generating a report based on the detection of gene expression products for a LN−ER+HER2− breast cancer patient. Such a report is based on the detection of gene expression products encoded by the 17 genes, or one of the designated alternates, set forth in Table 1; or detection of gene expression products encoded by the 8 genes, or one of the designated alternates, set forth in Table 2.



FIG. 12 is a block diagram of a computer system 1200 that may incorporate an embodiment, be incorporated into an embodiment, or be used to practice any of the innovations, embodiments, and/or examples found within this disclosure. FIG. 12 is merely illustrative of a computing device, general-purpose computer system programmed according to one or more disclosed techniques, or specific information processing device for an embodiment incorporating an invention whose teachings may be presented herein and does not limit the scope of the invention as recited in the claims. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.


Computer system 1200 can include hardware and/or software elements configured for performing logic operations and calculations, input/output operations, machine communications, or the like. Computer system 1200 may include familiar computer components, such as one or more data processors or central processing units (CPUs) 1205, one or more graphics processors or graphical processing units (GPUs) 1210, memory subsystem 1215, storage subsystem 1220, one or more input/output (I/O) interfaces 1225, communications interface 1230, or the like. Computer system 1200 can include system bus 1235 interconnecting the above components and providing functionality, such as connectivity and inter-device communication. Computer system 1200 may be embodied as a computing device, such as a personal computer (PC), a workstation, a mini-computer, a mainframe, a cluster or farm of computing devices, a laptop, a notebook, a netbook, a PDA, a smartphone, a consumer electronic device, a gaming console, or the like.


The one or more data processors or central processing units (CPUs) 1205 can include hardware and/or software elements configured for executing logic or program code or for providing application-specific functionality. Some examples of CPU(s) 1205 can include one or more microprocessors (e.g., single core and multi-core) or micro-controllers. CPUs 1205 may include 4-bit, 8-bit, 12-bit, 16-bit, 32-bit, 64-bit, or the like architectures with similar or divergent internal and external instruction and data designs. CPUs 1205 may further include a single core or multiple cores. Commercially available processors may include those provided by Intel of Santa Clara, Calif. (e.g., x86, x86_64, PENTIUM, CELERON, CORE, CORE 2, CORE ix, ITANIUM, XEON, etc.) or by Advanced Micro Devices of Sunnyvale, Calif. (e.g., x86, AMD64, ATHLON, DURON, TURION, ATHLON XP/64, OPTERON, PHENOM, etc.). Commercially available processors may further include those conforming to the Advanced RISC Machine (ARM) architecture (e.g., ARMv7-9), POWER and POWERPC architecture, CELL architecture, or the like. CPU(s) 1205 may also include one or more field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), or other microcontrollers. The one or more data processors or central processing units (CPUs) 1205 may include any number of registers, logic units, arithmetic units, caches, memory interfaces, or the like. The one or more data processors or central processing units (CPUs) 1205 may further be integrated, irremovably or moveably, into one or more motherboards or daughter boards.


The one or more graphics processors or graphical processing units (GPUs) 1210 can include hardware and/or software elements configured for executing logic or program code associated with graphics or for providing graphics-specific functionality. GPUs 1210 may include any conventional graphics processing unit, such as those provided by conventional video cards. Some examples of GPUs are commercially available from NVIDIA, ATI, and other vendors. In various embodiments, GPUs 1210 may include one or more vector or parallel processing units. These GPUs may be user programmable, and include hardware elements for encoding/decoding specific types of data (e.g., video data) or for accelerating 2D or 3D drawing operations, texturing operations, shading operations, or the like. The one or more graphics processors or graphical processing units (GPUs) 1210 may include any number of registers, logic units, arithmetic units, caches, memory interfaces, or the like. The one or more graphics processors or graphical processing units (GPUs) 1210 may further be integrated, irremovably or moveably, into one or more motherboards or daughter boards that include dedicated video memories, frame buffers, or the like.


Memory subsystem 1215 can include hardware and/or software elements configured for storing information. Memory subsystem 1215 may store information using machine-readable articles, information storage devices, or computer-readable storage media. Some examples of these articles used by memory subsystem 1215 can include random access memories (RAM), read-only-memories (ROMs), volatile memories, non-volatile memories, and other semiconductor memories. In various embodiments, memory subsystem 1215 can include data and program code 1240.


Storage subsystem 1220 can include hardware and/or software elements configured for storing information. Storage subsystem 1220 may store information using machine-readable articles, information storage devices, or computer-readable storage media. Storage subsystem 1220 may store information using storage media 1245. Some examples of storage media 1245 used by storage subsystem 1220 can include floppy disks, hard disks, optical storage media such as CD-ROMs, DVDs and bar codes, removable storage devices, networked storage devices, or the like. In some embodiments, all or part of breast cancer prognosis data and program code 1240 may be stored using storage subsystem 1220.


In various embodiments, computer system 1200 may include one or more hypervisors or operating systems, such as WINDOWS, WINDOWS NT, WINDOWS XP, VISTA, WINDOWS 7 or the like from Microsoft of Redmond, Wash., Mac OS or Mac OS X from Apple Inc. of Cupertino, Calif., SOLARIS from Sun Microsystems, LINUX, UNIX, and other UNIX-based or UNIX-like operating systems. Computer system 1200 may also include one or more applications configured to execute, perform, or otherwise implement techniques disclosed herein. These applications may be embodied as breast cancer prognosis data and program code 1240. Additionally, computer programs, executable computer code, human-readable source code, shader code, rendering engines, or the like, and data, such as image files, models including geometrical descriptions of objects, ordered geometric descriptions of objects, procedural descriptions of models, scene descriptor files, or the like, may be stored in memory subsystem 1215 and/or storage subsystem 1220.


The one or more input/output (I/O) interfaces 1225 can include hardware and/or software elements configured for performing I/O operations. One or more input devices 1250 and/or one or more output devices 1255 may be communicatively coupled to the one or more I/O interfaces 1225.


The one or more input devices 1250 can include hardware and/or software elements configured for receiving information from one or more sources for computer system 1200. Some examples of the one or more input devices 1250 may include a computer mouse, a trackball, a track pad, a joystick, a wireless remote, a drawing tablet, a voice command system, an eye tracking system, external storage systems, a monitor appropriately configured as a touch screen, a communications interface appropriately configured as a transceiver, or the like. In various embodiments, the one or more input devices 1250 may allow a user of computer system 1200 to interact with one or more non-graphical or graphical user interfaces to enter a comment, select objects, icons, text, user interface widgets, or other user interface elements that appear on a monitor/display device via a command, a click of a button, or the like.


The one or more output devices 1255 can include hardware and/or software elements configured for outputting information to one or more destinations for computer system 1200. Some examples of the one or more output devices 1255 can include a printer, a fax, a feedback device for a mouse or joystick, external storage systems, a monitor or other display device, a communications interface appropriately configured as a transceiver, or the like. The one or more output devices 1255 may allow a user of computer system 1200 to view objects, icons, text, user interface widgets, or other user interface elements.


A display device or monitor may be used with computer system 1200 and can include hardware and/or software elements configured for displaying information. Some examples include familiar display devices, such as a television monitor, a cathode ray tube (CRT), a liquid crystal display (LCD), or the like.


Communications interface 1230 can include hardware and/or software elements configured for performing communications operations, including sending and receiving data. Some examples of communications interface 1230 may include a network communications interface, an external bus interface, an Ethernet card, a modem (telephone, satellite, cable, ISDN), (asymmetric) digital subscriber line (DSL) unit, FireWire interface, USB interface, or the like. For example, communications interface 1230 may be coupled to communications network/external bus 1280, such as a computer network, to a FireWire bus, a USB hub, or the like. In other embodiments, communications interface 1230 may be physically integrated as hardware on a motherboard or daughter board of computer system 1200, may be implemented as a software program, or the like, or may be implemented as a combination thereof.


In various embodiments, computer system 1200 may include software that enables communications over a network, such as a local area network or the Internet, using one or more communications protocols, such as the HTTP, TCP/IP, RTP/RTSP protocols, or the like. In some embodiments, other communications software and/or transfer protocols may also be used, for example IPX, UDP or the like, for communicating with hosts over the network or with a device directly connected to computer system 1200.


As suggested, FIG. 12 is merely representative of a general-purpose computer system appropriately configured or specific data processing device capable of implementing or incorporating various embodiments of an invention presented within this disclosure. Many other hardware and/or software configurations may be apparent to the skilled artisan which are suitable for use in implementing an invention presented within this disclosure or with various embodiments of an invention presented within this disclosure. For example, a computer system or data processing device may include desktop, portable, rack-mounted, or tablet configurations. Additionally, a computer system or information processing device may include a series of networked computers or clusters/grids of parallel processing devices. In still other embodiments, a computer system or information processing device may perform techniques described above as implemented upon a chip or an auxiliary processing board.


Many hardware and/or software configurations of a computer system may be apparent to the skilled artisan, which are suitable for use in implementing an RFRS algorithm as described herein. For example, a computer system or data processing device may include desktop, portable, rack-mounted, or tablet configurations. Additionally, a computer system or information processing device may include a series of networked computers or clusters/grids of parallel processing devices. In still other embodiments, a computer system or information processing device may use techniques described above as implemented upon a chip or an auxiliary processing board.


Various embodiments of an algorithm as described herein can be implemented in the form of logic in software, firmware, hardware, or a combination thereof. The logic may be stored in or on a machine-accessible memory, a machine-readable article, a tangible computer-readable medium, a computer-readable storage medium, or other computer/machine-readable media as a set of instructions adapted to direct a central processing unit (CPU or processor) of a logic machine to perform a set of steps that may be disclosed in various embodiments of an invention presented within this disclosure. The logic may form part of a software program or computer program product as code modules become operational with a processor of a computer system or an information-processing device when executed to perform a method or process in various embodiments of an invention presented within this disclosure. Based on this disclosure and the teachings provided herein, a person of ordinary skill in the art will appreciate other ways, variations, modifications, alternatives, and/or methods for implementing in software, firmware, hardware, or combinations thereof any of the disclosed operations or functionalities of various embodiments of one or more of the presented inventions.


EXAMPLES

The experiments outlined in the initial examples identified prognostic markers that stratify node-negative, ER-positive, HER2-negative breast cancer patients into those most or least likely to develop a recurrence within 10 years after surgery. A multi-gene transcription-level-based classifier of 10-year relapse (disease recurrence within 10 years) was developed using a large database of existing, publicly available microarray datasets. The probability of relapse and the relapse risk score group obtained using the panel of gene expression markers of the invention can be used to assign systemic chemotherapy to only those patients most likely to benefit from it.


Methods:


Literature Search and Curation:


Studies were collected that provided gene expression data for ER+, LN−, HER2− patients who received no systemic chemotherapy (hormonal therapy was allowed). Each study was required to have a sample size of at least 100, report LN status, and include time and events for either recurrence-free survival (RFS) or distant metastasis-free survival (DMFS). The latter were grouped together for survival analysis, where all events represent either a local or distant relapse. If ER or HER2 status was not reported, it was determined by array, but preference was given to studies with clinical determination. A minimum of 10 years of follow-up was required for training the classifier. However, patients with shorter follow-up were included in survival analyses. Patients with immediately postoperative events (time=0) were excluded. Nine studies (refs. 1-9) meeting the above criteria were identified by searching PubMed and the Gene Expression Omnibus (GEO) database (ref. 10). To allow combination of the largest number of samples, only the common Affymetrix U133A gene expression platform was used. In total, 2175 breast cancer samples were identified. After filtering for only those samples that were ER+, node-negative, and had not received systemic chemotherapy, 1403 samples remained. Duplicate analysis removed a further 405 samples due to the significant amount of redundancy between studies (FIG. 1). Filtering for ER+ and HER2− status using array determinations eliminated another 140 samples (FIG. 2). Some ER− samples were from the Schmidt et al., Cancer Res 68, 5405-5413 (2008) (ref. 5) dataset (31/201), which did not provide clinical ER status; for that study we relied solely on arrays for determination of ER status. However, there were also a small number (37/760) from the remaining studies, which represent discrepancies between array status and clinical determination. In such cases, both the clinical and array-based determinations were required to be positive for inclusion in further analysis.
A total of 858 samples passed all filtering steps, including 487 samples with 10-year follow-up data (213 relapse; 274 no relapse). The remaining 371 samples had insufficient follow-up for 10-year classification analysis but were retained for use in the survival analysis. None of the 858 samples were treated with systemic chemotherapy, but 302 (35.2%) were treated with adjuvant hormonal therapy, of which 95.4% were listed as tamoxifen. The 858 samples were split into two-thirds training and one-third testing sets, resulting in: (A) a training set of 572 samples for use in survival analysis, including 325 samples with 10-year follow-up (143 relapse; 182 no relapse) for classification analysis; and (B) a testing set of 286 samples for use in survival analysis, including 162 samples with 10-year follow-up (70 relapse; 92 no relapse) for classification analysis. Table 6 outlines the datasets used in the analysis and FIG. 3 illustrates the breakdown of samples for analysis.


Pre-Processing:


All data processing and analyses were completed with open source R/Bioconductor packages. Raw data (CEL files) were downloaded from GEO. Duplicate samples were identified and removed if they had the same database identifier (e.g., GSM accession), the same sample/patient ID, or showed a high correlation (r>0.99) with any other sample in the dataset. Raw data were normalized and summarized using the ‘affy’ and ‘gcrma’ libraries. Probes were mapped to Entrez gene symbols using both standard and custom annotation files (ref. 11). ER and HER2 expression status was determined using standard probes. For the Affymetrix U133A array, we and others have found the probe “205225_at” to be most effective for determining ER status (ref. 12). Similarly, a rank sum of the best probes for ERBB2 (216835_s_at), GRB7 (210761_s_at), STARD3 (202991_at) and PGAP3 (55616_at) was used to determine HER2 amplicon status. Cutoff values for ER and HER2 status were chosen by mixture model clustering (‘mclust’ library). Unsupervised clustering was performed to assess the extent of batch effects. Once all pre-filtering was complete, data were randomly split into training (⅔) and test (⅓) data sets while balancing for study of origin and number of relapses with 10-year follow-up. The test data set was put aside, left untouched, and used only for final validation, once each for the full-gene, 17-gene and 8-gene classifiers. Probe sets were then filtered for a minimum of 20% of samples with expression above background threshold (raw value>100) and a coefficient of variation between 0.7 and 10. A total of 3048 probe sets/genes passed this filtering and formed the basis for the ‘full-gene set’ model described below.
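The probe-set filter described above can be illustrated with a minimal sketch. It is written in Python rather than the R/Bioconductor environment actually used, and it assumes that the coefficient of variation is the sample standard deviation divided by the mean of the raw values. The function name, the probe names and the expression values are hypothetical.

```python
from statistics import mean, stdev

def passes_probe_filter(values, background=100.0, min_fraction=0.20,
                        cv_min=0.7, cv_max=10.0):
    """Return True if a probe set is expressed above background in at least
    min_fraction of samples and its coefficient of variation (assumed here
    to be SD / mean) lies between cv_min and cv_max."""
    fraction_expressed = sum(v > background for v in values) / len(values)
    mu = mean(values)
    if mu <= 0:
        return False
    cv = stdev(values) / mu
    return fraction_expressed >= min_fraction and cv_min <= cv <= cv_max

# Hypothetical raw expression values for three probe sets across ten samples.
expression = {
    "probe_A": [50, 60, 2000, 55, 40, 1500, 45, 70, 900, 52],   # variable, expressed in 30%
    "probe_B": [900, 1000, 1100, 950, 1050, 1000, 980, 1020, 990, 1010],  # nearly invariant
    "probe_C": [10, 20, 15, 12, 18, 500, 11, 14, 16, 13],       # expressed in only 10%
}

kept = [name for name, values in expression.items() if passes_probe_filter(values)]
```

In this toy input only probe_A is retained: probe_B is removed by the lower bound on the coefficient of variation and probe_C by the expression-fraction requirement.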


Classification:


Classification was performed only on training samples with either a relapse or no relapse after 10-year follow-up, using the ‘randomForest’ library. Forests were created with at least 100,001 trees (an odd number ensures a fully deterministic model) and otherwise default settings. Performance was assessed by area under the curve (AUC) of a receiver operating characteristic (ROC) curve, calculated with the ‘ROCR’ package, from the Random Forests internal out-of-bag (OOB) testing results. By default, RF performs a binary classification (e.g., relapse versus no relapse). However, it also reports a probability (proportion of “votes”) for relapse, which we term the Random Forests Relapse Score (RFRS). Risk group thresholds were determined from the distribution of relapse probabilities using mixture model clustering to set cutoffs for low, intermediate and high risk groups (FIG. 4).
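As an illustration of how an ROC AUC can be obtained from per-patient relapse scores, the following Python sketch computes the AUC as the probability that a randomly chosen relapse case scores higher than a randomly chosen non-relapse case, with ties counted as one half. This is a generic rank-based formulation and is not the ‘ROCR’ code used in the analysis; the scores and outcomes are hypothetical.

```python
def roc_auc(scores, relapsed):
    """Rank-based ROC AUC: fraction of (relapse, no-relapse) pairs in which
    the relapse case has the higher score, counting ties as 0.5."""
    positives = [s for s, r in zip(scores, relapsed) if r]
    negatives = [s for s, r in zip(scores, relapsed) if not r]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical out-of-bag relapse scores (proportion of votes for relapse)
# and observed 10-year outcomes for five patients.
oob_scores = [0.8, 0.2, 0.6, 0.4, 0.2]
observed_relapse = [True, False, True, False, True]

auc = roc_auc(oob_scores, observed_relapse)
```

For these five hypothetical patients, 4.5 of the 6 relapse/no-relapse pairs are ordered correctly, giving an AUC of 0.75.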


Determination of Optimal 17-Gene and 8-Gene Sets:


Initially an optimal set of 20 genes was selected by removing redundant probe sets, extracting the top 100 genes (by reported Gini variable importance), k-means clustering (k=20) these genes, and selecting the best gene from each cluster (again by variable importance). Additional genes in each cluster serve as robust alternates in case of failure to migrate primary genes to an assay platform. A gene might fail to migrate due to problems with probe/primer design or differences in the sensitivity of a specific assay for that gene. The top 100 genes/probe sets were also manually checked for sequence correctness by alignment to the reference genome. Seven genes/probe sets with ambiguous or erroneous alignments were marked for exclusion. Three genes/probe sets were also excluded because of their status as hypothetical proteins (KIAA0101, KIAA0776, KIAA1467). After these removals, a set of 17 primary genes and 73 alternate genes remained. All but two primary genes have two or more alternates (TXNIP is without an alternate, and APOC1 has a single alternate). Table 1 lists the final gene set, their top two alternate genes (where available) and their variable importance values (see Table 4 for the complete list). The above procedure was repeated to produce an optimal set of 8 genes, this time starting from the top 90 non-redundant probe sets (excluding the 10 genes with problems identified above), k-means clustering (k=8) these genes, and selecting the best gene from each cluster. All 8 genes were also included in the 17-gene set and have at least two alternates (Table 2, Table 5). Using the final optimized 17-gene and 8-gene sets as input, new RF models were built on training data.
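The selection of one primary gene per cluster, with the remaining cluster members kept as ranked alternates, can be sketched as follows. The sketch is in Python and takes the cluster assignments as given input rather than performing k-means clustering. The importance values are taken from Table 1, whereas the cluster labels and the function name are assumptions made only for illustration.

```python
from collections import defaultdict

def pick_primary_and_alternates(importance, cluster_of, n_alternates=2):
    """For each cluster, choose the gene with the highest variable importance
    as primary and keep the next n_alternates genes as alternates."""
    members = defaultdict(list)
    for gene, cluster_id in cluster_of.items():
        members[cluster_id].append(gene)
    panel = []
    for genes in members.values():
        ranked = sorted(genes, key=lambda g: importance[g], reverse=True)
        panel.append((ranked[0], ranked[1:1 + n_alternates]))
    # Order the panel by the importance of each primary gene.
    panel.sort(key=lambda row: importance[row[0]], reverse=True)
    return panel

# Variable importance values as listed in Table 1.
importance = {"CCNB2": 0.785, "MELK": 0.739, "GINS1": 0.476,
              "TOP2A": 0.590, "MCM2": 0.428, "CDK1": 0.379,
              "TXNIP": 0.292}
# Hypothetical cluster labels, chosen to be consistent with the rows of Table 1.
cluster_of = {"CCNB2": 0, "MELK": 0, "GINS1": 0,
              "TOP2A": 1, "MCM2": 1, "CDK1": 1,
              "TXNIP": 2}

panel = pick_primary_and_alternates(importance, cluster_of)
```

A cluster with a single member, such as the one assumed here for TXNIP, yields a primary gene with an empty list of alternates.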


Validation (Testing and Survival Analysis):


Survival analysis on all training data, now also including those patients with less than 10 years of follow-up, was performed with risk group as a factor, for the full-gene, 17-gene, and 8-gene models, using the ‘survival’ package. Note that the risk scores and groups for samples used in training were assigned from internal OOB cross-validation. Only those patients not used in initial training (without 10-year follow-up) were assigned a risk score and group by de novo classification. Significance between risk groups was determined by Kaplan-Meier logrank test (with test for linear trend). However, to directly compare relapse rates per risk group to those reported by Paik et al., N Engl J Med 351: 2817-2826 (2004) (ref. 13), the overall relapse rates in our patient cohort were randomly down-sampled to the same rate (15%) as in their cohort (ref. 13) and results averaged over 1000 iterations. To illustrate, the training data set includes 572 samples with 143 relapse events (i.e., a 25.0% relapse rate). Samples with relapse events were randomly eliminated from the cohort until only 15% of remaining samples had relapse events (76/505=15%). This “down-sampled” dataset was then classified using the RFRS model to assign each sample to a risk group, and the rates of relapse were determined for each group. The entire down-sampling procedure was then repeated 1000 times to obtain average estimated rates of relapse for each risk group given the overall rate of relapse of 15%. Setting the overall relapse rate to 15% is also useful because this more closely mirrors the general population rate of relapse. Without this down-sampling, expected relapse rates in each risk group would appear unrealistically high. See FIG. 2 for explanation of the break-down of samples into training and test sets used for classifier building and survival analysis.
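The down-sampling arithmetic can be reproduced with a short Python sketch. It assumes that all samples without a relapse event are retained and that relapse samples are drawn at random until the target rate is just reached, which is consistent with the 76/505 figure given above. The function names and the fixed random seed are illustrative only.

```python
import random

def n_events_to_keep(n_without_event, target_rate):
    """Smallest number of event samples k such that
    k / (n_without_event + k) reaches target_rate."""
    k = 1
    while k / (n_without_event + k) < target_rate:
        k += 1
    return k

def down_sample(relapsed, target_rate, rng):
    """Keep every non-relapse sample and a random subset of relapse samples
    so that the overall relapse rate is approximately target_rate.
    Returns the sorted indices of the retained samples."""
    events = [i for i, r in enumerate(relapsed) if r]
    non_events = [i for i, r in enumerate(relapsed) if not r]
    k = n_events_to_keep(len(non_events), target_rate)
    return sorted(non_events + rng.sample(events, k))

# Training cohort as described: 572 samples, 143 with relapse events.
relapsed = [True] * 143 + [False] * 429
kept = down_sample(relapsed, 0.15, random.Random(0))
n_kept_events = sum(relapsed[i] for i in kept)
```

One iteration retains 505 samples, 76 of them with relapse events. Repeating the draw with different random states and averaging the per-group relapse rates would correspond to the 1000-iteration procedure described above.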


Next, the full-gene, 17-gene and 8-gene RF models along with risk group cutoffs were applied to the independent test data. The same performance metrics, survival analysis and estimates of 10-year relapse rates were calculated as above. The 17-gene model was also tested on the independent test data, stratified by treatment (untreated vs. hormone therapy treated), to evaluate whether performance of the signature was biased towards one patient subpopulation or the other. These independent test data were not used in any way during the training phase. However, these samples represent a random subset of the same patient populations that were used in training. Therefore, they are not as fully independent as recommended by the Institute of Medicine (IOM) ‘committee on the review of omics-based tests for predicting patient outcomes in clinical trials’ (ref. 18). Therefore, an additional independent validation was performed against the NKI dataset (ref. 19) obtained from the http address bioinformatics.nki.nl/data.php. These data represent a set of 295 consecutive patients with primary stage I or II breast carcinomas. The dataset was filtered down to the 89 patients who were node-negative, ER-positive, HER2-negative and not treated by systemic chemotherapy (ref. 19). Relapse times and events were defined by any of distant metastasis, regional recurrence or local recurrence. Expression values from the NKI Agilent array data were re-scaled to the same distribution as that used in training using the ‘preprocessCore’ package. Values for the 8-gene and 17-gene-set RFRS models were extracted for further analysis. If more than one Agilent probe set could be mapped to an RFRS gene, then the probe set with the greatest variance was used. The full-gene-set model was not applied to NKI data because only 2530/3048 Affymetrix-defined genes (probe sets) in the full-gene-set could be mapped to Agilent genes (probe sets) in the NKI dataset.
However, the 17-gene and 8-gene RFRS models were applied to NKI data to calculate predicted probabilities of relapse. Patients were divided into low, intermediate, and high risk groups by ranking according to probability of relapse and then dividing so that the proportions in each risk group were identical to those observed in training. ROC AUC, survival p-values and estimated rates of relapse were then calculated as above. It should be noted that while the NKI clinical data described here (N=89) had an average follow-up time of 9.55 years (excluding relapse events), 34 patients had a follow-up time of less than 10 years (range 1.78-9.83 years). These patients would not have met our criteria for inclusion in the training dataset and likely represent some events that have not yet occurred. If anything, this is likely to reduce the AUC estimate and underestimate p-value significance in survival analysis.
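The rank-based division of an external cohort into risk groups can be sketched in Python as follows. The group proportions in the example are placeholders and are not the proportions observed in training. The rounding rule and the handling of tied scores are assumptions, since the text does not specify them.

```python
def assign_groups_by_rank(scores, proportions):
    """Rank patients by predicted probability of relapse and split them into
    low / intermediate / high groups of the requested relative sizes.
    proportions is (low, intermediate, high) and should sum to 1."""
    ranked = sorted(scores, key=scores.get)  # ascending probability of relapse
    n = len(ranked)
    n_low = round(proportions[0] * n)
    n_intermediate = round(proportions[1] * n)
    groups = {}
    for position, patient in enumerate(ranked):
        if position < n_low:
            groups[patient] = "low"
        elif position < n_low + n_intermediate:
            groups[patient] = "intermediate"
        else:
            groups[patient] = "high"
    return groups

# Hypothetical predicted probabilities for ten patients.
scores = {f"patient_{i}": p for i, p in enumerate(
    [0.12, 0.55, 0.31, 0.78, 0.05, 0.44, 0.67, 0.21, 0.36, 0.49])}

# Placeholder proportions (NOT the training proportions).
groups = assign_groups_by_rank(scores, (0.5, 0.3, 0.2))
```

Because membership depends only on rank, this grouping is insensitive to any residual shift in score distribution between array platforms, at the cost of fixing the group sizes in advance.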


Selection of Control Genes:


While not necessary for Affymetrix, migration to other assay technologies (e.g., RT-PCR approaches) may employ highly expressed and invariant genes to act as a reference for determining accurate gene expression level estimates. To this end, we developed two sets of reference genes. The first was chosen by the following criteria: (1) filtered if not expressed above background threshold (raw value>100) in 99% of samples; (2) filtered if not in top 5th percentile (overall) for mean expression; (3) filtered if not in top 10th percentile (remaining genes) for standard deviation; (4) ranked by coefficient of variation. The top 30 control genes from set #1 are listed in Table 3. Control genes underwent the same manual checks for sequence correctness by alignment to the reference genome as above, and five genes were marked for exclusion. The second set of control genes was chosen to represent three ranges of mean expression levels encompassed by genes in the 17-gene signature (low: 0-400; medium: 500-900; high: 1200-1600). For each mean expression range, genes were (1) filtered if not expressed above background threshold (raw value>100) in 99% of samples; (2) ranked by coefficient of variation. The top 5 genes from each range in set #2 are listed in Table 3 along with previously reported reference genes (Paik et al., supra) (ref. 13).
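The selection of the second set of reference genes can be sketched in Python under two stated assumptions: the ranking by coefficient of variation is ascending (least variable first), and the mean expression ranges are inclusive at both ends. Gene names and expression values are hypothetical.

```python
from statistics import mean, stdev

# Mean expression ranges as given in the text.
RANGES = {"low": (0, 400), "medium": (500, 900), "high": (1200, 1600)}

def select_reference_genes(expression, ranges=RANGES, background=100.0,
                           min_fraction=0.99, top_n=5):
    """For each mean-expression range, keep genes expressed above background
    in at least min_fraction of samples, then return the top_n genes with
    the lowest coefficient of variation (assumed ranking direction)."""
    selected = {}
    for label, (lower, upper) in ranges.items():
        candidates = []
        for gene, values in expression.items():
            mu = mean(values)
            fraction = sum(v > background for v in values) / len(values)
            if lower <= mu <= upper and fraction >= min_fraction:
                candidates.append((stdev(values) / mu, gene))
        candidates.sort()
        selected[label] = [gene for _, gene in candidates[:top_n]]
    return selected

# Hypothetical raw expression values across four samples.
expression = {
    "geneA": [300, 310, 290, 305],      # low range, nearly invariant
    "geneB": [300, 150, 450, 200],      # low range, more variable
    "geneC": [90, 350, 340, 330],       # low range, below background in one sample
    "geneD": [700, 720, 690, 710],      # medium range
    "geneE": [1400, 1300, 1500, 1450],  # high range
}

reference = select_reference_genes(expression)
```

With these toy values geneC is removed by the background filter, and geneA ranks ahead of geneB in the low range because of its smaller coefficient of variation.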


Results:

Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704. This was comparable to or better than that reported by Johannes et al. (2010), who tested a number of different classifiers on a smaller subset of the same data and found AUCs of 0.559 to 0.671 (ref. 14). It also compares favorably to the AUC value of 0.688 obtained when the OncotypeDX algorithm was applied to this same training dataset. Mixture model clustering analysis identified three risk groups with probabilities for low risk<0.333; 0.333≤intermediate risk<0.606; and high risk≥0.606 (FIG. 4). Survival analysis determined a highly significant difference in relapse rate between risk groups (p=3.95E-11) (FIG. 5A). After down-sampling to a 15% overall rate of relapse, approximately 46.7% (n=235) of patients were placed in the low-risk group and were found to have a 10-year risk of relapse of only 8.0%. Similarly, 38.6% (n=195) and 14.9% (n=75) of patients were placed in the intermediate and high risk groups, with rates of relapse of 17.6% and 30.3% respectively. These results are very similar to those of Paik et al., supra, who reported 51% of patients in the low-risk category with a rate of distant recurrence at 10 years of 6.8% (95% CI: 4.0-9.6); 22% in the intermediate-risk category with a recurrence rate of 14.3% (95% CI: 8.3-20.3); and 27% in the high-risk category with a recurrence rate of 30.5% (95% CI: 23.6-37.4) (ref. 13). The linear relationship between risk group and rate of relapse continues if groups are broken down further. For example, if “very low-risk” and “very high-risk” groups are defined, these have even lower (7.1%) and higher (32.8%) rates of relapse (FIG. 6). This observation is consistent with the idea that the random forests relapse score (RFRS) is a quantitative, linear measure directly related to probability of relapse. FIG. 7 shows the likelihood of relapse at 10 years, calculated for 50 RFRS intervals (from 0 to 1), with a smooth curve fitted using a loess function and 95% confidence intervals representing error in the fit. The distribution of RFRS values observed in the training data is represented by short vertical marks just above the x axis, one for each patient.


Validation of the models against the independent test dataset also showed very similar results to training estimates. The full-gene-set model had an AUC of 0.730 and the 17-gene and 8-gene optimized models had minimal reduction in performance with AUC of 0.715 and 0.690 respectively. Again, this compared favorably to the AUC value of 0.712 when the OncotypeDX algorithm was applied to the same test dataset. Survival analysis again found very significant differences between the risk groups for the full-gene (p=6.54E-06), 17-gene (p=9.57E-06) and 8-gene (p=2.84E-05; FIG. 5B) models. For the 17-gene model, approximately 38.2% (n=97) of patients were placed in the low-risk group and were found to have a 10-year risk of relapse of only 7.8%. Similarly, 40.5% (n=103) and 21.3% (n=54) of patients were placed in the intermediate and high-risk groups with rates of relapse of 15.3% and 26.8% respectively. Very similar results were observed for the full-gene and 8-gene models (Table 7). Validation against the additional, independent, NKI dataset also had very similar results. The 17-gene and 8-gene models had AUC values of 0.688 and 0.699 respectively, nearly identical to the results for the previous independent dataset. Differences between risk groups in survival analysis were also significant for both 17-gene (p=0.023) and 8-gene (p=0.004, FIG. 5C) models.


The linear relationship between risk group and rate of relapse continues if groups are broken down further (using training data) into five equal groups instead of the three groups defined above (FIG. 6). This observation is consistent with the idea that the random forests relapse score (RFRS) is a quantitative, linear measure directly related to probability of relapse.





In order to maximize the total size of our training dataset we allowed samples to be included from both untreated patients and those who received adjuvant hormonal therapy such as tamoxifen. Since outcomes likely differ between these two groups, and they may represent fundamentally different subpopulations, it is possible that performance of our predictive signatures is biased towards one group or the other. To assess this issue we performed validation against the independent test dataset, stratified by treatment status, using the 17-gene model. Both groups were found to have comparable AUC values with the slightly better value of 0.740 for hormone-treated versus 0.709 for untreated. Survival curves were also highly similar and significant with p-value of 0.004 and 3.76E-07 for treated and untreated respectively (FIGS. 13A and 13B). The difference in p-value appears more likely due to differences in the respective sample sizes than actual difference in survival curves.


The genes utilized in the RFRS model have only minimal overlap with those identified in other breast cancer outcome signatures. Specifically, the entire set of 100 genes (full-gene set before filtering) has only 6/65 genes in common with the gene expression panel proposed by van de Vijver et al., N Engl J Med 347, 1999-2009 (2002) (ref. 15), 2/21 with that proposed by Paik et al., supra, and 4/77 with that proposed by Wang et al., Lancet 365:671-679 (2005) (ref. 20). The 17-gene and 8-gene optimized sets have only a single gene (AURKA) in common with the panel proposed by Paik et al., a single gene (FEN1) in common with Wang et al., and none with that of van de Vijver et al. A Gene Ontology analysis using DAVID (refs. 16, 17) revealed that genes in the 17-gene list are involved in a wide range of biological processes known to be involved in breast cancer biology, including cell cycle, hormone response, cell death, DNA repair, transcription regulation, wound healing and others (FIG. 8). Since the 8-gene set is entirely contained in the 17-gene set, it would be involved in many of the same processes.


While methods such as those proposed by Paik et al. and van de Vijver et al. (both supra) (refs. 13, 15) exist to predict outcome in breast cancer, the RFRS is advantageous in several respects: (1) the signature was built from the largest and purest training dataset available to date; (2) patients with HER2+ tumors were excluded, thus focusing only on patients without an existing clear treatment course; (3) the gene signature predicts relapse with equal success for both patients who went on to receive adjuvant hormonal therapy and those who did not; (4) the gene signature was designed for robustness with (in most cases) several alternate genes available for each primary gene; (5) probe set sequences have been manually validated by alignment and manual assessment. These features, particularly the latter two, make this signature an especially strong candidate for efficient migration to multiple low-cost platforms for use in a clinical setting. Development of a panel for use in the clinic could take advantage of not only primary genes but also some number of alternate genes to increase the chance of a successful migration. Given the small but significant number of discrepancies observed between clinical and array-based determination of ER status, we also recommend inclusion of standard biomarkers such as ER, PR and HER2 on any design. Finally, we provide a list of consistently expressed genes, specific to breast tumor tissue, for use as control genes for those platforms that require them.


Implementation of Algorithm Using 17-Gene Model as Example:

The RFRS algorithm is implemented in the R programming language and can be applied to independent patient data. Input data is a tab-delimited text file of normalized expression values with 17 transcripts/genes as columns and patient(s) as rows. A sample patient data file (patient_data.txt) is presented in Appendix 1. A sample R program (RFRS_sample_code.R) for running the algorithm is presented in Appendix 2. The RFRS algorithm consists of a Random Forest of 100,001 decision trees. This is pre-computed, provided as an R data object (RF_model17gene_optimized) based on the training set, and is included in the working directory. Each node (branch) in each tree represents a binary decision based on transcript levels for the transcripts described above. Based on these decisions, the patient is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or “no relapse”. The fraction of all votes that are for “relapse” represents the RFRS, a measure of the probability of relapse. If the RFRS is greater than or equal to 0.606, the patient is assigned to the “high risk” group; if greater than or equal to 0.333 and less than 0.606, the patient is assigned to the “intermediate risk” group; and if less than 0.333, the patient is assigned to the “low risk” group. The patient's RFRS value is also used to determine a likelihood of relapse by comparison to a loess fit of RFRS versus likelihood of relapse for the training dataset. Pre-computed R data objects for the loess fit (RelapseProbabilityFit.Rdata) and summary plot (RelapseProbabilityPlot.Rdata) are loaded from file. The patient's estimated likelihood of relapse is determined, added to the summary plot, and output as a new report (see, e.g., FIG. 9).
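The mapping from tree votes to an RFRS value and a risk group can be illustrated with the following Python sketch. It is not the R program of Appendix 2. It assumes that the RFRS is the proportion of all trees voting for relapse and uses the cutoffs stated above; the function names and vote counts are hypothetical.

```python
LOW_CUTOFF = 0.333   # RFRS below this value: low risk
HIGH_CUTOFF = 0.606  # RFRS at or above this value: high risk

def rfrs_from_votes(votes_for_relapse, n_trees):
    """RFRS taken as the proportion of trees that vote for relapse."""
    if not 0 <= votes_for_relapse <= n_trees:
        raise ValueError("vote count must lie between 0 and n_trees")
    return votes_for_relapse / n_trees

def risk_group(rfrs):
    """Assign a risk group from an RFRS value using the stated cutoffs."""
    if not 0.0 <= rfrs <= 1.0:
        raise ValueError("RFRS must lie between 0 and 1")
    if rfrs >= HIGH_CUTOFF:
        return "high risk"
    if rfrs >= LOW_CUTOFF:
        return "intermediate risk"
    return "low risk"

# Hypothetical vote counts from a forest of 100,001 trees.
N_TREES = 100001
examples = {votes: risk_group(rfrs_from_votes(votes, N_TREES))
            for votes in (25000, 50000, 70000)}
```

With these hypothetical counts, 25,000 votes gives a low-risk assignment, 50,000 an intermediate-risk assignment, and 70,000 a high-risk assignment. Scores exactly at 0.333 or 0.606 fall into the higher of the two adjacent groups, matching the "greater than or equal to" wording above.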


REFERENCES CITED IN EXAMPLES SECTION



  • 1 Desmedt, C. et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res 13, 3207-3214 (2007).

  • 2 Ivshina, A. V. et al. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res 66, 10292-10301 (2006).

  • 3 Loi, S. et al. Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J Clin Oncol 25, 1239-1246 (2007).

  • 4 Miller, L. D. et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA 102, 13550-13555 (2005).

  • 5 Schmidt, M. et al. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res 68, 5405-5413 (2008).

  • 6 Sotiriou, C. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 98, 262-272 (2006).

  • 7 Symmans, W. F. et al. Genomic index of sensitivity to endocrine therapy for breast cancer. J Clin Oncol 28, 4111-4119 (2010).

  • 8 Wang, Y. et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671-679 (2005).

  • 9 Zhang, Y. et al. The 76-gene signature defines high-risk patients that benefit from adjuvant tamoxifen therapy. Breast Cancer Res Treat 116, 303-309 (2009).

  • 10 Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res 39 (2011).

  • 11 Dai, M. et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 33, e175, (2005).

  • 12 Gong, Y. et al. Determination of oestrogen-receptor status and ERBB2 status of breast carcinoma: a gene-expression profiling study. Lancet Oncol 8, 203-211 (2007).

  • 13 Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351, 2817-2826 (2004).

  • 14 Johannes, M. et al. Integration of pathway knowledge into a reweighted recursive feature elimination approach for risk stratification of cancer patients. Bioinformatics 26, 2136-2144 (2010).

  • 15 van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347, 1999-2009 (2002).

  • 16 Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 4, 44-57 (2009).

  • 17 Huang da, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37, 1-13 (2009).

  • 18 Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, Board on Health Care Services, Board on Health Sciences Policy, Institute of Medicine. Evolution of Translational Omics: Lessons Learned and the Path Forward (eds Micheel, C. M., Nass, S. J. & Omenn, G. S.) (The National Academies Press, 2012).

  • 19 van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347, 1999-2009 (2002).

  • 20 Wang, Y. et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671-679 (2005).



All publications, patents, accession numbers, and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.


Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.









TABLE 1

17-gene RFRS signature

Primary Predictor  Importance  Alternate 1  Importance  Alternate 2  Importance
CCNB2              0.785       MELK         0.739       GINS1        0.476
TOP2A              0.590       MCM2         0.428       CDK1         0.379
RACGAP1            0.588       LSM1         0.139       SCD          0.125
CKS2               0.515       NUSAP1       0.491       ZWINT        0.272
AURKA              0.508       PRC1         0.499       CENPF        0.306
FEN1               0.403       FADD         0.313       SMC4         0.170
EBP                0.341       RFC4         0.264       NCAPG        0.234
TXNIP              0.292       N/A          N/A         N/A          N/A
SYNE2              0.270       SCARB2       0.225       PDLIM5       0.167
DICER1             0.209       CALD1        0.129       SOX9         0.125
AP1AR              0.201       PBX2         0.134       WASL         0.126
NUP107             0.197       FAM38A       0.165       PLIN2        0.110
APOC1              0.176       APOE         0.121       N/A          N/A
DTX4               0.164       AQP1         0.141       LMO4         0.120
FMOD               0.154       RGS5         0.120       PIK3R1       0.103
MAPKAPK2           0.151       MTUS1        0.136       DHX9         0.136
SUPT4H1            0.111       PHB          0.106       CD44         0.105

TABLE 2

8-gene RFRS signature

Primary Predictor | Alternate 1 | Alternate 2
CCNB2 0.785 | MELK 0.739 | TOP2A 0.590
RACGAP1 0.588 | TXNIP 0.292 | APOC1 0.176
CKS2 0.515 | NUSAP1 0.491 | FEN1 0.403
AURKA 0.508 | PRC1 0.499 | CENPF 0.306
EBP 0.341 | FADD 0.313 | RFC4 0.264
SYNE2 0.270 | SCARB2 0.225 | PDLIM5 0.167
DICER1 0.209 | FAM38A 0.165 | FMOD 0.154
AP1AR 0.201 | MAPKAPK2 0.151 | MTUS1 0.136






















TABLE 3

Probe set | Gene Symbol | Mean (exp) | S.D. | Fraction (exp) | COV | CDF

Top 25 RFRS Reference Genes
103910_at | MYL12B | 1017.5 | 195.8 | 1.00 | 0.192 | custom
208672_s_at | SFRS3 | 1713.0 | 380.0 | 1.00 | 0.222 | standard
200960_x_at | CLTA | 1786.2 | 397.5 | 1.00 | 0.223 | standard
200893_at | TRA2B | 1403.7 | 312.8 | 1.00 | 0.223 | standard
23787_at | MTCH1 | 1120.0 | 269.8 | 1.00 | 0.241 | custom
221767_x_at | HDLBP | 1174.4 | 284.9 | 1.00 | 0.243 | standard
23191_at | CYFIP1 | 1345.1 | 329.4 | 1.00 | 0.245 | custom
211069_s_at | SUMO1 | 1111.6 | 276.2 | 1.00 | 0.248 | standard
201385_at | DHX15 | 1529.4 | 383.5 | 1.00 | 0.251 | standard
200014_s_at | HNRNPC | 1517.7 | 385.3 | 1.00 | 0.254 | standard
200667_at | UBE2D3 | 1090.1 | 279.3 | 1.00 | 0.256 | standard
9802_at | DAZAP2 | 1181.2 | 303.6 | 1.00 | 0.257 | custom
200058_s_at | SNRNP200 | 1104.4 | 285.9 | 1.00 | 0.259 | standard
91746_at | YTHDC1 | 965.1 | 250.7 | 1.00 | 0.260 | custom
1315_at | COPB1 | 1118.2 | 291.9 | 1.00 | 0.261 | custom
4714_at | NDUFB8 | 1219.0 | 325.5 | 1.00 | 0.267 | custom
40189_at | SET | 1347.9 | 360.7 | 1.00 | 0.268 | standard
221743_at | CELF1 | 1094.0 | 294.2 | 1.00 | 0.269 | standard
208775_at | XPO1 | 940.7 | 256.1 | 1.00 | 0.272 | standard
211270_x_at | PTBP1 | 973.1 | 266.8 | 1.00 | 0.274 | standard
211185_s_at | SF3B1 | 1077.9 | 297.9 | 1.00 | 0.276 | standard
10109_at | ARPC2 | 1357.4 | 375.9 | 1.00 | 0.277 | custom
201336_at | VAMP3 | 959.2 | 267.4 | 1.00 | 0.279 | standard
200028_s_at | STARD7 | 1087.9 | 303.4 | 1.00 | 0.279 | standard
22872_at | SEC31A | 1040.4 | 290.5 | 1.00 | 0.279 | custom

Top 15 RFRS Reference Genes (Set #2)
9927_at | MFN2 | 207.0 | 33.1 | 1.00 | 0.160 | custom
26100_at | WIPI2 | 216.5 | 40.3 | 1.00 | 0.186 | custom
201507_at | PFDN1 | 260.8 | 51.2 | 1.00 | 0.196 | standard
7337_at | UBE3A | 225.3 | 46.5 | 0.99 | 0.207 | custom
2976_at | GTF3C2 | 226.3 | 47.6 | 1.00 | 0.210 | custom
10657_at | KHDRBS1 | 776.4 | 166.3 | 1.00 | 0.214 | custom
201330_at | RARS | 502.6 | 117.1 | 1.00 | 0.233 | standard
201319_at | MYL12A | 574.4 | 135.2 | 1.00 | 0.235 | standard
3184_at | HNRNPD | 678.8 | 160.0 | 1.00 | 0.236 | custom
10236_at | HNRNPR | 570.1 | 140.4 | 1.00 | 0.246 | custom
200893_at | TRA2B | 1403.7 | 312.8 | 1.00 | 0.223 | standard
221619_s_at | MTCH1* | 1401.9 | 342.6 | 1.00 | 0.244 | standard
208923_at | CYFIP1* | 1339.2 | 333.6 | 1.00 | 0.249 | standard
201385_at | DHX15 | 1529.4 | 383.5 | 1.00 | 0.251 | standard
4714_at | NDUFB8 | 1219.0 | 325.5 | 1.00 | 0.267 | custom

Oncotype DX ® (Genomic Health, Inc, Redwood City, CA) Reference Genes
213867_x_at | ACTB | 19566.3 | 4360.8 | 1.00 | 0.223 | standard
200801_x_at | ACTB | 17901.0 | 3995.4 | 1.00 | 0.223 | standard
2597_at | GAPDH | 11873.9 | 3810.3 | 1.00 | 0.321 | standard
212581_x_at | GAPDH | 11930.9 | 4172.5 | 1.00 | 0.350 | standard
217398_x_at | GAPDH | 6595.6 | 2460.2 | 1.00 | 0.373 | standard
213453_x_at | GAPDH | 6695.2 | 2726.8 | 1.00 | 0.407 | standard
60_at | ACTB | 3786.2 | 1622.3 | 1.00 | 0.428 | standard
7037_at | TFRC | 781.8 | 466.6 | 1.00 | 0.597 | standard
208691_at | TFRC | 1035.1 | 630.8 | 1.00 | 0.609 | standard
207332_s_at | TFRC | 506.9 | 341.6 | 0.97 | 0.674 | standard

RPLP0 and GUS are also listed as reference genes for the Oncotype DX ® breast cancer assay.
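The COV column of Table 3 appears to be the coefficient of variation, i.e., S.D. divided by mean expression. This reading is an assumption inferred from the tabulated values; the table itself does not define the column. A minimal R sketch that recomputes the column for three rows copied from Table 3 (the data frame and its column names are illustrative only):

```r
# Three rows copied from Table 3; the column names are illustrative.
ref_genes <- data.frame(
  gene     = c("MYL12B", "SFRS3", "TFRC"),
  mean_exp = c(1017.5, 1713.0, 781.8),
  sd_exp   = c(195.8, 380.0, 466.6)
)

# Assumed definition: coefficient of variation = S.D. / mean,
# rounded to three decimals as in the table.
ref_genes$cov <- round(ref_genes$sd_exp / ref_genes$mean_exp, 3)
print(ref_genes)
```

Under this reading, a lower COV indicates more stable expression across samples, which is consistent with the table listing candidate reference genes in order of increasing COV.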













TABLE 4

100 probe sets including all primary, alternate, and excluded genes (k = 20 clusters)

Gene (probe set) | EntrezID | CDF | VarImp | Predictor group | Predictor status
CCNB2 (9133_at) | 9133 | custom | 0.785 | primary | predictor1
MELK (9833_at) | 9833 | custom | 0.739 | alternate1 | predictor1 alternate1
GINS1 (9837_at) | 9837 | custom | 0.476 | alternate2 | predictor1 alternate2
RRM2 (6241_at) | 6241 | custom | 0.399 | alternate3 | predictor1 alternate3
GINS2 (51659_at) | 51659 | custom | 0.354 | alternate4 | predictor1 alternate4
CCNB1 (214710_s_at) | 891 | standard | 0.140 | alternate5 | predictor1 alternate5
TOP2A (201291_s_at) | 7153 | standard | 0.590 | primary | predictor2
MCM2 (4171_at) | 4171 | custom | 0.428 | alternate1 | predictor2 alternate1
KIAA0101 (9768_at) | 9768 | custom | 0.409 | alternate2 | predictor2 alternate2 (excluded)
CDK1 (203213_at) | 983 | standard | 0.379 | alternate3 | predictor2 alternate3
UBE2C (202954_at) | 11065 | standard | 0.365 | alternate4 | predictor2 alternate4
TMEM97 (212281_s_at) | 27346 | standard | 0.147 | alternate5 | predictor2 alternate5
DTL (218585_s_at) | 51514 | standard | 0.130 | alternate6 | predictor2 alternate6
RACGAP1 (29127_at) | 29127 | custom | 0.588 | primary | predictor3
LSM1 (27257_at) | 27257 | custom | 0.139 | alternate1 | predictor3 alternate1
SCD (200832_s_at) | 6319 | standard | 0.125 | alternate2 | predictor3 alternate2
HN1 (51155_at) | 51155 | custom | 0.104 | alternate3 | predictor3 alternate3
CKS2 (1164_at) | 1164 | custom | 0.515 | primary | predictor4
NUSAP1 (218039_at) | 51203 | standard | 0.491 | alternate1 | predictor4 alternate1
PTTG1 (203554_x_at) | 9232 | standard | 0.408 | alternate2 | predictor4 alternate2 (excluded)
ZWINT (204026_s_at) | 11130 | standard | 0.272 | alternate3 | predictor4 alternate3
TYMS (7298_at) | 7298 | custom | 0.269 | alternate4 | predictor4 alternate4
MLF1IP (218883_s_at) | 79682 | standard | 0.204 | alternate5 | predictor4 alternate5
SQLE (209218_at) | 6713 | standard | 0.174 | alternate6 | predictor4 alternate6
AURKA (208079_s_at) | 6790 | standard | 0.508 | primary | predictor5
PRC1 (9055_at) | 9055 | custom | 0.499 | alternate1 | predictor5 alternate1
CENPF (207828_s_at) | 1063 | standard | 0.306 | alternate2 | predictor5 alternate2
ASPM (219918_s_at) | 259266 | standard | 0.293 | alternate3 | predictor5 alternate3
NEK2 (204641_at) | 4751 | standard | 0.134 | alternate4 | predictor5 alternate4
ECT2 (1894_at) | 1894 | custom | 0.105 | alternate5 | predictor5 alternate5
FEN1 (204767_s_at) | 2237 | standard | 0.403 | primary | predictor6
FADD (8772_at) | 8772 | custom | 0.313 | alternate1 | predictor6 alternate1
SMC4 (10051_at) | 10051 | custom | 0.170 | alternate2 | predictor6 alternate2
SLC35E3 (55508_at) | 55508 | custom | 0.151 | alternate3 | predictor6 alternate3
TXNRD1 (7296_at) | 7296 | custom | 0.136 | alternate4 | predictor6 alternate4
RAE1 (211318_s_at) | 8480 | standard | 0.132 | alternate5 | predictor6 alternate5
ACBD3 (202323_s_at) | 64746 | standard | 0.129 | alternate6 | predictor6 alternate6
ZNF274 (204937_s_at) | 10782 | standard | 0.122 | alternate7 | predictor6 alternate7
FRG1 (2483_at) | 2483 | custom | 0.108 | alternate8 | predictor6 alternate8 (excluded)
LPCAT1 (201818_at) | 79888 | standard | 0.106 | alternate9 | predictor6 alternate9
EBP (10682_at) | 10682 | custom | 0.341 | primary | predictor7
RFC4 (204023_at) | 5984 | standard | 0.264 | alternate1 | predictor7 alternate1
NCAPG (218662_s_at) | 64151 | standard | 0.234 | alternate2 | predictor7 alternate2
RNASEH2A (10535_at) | 10535 | custom | 0.205 | alternate3 | predictor7 alternate3
MED24 (9862_at) | 9862 | custom | 0.191 | alternate4 | predictor7 alternate4
DONSON (29980_at) | 29980 | custom | 0.186 | alternate5 | predictor7 alternate5
RMI1 (80010_at) | 80010 | custom | 0.184 | alternate6 | predictor7 alternate6
PTGES (9536_at) | 9536 | custom | 0.164 | alternate7 | predictor7 alternate7
C19orf60 (51200_at) | 55049 | standard | 0.151 | alternate8 | predictor7 alternate8
ISYNA1 (222240_s_at) | 51477 | standard | 0.135 | alternate9 | predictor7 alternate9
SKP2 (203625_x_at) | 6502 | standard | 0.130 | alternate10 | predictor7 alternate10
DPP3 (218567_x_at) | 10072 | standard | 0.126 | alternate11 | predictor7 alternate11 (excluded)
TYMP (204858_s_at) | 1890 | standard | 0.122 | alternate12 | predictor7 alternate12
SNRPA1 (216977_x_at) | 6627 | standard | 0.116 | alternate13 | predictor7 alternate13
DHCR7 (201791_s_at) | 1717 | standard | 0.113 | alternate14 | predictor7 alternate14
TFPT (218996_at) | 29844 | standard | 0.105 | alternate15 | predictor7 alternate15
CTTN (2017_at) | 2017 | custom | 0.102 | alternate16 | predictor7 alternate16
MCM5 (216237_s_at) | 4174 | standard | 0.102 | alternate17 | predictor7 alternate17
TXNIP (10628_at) | 10628 | custom | 0.292 | primary | predictor8
SYNE2 (23224_at) | 23224 | custom | 0.270 | primary | predictor9
SCARB2 (201646_at) | 950 | standard | 0.225 | alternate1 | predictor9 alternate1
PDLIM5 (216804_s_at) | 10611 | standard | 0.167 | alternate2 | predictor9 alternate2
TSC2 (7249_at) | 7249 | custom | 0.145 | alternate3 | predictor9 alternate3
ELF1 (212420_at) | 1997 | standard | 0.119 | alternate4 | predictor9 alternate4
DICER1 (23405_at) | 23405 | custom | 0.209 | primary | predictor10
CALD1 (201616_s_at) | 800 | standard | 0.129 | alternate1 | predictor10 alternate1
SOX9 (6662_at) | 6662 | custom | 0.125 | alternate2 | predictor10 alternate2
FAM20B (202915_s_at) | 9917 | standard | 0.108 | alternate3 | predictor10 alternate3
APH1A (218389_s_at) | 51107 | standard | 0.099 | alternate4 | predictor10 alternate4
AP1AR (55435_at) | 55435 | custom | 0.201 | primary | predictor11
PDCD6 (222380_s_at) | 10016 | standard | 0.154 | alternate1 | predictor11 alternate1 (excluded)
PBX2 (202876_s_at) | 5089 | standard | 0.134 | alternate2 | predictor11 alternate2
WASL (205809_s_at) | 8976 | standard | 0.126 | alternate3 | predictor11 alternate3
SLC11A2 (203123_s_at) | 4891 | standard | 0.119 | alternate4 | predictor11 alternate4
KIAA0776 (212634_at) | 23376 | standard | 0.107 | alternate5 | predictor11 alternate5 (excluded)
C14orf101 (54916_at) | 54916 | custom | 0.101 | alternate6 | predictor11 alternate6
NUP107 (57122_at) | 57122 | custom | 0.197 | primary | predictor12
FAM38A (202771_at) | 9780 | standard | 0.165 | alternate1 | predictor12 alternate1
PLIN2 (209122_at) | 123 | standard | 0.110 | alternate2 | predictor12 alternate2
AIM1 (212543_at) | 202 | standard | 0.102 | alternate3 | predictor12 alternate3
APOC1 (204416_x_at) | 341 | standard | 0.176 | primary | predictor13
APOE (203382_s_at) | 348 | standard | 0.121 | alternate1 | predictor13 alternate1
DTX4 (23220_at) | 23220 | custom | 0.164 | primary | predictor14
AQP1 (358_at) | 358 | custom | 0.141 | alternate1 | predictor14 alternate1
LMO4 (209205_s_at) | 8543 | standard | 0.120 | alternate2 | predictor14 alternate2
TAF1D (218750_at) | 79101 | standard | 0.159 | primary | predictor15 (excluded)
SNORA25 (684959_at) | 684959 | custom | 0.127 | alternate1 | predictor15 alternate1 (excluded)
FMOD (202709_at) | 2331 | standard | 0.154 | primary | predictor16
RGS5 (8490_at) | 8490 | custom | 0.120 | alternate1 | predictor16 alternate1
PIK3R1 (212239_at) | 5295 | standard | 0.103 | alternate2 | predictor16 alternate2
MBNL2 (203640_at) | 10150 | standard | 0.100 | alternate3 | predictor16 alternate3
MAPKAPK2 (201461_s_at) | 9261 | standard | 0.151 | primary | predictor17
MTUS1 (212093_s_at) | 57509 | standard | 0.136 | alternate1 | predictor17 alternate1
DHX9 (212107_s_at) | 1660 | standard | 0.136 | alternate2 | predictor17 alternate2
PPIF (201490_s_at) | 10105 | standard | 0.115 | alternate3 | predictor17 alternate3
FOLR1 (211074_at) | 2348 | standard | 0.126 | primary | predictor18 (excluded)
KIAA1467 (57613_at) | 57613 | custom | 0.116 | primary | predictor19 (excluded)
SUPT4H1 (201483_s_at) | 6827 | standard | 0.111 | primary | predictor20
PHB (200658_s_at) | 5245 | standard | 0.106 | alternate1 | predictor20 alternate1
CD44 (204489_s_at) | 960 | standard | 0.105 | alternate2 | predictor20 alternate2

Excluded genes are indicated by the notation “(excluded)” in the last column.













TABLE 5

90 probe sets (failed probes excluded) including all primary and alternate genes (k = 8 clusters)

Gene (probe set) | CDF | VarImp | Predictor group | Predictor status
CCNB2 (9133_at) | custom | 0.785 | primary | predictor1
MELK (9833_at) | custom | 0.739 | alternate1 | predictor1 alternate1
TOP2A (201291_s_at) | standard | 0.590 | alternate2 | predictor1 alternate2
GINS1 (9837_at) | custom | 0.476 | alternate3 | predictor1 alternate3
MCM2 (4171_at) | custom | 0.428 | alternate4 | predictor1 alternate4
RRM2 (6241_at) | custom | 0.399 | alternate5 | predictor1 alternate5
CDK1 (203213_at) | standard | 0.379 | alternate6 | predictor1 alternate6
UBE2C (202954_at) | standard | 0.365 | alternate7 | predictor1 alternate7
GINS2 (51659_at) | custom | 0.354 | alternate8 | predictor1 alternate8
NCAPG (218662_s_at) | standard | 0.234 | alternate9 | predictor1 alternate9
TMEM97 (212281_s_at) | standard | 0.147 | alternate10 | predictor1 alternate10
CCNB1 (214710_s_at) | standard | 0.140 | alternate11 | predictor1 alternate11
DTL (218585_s_at) | standard | 0.130 | alternate12 | predictor1 alternate12
RACGAP1 (29127_at) | custom | 0.588 | primary | predictor2
TXNIP (10628_at) | custom | 0.292 | alternate1 | predictor2 alternate1
APOC1 (204416_x_at) | standard | 0.176 | alternate2 | predictor2 alternate2
LSM1 (27257_at) | custom | 0.139 | alternate3 | predictor2 alternate3
SCD (200832_s_at) | standard | 0.125 | alternate4 | predictor2 alternate4
HN1 (51155_at) | custom | 0.104 | alternate5 | predictor2 alternate5
CKS2 (1164_at) | custom | 0.515 | primary | predictor3
NUSAP1 (218039_at) | standard | 0.491 | alternate1 | predictor3 alternate1
FEN1 (204767_s_at) | standard | 0.403 | alternate2 | predictor3 alternate2
ZWINT (204026_s_at) | standard | 0.272 | alternate3 | predictor3 alternate3
TYMS (7298_at) | custom | 0.269 | alternate4 | predictor3 alternate4
MLF1IP (218883_s_at) | standard | 0.204 | alternate5 | predictor3 alternate5
NUP107 (57122_at) | custom | 0.197 | alternate6 | predictor3 alternate6
SQLE (209218_at) | standard | 0.174 | alternate7 | predictor3 alternate7
SMC4 (10051_at) | custom | 0.170 | alternate8 | predictor3 alternate8
SLC35E3 (55508_at) | custom | 0.151 | alternate9 | predictor3 alternate9
APOE (203382_s_at) | standard | 0.121 | alternate10 | predictor3 alternate10
SUPT4H1 (201483_s_at) | standard | 0.111 | alternate11 | predictor3 alternate11
PLIN2 (209122_at) | standard | 0.110 | alternate12 | predictor3 alternate12
PHB (200658_s_at) | standard | 0.106 | alternate13 | predictor3 alternate13
AURKA (208079_s_at) | standard | 0.508 | primary | predictor4
PRC1 (9055_at) | custom | 0.499 | alternate1 | predictor4 alternate1
CENPF (207828_s_at) | standard | 0.306 | alternate2 | predictor4 alternate2
ASPM (219918_s_at) | standard | 0.293 | alternate3 | predictor4 alternate3
NEK2 (204641_at) | standard | 0.134 | alternate4 | predictor4 alternate4
DHCR7 (201791_s_at) | standard | 0.113 | alternate5 | predictor4 alternate5
ECT2 (1894_at) | custom | 0.105 | alternate6 | predictor4 alternate6
EBP (10682_at) | custom | 0.341 | primary | predictor5
FADD (8772_at) | custom | 0.313 | alternate1 | predictor5 alternate1
RFC4 (204023_at) | standard | 0.264 | alternate2 | predictor5 alternate2
RNASEH2A (10535_at) | custom | 0.205 | alternate3 | predictor5 alternate3
MED24 (9862_at) | custom | 0.191 | alternate4 | predictor5 alternate4
DONSON (29980_at) | custom | 0.186 | alternate5 | predictor5 alternate5
RMI1 (80010_at) | custom | 0.184 | alternate6 | predictor5 alternate6
PTGES (9536_at) | custom | 0.164 | alternate7 | predictor5 alternate7
DTX4 (23220_at) | custom | 0.164 | alternate8 | predictor5 alternate8
C19orf60 (51200_at) | standard | 0.151 | alternate9 | predictor5 alternate9
TXNRD1 (7296_at) | custom | 0.136 | alternate10 | predictor5 alternate10
ISYNA1 (222240_s_at) | standard | 0.135 | alternate11 | predictor5 alternate11
RAE1 (211318_s_at) | standard | 0.132 | alternate12 | predictor5 alternate12
SKP2 (203625_x_at) | standard | 0.130 | alternate13 | predictor5 alternate13
ACBD3 (202323_s_at) | standard | 0.129 | alternate14 | predictor5 alternate14
ZNF274 (204937_s_at) | standard | 0.122 | alternate15 | predictor5 alternate15
TYMP (204858_s_at) | standard | 0.122 | alternate16 | predictor5 alternate16
SNRPA1 (216977_x_at) | standard | 0.116 | alternate17 | predictor5 alternate17
LPCAT1 (201818_at) | standard | 0.106 | alternate18 | predictor5 alternate18
TFPT (218996_at) | standard | 0.105 | alternate19 | predictor5 alternate19
CTTN (2017_at) | custom | 0.102 | alternate20 | predictor5 alternate20
MCM5 (216237_s_at) | standard | 0.102 | alternate21 | predictor5 alternate21
SYNE2 (23224_at) | custom | 0.270 | primary | predictor6
SCARB2 (201646_at) | standard | 0.225 | alternate1 | predictor6 alternate1
PDLIM5 (216804_s_at) | standard | 0.167 | alternate2 | predictor6 alternate2
TSC2 (7249_at) | custom | 0.145 | alternate3 | predictor6 alternate3
AQP1 (358_at) | custom | 0.141 | alternate4 | predictor6 alternate4
ELF1 (212420_at) | standard | 0.119 | alternate5 | predictor6 alternate5
DICER1 (23405_at) | custom | 0.209 | primary | predictor7
FAM38A (202771_at) | standard | 0.165 | alternate1 | predictor7 alternate1
FMOD (202709_at) | standard | 0.154 | alternate2 | predictor7 alternate2
CALD1 (201616_s_at) | standard | 0.129 | alternate3 | predictor7 alternate3
SOX9 (6662_at) | custom | 0.125 | alternate4 | predictor7 alternate4
RGS5 (8490_at) | custom | 0.120 | alternate5 | predictor7 alternate5
FAM20B (202915_s_at) | standard | 0.108 | alternate6 | predictor7 alternate6
CD44 (204489_s_at) | standard | 0.105 | alternate7 | predictor7 alternate7
PIK3R1 (212239_at) | standard | 0.103 | alternate8 | predictor7 alternate8
AIM1 (212543_at) | standard | 0.102 | alternate9 | predictor7 alternate9
MBNL2 (203640_at) | standard | 0.100 | alternate10 | predictor7 alternate10
APH1A (218389_s_at) | standard | 0.099 | alternate11 | predictor7 alternate11
AP1AR (55435_at) | custom | 0.201 | primary | predictor8
MAPKAPK2 (201461_s_at) | standard | 0.151 | alternate1 | predictor8 alternate1
MTUS1 (212093_s_at) | standard | 0.136 | alternate2 | predictor8 alternate2
DHX9 (212107_s_at) | standard | 0.136 | alternate3 | predictor8 alternate3
PBX2 (202876_s_at) | standard | 0.134 | alternate4 | predictor8 alternate4
WASL (205809_s_at) | standard | 0.126 | alternate5 | predictor8 alternate5
LMO4 (209205_s_at) | standard | 0.120 | alternate6 | predictor8 alternate6
SLC11A2 (203123_s_at) | standard | 0.119 | alternate7 | predictor8 alternate7
PPIF (201490_s_at) | standard | 0.115 | alternate8 | predictor8 alternate8
C14orf101 (54916_at) | custom | 0.101 | alternate9 | predictor8 alternate9























TABLE 6

Study | GSE | Total samples | ER+/LN−/untreated*/outcome | Duplicates removed | ER+/HER− array | 10 yr relapse | 10 yr no relapse
Desmedt_2007^1 | GSE7390 | 198 | 135 | 135 | 116 | 42 | 60
Ivshina_2006^2 | GSE4922 | 290 | 133 | 2 | 2 | 0 | 2
Loi_2007^3 | GSE6532 | 327 | 170 | 43 | 40 | 10 | 5
Miller_2005^4 | GSE3494 | 251 | 132 | 115 | 100 | 30 | 52
Schmidt_2008^5 | GSE11121 | 200 | 200** | 200 | 155 | 25 | 46
Sotiriou_2006^6 | GSE2990 | 189 | 113 | 48 | 45 | 12 | 15
Symmans_2010^7 | GSE17705 | 298 | 175 | 110 | 102 | 12 | 41
Wang_2005^8 | GSE2034 | 286 | 209 | 209 | 173 | 67 | 29
Zhang_2009^9 | GSE12093 | 136 | 136 | 136 | 125 | 15 | 24
9 studies | | 2175 | 1403 | 998 | 858 | 213 | 274
















TABLE 7

Comparison of validation results in independent test data for full-gene-set, 17-gene and 8-gene RFRS models

Column groups: RFRS Performance (Model, AUC); Relapse-Free Survival (Low risk, Int risk, High risk, KM (p))

Model | AUC | Low risk RR | Low risk N (%) | Int risk RR | Int risk N (%) | High risk RR | High risk N (%) | KM (p)
Full-gene-set | 0.730 | 6.9 | 78 (30.7) | 15.8 | 133 (52.4) | 26.8 | 43 (16.9) | 6.54E−06
17-gene | 0.715 | 7.8 | 97 (38.2) | 15.3 | 103 (40.5) | 26.8 | 54 (21.3) | 9.57E−06
8-gene | 0.690 | 9.7 | 101 (39.8) | 13.9 | 105 (41.3) | 28.3 | 48 (18.9) | 2.84E−05

RR, relapse rate
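The N (%) entries of Table 7 can be read as each risk group's share of the independent test set. The group counts in every row sum to 254, so the test set size of 254 used here is inferred from the table rather than stated in it. A short R sketch, using the full-gene-set row as copied from Table 7 (variable names are illustrative):

```r
# Group sizes copied from the full-gene-set row of Table 7.
n_full <- c(low = 78, int = 133, high = 43)

# Inferred test set size: the three groups together.
n_total <- sum(n_full)

# Percentage of patients per risk group, rounded to one decimal as in the table.
pct_full <- round(100 * n_full / n_total, 1)
print(pct_full)
```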













APPENDIX 1

Sample patient data (tab-delimited text file: e.g., patient_data.txt)

 | TOP2A | MAPKAPK2 | SUPT4H1 | FMOD | APOC1 | FEN1 | AURKA | TXNIP | EBP
GSM36893 | 7.0874 | 3.9958 | 7.6561 | 6.7689 | 10.268 | 8.8817 | 6.6811 | 8.3538 | 7.033

 | CKS2 | DTX4 | SYNE2 | DICER1 | RACGAP1 | AP1AR | NUP107 | CCNB2
GSM36893 | 8.0512 | 6.0171 | 3.2419 | 6.272 | 10.0237 | 6.3404 | 8.9953 | 7.3143
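The layout described for patient_data.txt (gene symbols as column names, patient identifier as row name) can be checked before scoring. The sketch below is illustrative only: it parses a hypothetical two-gene excerpt from an in-memory string instead of a file, and both the `patient_id` header label and the `required_genes` vector are assumptions made for the example.

```r
# Hypothetical two-gene excerpt in the tab-delimited layout of Appendix 1.
# The "patient_id" header label is an assumption for this example.
txt <- "patient_id\tTOP2A\tMAPKAPK2\nGSM36893\t7.0874\t3.9958\n"

# Same read.table arguments as the RFRS script, but reading from a string.
patient_data <- read.table(text = txt, header = TRUE, row.names = 1,
                           na.strings = "NA", sep = "\t")

# Illustrative check: which genes expected by a model are absent from the input?
required_genes <- c("TOP2A", "MAPKAPK2", "SUPT4H1")
missing_genes <- setdiff(required_genes, colnames(patient_data))
print(missing_genes)
```

A check of this kind fails early on an incomplete input file, before any prediction is attempted.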

















APPENDIX 2





RFRS algorithm code















library(randomForest)

# Set working directory and filenames for input/output
setwd("C:/path/to/RFRS/")

# The following files should be in the working directory (except the report file, which is created by this program)
datafile="patient_data.txt"
RelapseProbabilityPlotfile="RelapseProbabilityPlot.Rdata"
RelapseProbabilityFitfile="RelapseProbabilityFit.Rdata"
reportfile="patient_results.pdf"

# Load model file: choose (1) OR (2) and comment out the other (the file contains the "rf_model" object)
RF_model_file="RF_model_17gene_optimized.Rdata" #1
#RF_model_file="RF_model_8gene_optimized.Rdata" #2
load(file=RF_model_file)

# Read in data (expecting a tab-delimited file with gene symbols as column names and patient_id as row name)
patient_data=read.table(datafile, header=TRUE, row.names=1, na.strings="NA", sep="\t")

# Run test data through the forest
RF_predictions_response=predict(rf_model, patient_data, type="response")
RF_predictions_prob=predict(rf_model, patient_data, type="prob")
RFRS=RF_predictions_prob[,"Relapse"]

# Determine RFRS group according to previously determined thresholds
RF_risk_group=RF_predictions_prob[,"Relapse"]
RF_risk_group[RF_predictions_prob[,"Relapse"]<0.333]="low"
RF_risk_group[RF_predictions_prob[,"Relapse"]>=0.333 & RF_predictions_prob[,"Relapse"]<0.606]="int"
RF_risk_group[RF_predictions_prob[,"Relapse"]>=0.606]="high"

# Load existing relapse probability plot and loess fit to allow the current patient to be plotted
load(file=RelapseProbabilityPlotfile)
load(file=RelapseProbabilityFitfile)
RelapseProb=predict(fit, RFRS)

# Create report
pdf(file=reportfile)
replayPlot(RelapseProbabilityPlot)
points(x=RFRS, y=RelapseProb, pch=18, col="red", cex=2)
legend_text=c(paste("Patient: ", rownames(patient_data)), paste("RFRS =", round(RFRS, digits=4)), paste("risk group =", RF_risk_group),
  paste("Relapse prob. = ", round(RelapseProb, digits=1), "%", sep=""))
legend(x=0.6, y=11, legend=legend_text, bty="n", pch=c(18,NA,NA,NA), col=c("red",NA,NA,NA), pt.cex=2)
dev.off()
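The risk-group step of the RFRS code can also be written with base R's `cut()`. The sketch below is an alternative formulation, not the code of the appendix: it reuses the thresholds 0.333 and 0.606 from that code, while the function name `assign_risk_group` and the example scores are hypothetical.

```r
# Thresholds taken from the RFRS code: low < 0.333 <= int < 0.606 <= high.
# The function name is hypothetical and introduced only for this sketch.
assign_risk_group <- function(rfrs) {
  cut(rfrs,
      breaks = c(-Inf, 0.333, 0.606, Inf),
      labels = c("low", "int", "high"),
      right = FALSE)  # intervals are closed on the left: [a, b)
}

# Made-up RFRS values, including both boundary values.
example_scores <- c(0.10, 0.333, 0.50, 0.606, 0.90)
example_groups <- as.character(assign_risk_group(example_scores))
print(example_groups)
```

With `right = FALSE`, a score exactly equal to a threshold falls into the higher group, matching the `>=` comparisons in the RFRS code.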
















APPENDIX 3





Probe sequences for top 100 probesets

















CCNB2 probes (SEQ ID NO: 1-9)



ATGGAGCTGACTCTCATCGACTATG







ATATGGTGCATTATCATCCTTCTAA







AGTCCTCTGGTCTATCTCATGAAAC







CTTGCCTCCCCACTGATAGGAAGGT







CAAAAGCCGTCAAAGACCTTGCCTC







GATTTTGTACATAGTCCTCTGGTCT







GCCACTACACTTCTTAAGGCGAGCA







GATAGGAAGGTCCTAGGCTGCCGTG







ATCCTTCTAAGGTAGCAGCAGCTGC







TOP2A probes (SEQ ID NO: 10-20)



ACTCCGTAACAGATTCTGGACCAAC







GACCAACCTTCAACTATCTTCTTGA







GAAAGATGAACTCTGCAGGCTAAGA







ACAAGATGAACAAGTCGGACTTCCT







TGGCTCCTAGGAATGCTTGGTGCTG







GATATGATTCGGATCCTGTGAAGGC







AAAGAAAGAGTCCATCAGATTTGTG







GAATAATCAGGCTCGCTTTATCTTA







CTTGGTGCTGAATCTGCTAAACTGA







AAGAACAAGAGCTGGACACATTAAA







GAGACTTTTTTGAACTCAGACTTAA







RACGAP1 probes (SEQ ID NO: 21-25)



GTACAACTCGTATTTATCTCTGATG







GAATGTTTGACTTCGTATTGACCCT







GGATGCTGAAATTTTTCCCATGGAA







ACTTCGTATTGACCCTTATCTGTAA







CAATATATCATCCTTTGGCATCCCA







CKS2 probes (SEQ ID NO: 26-28)



CGCTCTCGTTTCATTTTCTGCAGCG







TATTCTTCTCTTTAGACGACCTCTT







TCTCTTTAGACGACCTCTTCCAAAA







AURKA probes (SEQ ID NO: 29-39)



CTACCTCCATTTAGGGATTTGCTTG







GTGTCTCAGAGCTGTTAAGGGCTTA







CCCTCAATCTAGAACGCTACACAAG







GAGGCCATGTGTCTCAGAGCTGTTA







TTAGGGATTTGCTTGGGATACAGAA







GTGCTCTACCTCCATTTAGGGATTT







AAATAGGAACACGTGCTCTACCTCC







GGGATACAGAAGAGGCCATGTGTCT







GAAGAGGCCATGTGTCTCAGAGCTG







CAGAGCTGTTAAGGGCTTATTTTTT







CATTGGAGTCATAGCATGTGTGTAA







FEN1 probes (SEQ ID NO: 40-50)



GAACTTGCTATGTAATTTGTGTCTA







GATGGTGATGTTCACCTGGCAATCA







GAGCCACCAGGAAGGCGCATCTTAG







TTGACCCACCTTGAGAGAGAGCCAC







GGACACTAAGTCCATTGTTACATGA







GAAATGATTTCCTGGCTGGCCAACT







ACACTGGTTTTCATGCGCTGTTTTT







ACTGATTACTGGCTGTGTCTTGGGT







TGGACCTAGACTGTGCTTTTCTGTC







TTGGGTGGGCAGAAACTCGAACTTG







ACCTGGCAATCAGCTGAGTTGAGAC







EBP probes (SEQ ID NO: 51-71)



GAAGGCACTGCTGGGAGCCATTAGA







CAGGCTCATGGGCAGGCACAAGAAG







GTCTTAGTCGTGACCACATGGCTGT







CACAGATACAAGAGAAGCCAGGAGG







AAGGGGCTGTGTGAAGGCACTGCTG







AGAAGAACTGAGGAGTGGTGGACCA







GCCAGGAGGTCTATGATGGTGACGA







CCCACCTGGCATATACTGGCTGGCC







ACATGGCTGTTGTCAGGTCGTGCTG







TCTATGGGGATGTGCTCTACTTCCT







GCATGGAAACCATCACAGCTTGCCT







GAGTGGTGGACCAGGCTCGAACACT







TTGGAGGGACAAAGCTAATTGATCT







GATGCCAAGGCCACAAAAGCCAAGA







CCAGGCTCGAACACTGGCCGAGGAG







TGACAGAGCACCGCGACGGATTCCA







GGGAGCCATTAGAACACAGATACAA







TTTGTCTTCATGAATGCCCTGTGGC







GGAGACCAAGCCTTCTTATCTCAAC







TGCAGTGTGTGGGTTCATTCACCTG







CTCCGCTTCATTCTACAGCTTGTGG







TXNIP probes (SEQ ID NO: 72-102)



TGTGTCAGAGCACTGAGCTCCACCC







TACAAGTTCGGCTTTGAGCTTCCTC







AAAGGATGCGGACTCATCCTCAGCC







ACTTTGTTCACTGTCCTGTGTCAGA







GAAAGGGTTGCTGCTGTCAGCCTTG







AGATAGGGATATTGGCCCCTCACTG







GGCAATCTCCTGGGCCTTAAAGGAT







CTTAGCCTCTGACTTCCTAATGTAG







GCAAAGGGGTTTCCTCGATTTGGAG







AAATGGCCTCCTGGCGTAAGCTTTT







AAACCAACTCAGTTCCATCATGGTG







TTCCACCGTCATTTCTAACTCTTAA







GGTTTTCTCTTCATGTAAGTCCTTG







CGGAGTACCTGCGCTATGAAGACAC







CCCTGCATCCTCAACAACAATGTGC







GTGTTCTCCTACTGCAAATATTTTC







AATTGAGGCCTTTTCGATAGTTTCG







GGAGGTGGTCAGCAGGCAATCTCCT







CCAGCGCCCATGTTGTGATACAGGG







GAAAAACTCAGGCCCATCCATTTTC







TGAGGTGGTCTTTAACGACCCTGAA







TGTTCTTAGCACTTTAATTCCTGTC







AGCTCCACCCTTTTCTGAGAGTTAT







CACTCTCAGCCATAGCACTTTGTTC







GAAGCAGCTTTACCTACTTGTTTCT







GAAGTTACTCGTGTCAAAGCCGTTA







GGTGGATGTCAATACCCCTGATTTA







CCGAGCCAGCCAACTCAAGAGACAA







TGGATGCAGGGATCCCAGCAGTGCA







GATCCTGGCTTGCGGAGTGGCTAAA







GCTGAAACTGGTCTACTGTGTCTCT







SYNE2 probes (SEQ ID NO: 103-113)



TTTCTAAGACTTTTTCACATCCAAA







GTTTTACTCCAATCAGCTGGCAATT







GGCACCCTTAGCTGATGGAAACAAT







ATTTTGAGCTGCCGGTTATACACCA







TGTTCTGTTCAGTACCTAGCTCTGC







GTAAATGCCAAACTACCGACTTGAT







TACGCTTAGAATCAGTTTTACTCCA







GTTCAGAAACTCATAGGCACCCTTA







TGAGCAGTGGTGTCCATCACATATA







ATGTACAACTCAGATGTTTCTCATT







GCTCTGCTCTTTTATATTGCTTTAA







DICER1 probes (SEQ ID NO: 114-142)



AATTTCTTACTATACTTTTCATAAT







ATTTCACCTACCAAAGCTGTGCTGT







ACTAGCTCATTATTTCCATCTTTGG







AAATGATTTTTCACAACTAACTTGT







TTGCAGTCTGCACCTTATGGATCAC







TGATACATCTGTGATTTAGGTCATT







GGAGACGCCAATAGCAATATCTAGG







CTGATGCCACATAGTCTTGCATAAA







AGCTGTGCTGTTAATGCCGTGAAAG







GAAGTGCGCCAATGTTGTCTTTTCT







GTGAAACCTTCATGGATAGTCTTTA







TTTACTAAAGTCCTCCTGCCAGGTA







GGACATCAACCACAGACAATTTAAA







TGTTGCATGCATATTTCACCTACCA







ATAAACCTTAGACATATCACACCTA







TAGTCTTTAATCTCTGATCTTTTTG







GAGACAGCGTGATACTTACAACTCA







GACCATTGTATTTTCCACTAGCAGT







CTGCAGCAGCAGGTTACATAGCAAA







GCCGTGAAAGTTTAACGTTTGCGAT







AACTGCCGTAATTTTGATACATCTG







TATTTACCATCACATGCTGCAGCTG







AACGTTTGCGATAAACTGCCGTAAT







GGAAATTTGCATTGAGACCATTGTA







GCACCTTATGGATCACAATTACCTT







AGAAGCAAAACACAGCACCTTTACC







CCCTTAGTCTCCTCACATAAATTTC







TGTGTAAGGTGATGTTCCCGGTCGC







CTGCCAGGTAGTTCCCACTGATGGA







AP1AR probes (SEQ ID NO: 143-153)



GCCTTCCTTTACCTTGTAGTACAAG







TTTTTCCTCTTGCAACAATGACGGT







GTCAATTTACAAGGCCAGGGATAGA







TTCCACTTCATTTTACATGCCACTA







GTGCTAGACAATTACTGTTCTTTTC







AATATCTATAACTGCATTTTGTGCT







GATAGAAAACACTCCATAATTGCTT







CATTGATTTTATTAAGCCTTCCTTT







TACATGCCACTATATTGACTTTAAT







TCTGGTATGAAAGGCTCCATTGATT







GCTTTCCTTGATTTTGCTGAGGATT







NUP107 probes (SEQ ID NO: 154-163)



GGATATCAGCGTTTCTCTGTGTGCT







GAAAGCTTTGTCTGCCAATGTTGTG







CAGAGAGTCCTCTCTAATGCTCCTA







GATATTGCACAGTACTGGTCAGTAT







GACCAGGGACTTGACCCATTAGGGT







AGATATGGTATCCTCTGAGCGCCAC







AATGCTCCTAGACCAGGGACTTGAC







ATCGTGACACTTTCAACATGTAGGG







TTGGATGCCCTAACTGCTGATGTGA







GTGTTTTCTGCTTCATACGATATTG







APOC1 probes (SEQ ID NO: 164-174)



AAGGGTGACATCCAGGAGGGGCCTC







CAGGAGGGGCCTCTGAAATTTCCCA







GATGCGGGAGTGGTTTTCAGAGACA







CAGCAAGGATTCAGGAGTGCCCCTC







GTGAACTTTCTGCCAAGATGCGGGA







CAAGGCTCGGGAACTCATCAGCCGC







AACACACTGGAGGACAAGGCTCGGG







GACGTCTCCAGTGCCTTGGATAAGC







CCAAGCCCTCCAGCAAGGATTCAGG







TCATCAGCCGCATCAAACAGAGTGA







GTTCTGTCGATCGTCTTGGAAGGCC







DTX4 probes (SEQ ID NO: 175-180)



ATCGCCACCTGGTGCTCATGAGGTG







ACTCGTCTTGGTATTGCACTGTTGT







ATTCTCTTCCCATTTTTGTACATTT







TGCTCCGTGAAAGGACATCGCCACC







GGAGACAAACCTCGTCAGATGCTCA







TGAAGTCTTTGGTGTTGCTCCGTGA







TAF1D probes (SEQ ID NO: 181-191)



TGATTGTTGCCATGTGAGAGTTTTA







ACTCCTAATGTTTGGTGCTATGTTT







GTATGGGTCATTTCAAAGAGGGCTT







TGGTGCTATGTTTTCCTGAGGAGAT







AAGTTTCTCTAGTGTTTTCTGTGGA







GTATTTTTGGCTCGAAGTTTCTCTA







GAAGCCATAGCACTCCTAATGTTTG







AAGAGGGCTTATGAGGCTGTGAAAC







CCCAGAGCTCTTAACGCTGTGACCA







GAGGCTGTGAAACCCAGAGCTCTTA







ATTTCTCTTCTTCAGGGCAAACTTG







FMOD probes (SEQ ID NO: 192-202)



GCTGGGGAGCACTTAATTCTTCCCA







GGAGCTCCGATGTGAGGGGCAAGGC







TCTGGCTGGGGTCCGTGAAGCCCAG







GCCAAACCAGCTCATTTCAACAAAG







ATGTGAACACCATCATGCCTTTATA







TGCCATCACATCCCTGATACTGTGT







TTTGGACTACGTTCTTGGCTCCAGA







GCAGCCAAATCTTGCCTGTGCTGGG







GCTTTGAAGCACCTTCCCTGAGAAG







TCTGCTTTCACATCTCTGAGCTATA







TAATGTTGCCTGGGGCTTAACCCAC







MAPKAPK2 probes (SEQ ID NO: 203-213)



GCTGAAGAGGCGGAAGAAAGCTCGG







CTCCTGCCCACGGGAGGACAAGCAA







CCTGCCCACGGGAGGACAAGCAATA







GGACAAGCAATAACTCTCTACAGGA







AACTCTCTACAGGAATATATTTTTT







GTTGACTACGAGCAGATCAAGATAA







AATGCGCGTTGACTACGAGCAGATC







CACAATGCGCGTTGACTACGAGCAG







GCGCGTTGACTACGAGCAGATCAAG







AAGCAATAACTCTCTACAGGAATAT







AGACAGAACTGTCCACATCTGCCTC







FOLR1 probes (SEQ ID NO: 214-224)



AATCTTTGAGACAAGCATATGCTAC







CGGCCGTGCGTACTTAGACATGCAT







CCATTCGCAGTTTCACTGTACCGGC







GTGCGTACTTAGACATGCATGGCTT







GGAGCGAGCGACCAAAGGAACCATA







GCATATGCTACTGGCAGGATCAACC







AACCATAACTGATTTAATGAGCCAT







GACATGCATGGCTTAATCTTTGAGA







GAGCGACCAAAGGAACCATAACTGA







CAAGTAGGAGAGGAGCGAGCGACCA







AATGAGCCATTCGCAGTTTCACTGT







KIAA1467 probes (SEQ ID NO: 225-235)



TCTCTAATCCCATCCTGAGGTTGCC







GGAAGCTTCATCTGACCAATGTGGG







AAATGCAAGGGTCTTACCCTCCTCT







CCACCCACCCAGGTGTCTAAGATAG







GCAAAGCCAATATGACCACTACTGA







ATCCCCTGAATGTGAATTGCTATCC







AGATAGGACATGCTCCTTTCTTTCT







TTGCTATCCTTATTGCCCTATTAAA







TGGTATGGTGAAACTAATCCCCTGA







TTGCCATCCCCCAAATGTGTGGTAT







CTTGTGAAATGTGTCCCTAAGCCTC







SUPT4H1 probes (SEQ ID NO: 236-246)



TACCCTCCAATTCAGACTCAGCTGA







CAGAACTTCAAATACTTCCTACCCT







CCTGCCCCAAGGAATCGTGCGGGAG







GACAGCTGGGTCTCCAAGTGGCAGC







ATCTTCTTTGGACTACAGGTGGGGT







TAGGATGCTGATTTTCCTACCCGTG







GTATATGACTGCACTAGCTCTTCCT







GAGAGCAGCACATCATTTTATCATT







GTCGAGGAGTGGCCTACAAATCCAG







TGCAAGGCTGCCAGCATCTTTGCTC







ATATGCGGTGTCAGTCACTGGTCGC







MELK probes (SEQ ID NO: 247-257)



AAGACTGTTATGATCGCTTTGATTT







GCCCATCTGTCATTATGTTACTGTC







AGGGCGATGCCTGGGTTTACAAAAG







AGCTCTTAACTATGTCTCTTTGTAA







GATTCTTCCATCCTGCCGGATGAGT







GAATCTAAATCAAGCCCATCTGTCA







GAGCTATCTTAAGACCAATATCTCT







GGAAGACATCCTATCTAGCTGCAAG







GTGTGGGTGTGATACAGCCTACATA







ATGTGGTGGGTATCAGGAGGCAGCG







GGAGGCAGCGGCTTAAGGGCGATGC







MCM2 probes (SEQ ID NO: 258-268)



TTGTGCTTCTCACCTTTGGGTGGGA







GGATGCCTGCGTGTGGTTTAGGTGT







TAGCAGGATGTCTGGCTGCACCTGG







TCTCCACTCAGTACCTTGGATCAGA







GAGTCATGCGGATTATCCACTCGCC







CTGGCATGACTGTTTGTTTCTCCAA







CCCCACTCTCTTATTTGTGCATTCG







AGCACTTGATGAACTCGGGGTACTA







GCCAGTGTGTCTTACTTGGTTGCTG







CCCTCTTGGCGTGAGTTGCGTATTC







TTGGTTGCTGAACATCTTGCCACCT







LSM1 probes (SEQ ID NO: 269-278)



GAAGGACCGAGGTCTTTCCATTCCT







GAGTACTAATCTTTTGCCCAGAGGC







AGTGAAAGTGACATCCTGGCCACCT







ACAGTGGCATAGACTCCTTCACACA







ACAGGGACAGTCTTCATTTACTTGT







TCCATTCCTCGAGCAGATACTCTTG







GCACCAGCAACTACTTCTTTATATT







AAAAGGAGAGTGACACACCCCTCCA







CACCTCACGCATTTGATCACAGACT







CCTTCACACATCACTGTGGCACCAG







NUSAP1 probes (SEQ ID NO: 279-289)



CCTTCACCTCAGTGGAGCTTCTGAG







GGCTTTGCTTAGTATCATGTCCATG







TGTACCTTCGTTCAAATATCCTCAT







CATCTGTCACTCACTATATTCACAA







GTTTTATACTGCTCAAGATCGTCAT







GGGATAGAAAGGCCACCTCTTCACT







AACTGCAGTCTTCTGCTAGCCAATA







ACTCATTCTAACATTGCTTACTTAA







CACCTCTTCACTCTCTATAGAATAT







GCTACATAGCCCTATCGAAATGCGA







TCCTCATGTAATTGCCATCTGTCAC







PRC1 probes (SEQ ID NO: 290-300)



TTGCACATGTCACTACTGGGGAGGT







CCTCTCAATCACTACTCTTCTTGAA







GTTCTCAAAAGCTTACCAGTGTGGA







GTGTTCAGTTCTGTTACACAGTGCA







GAGCTGTCTTTGTCGTGGAGATCTG







ACACAGTGCATTGCCCTTTGTTGGG







ACACATGCTTGTCGGAACGCTTTCT







ACTTGGTGTTAGCCACGCTGTTTAC







GTGTCCGAAGTTGAGATGGCCTGCC







GGGAGTCTGTTTGTTCCAATGGGTT







GGAGATCTGGAACTTTGCACATGTC







FADD probes (SEQ ID NO: 301-311)



GATGAGCAGTCACACTGTTACTCCA







GCACTCTCTAAATCTTCCTTGTGAG







GGATTATGGGTCCTGCAATTCTACA







GAAAGGATGTTTTGTCCCATTTCCT







AATTGCCAAGGCAGCGGGATCTCGT







TCCTCTCTGAGACTGCTAAGTAGGG







TGCTCAACCACTGTGGCGTTCTGCT







TGATTGACACACAGCACTCTCTAAA







CTGGACACTAGGGTCAGGCGGGGTG







AGAGGCCCAGGAATCGGAGCGAAGC







GGGGCAGTGATGGTTGCCAGGACGA







RFC4 probes (SEQ ID NO: 312-322)



TCATGCAGCAACTCAGCTCGTCAAT







ATGTTCAAAATTCCGCTTCAAGCCT







AAAGCGCTACTCGATTAACAGGTGG







ACTCATCAGCCTTTGTGCAACTGTG







GAACATTTGCAACTCATCAGCCTTT







TCAACAGCAGCGATTACTAGACATT







ACCCCTGACCTCTAGATGTTCAAAA







GTGATGCAGCAGTTATCTCAGAATT







GATGGAGTATTTGCTGCCTGTCAGA







AAGCCATTACATTTCTTCAAAGCGC







TCAGCTCGTCAATCAACTCCATGAT







SCARB2 probes (SEQ ID NO: 323-333)



GTGACAATCATTTTGCTGACAGAAT







AAGGGCATTTTCTTTGATTCTCAAA







GGAGCCATCATATGTCACAGTGTTC







AGAGAAACGTGTGCCCTATACTTCC







GAAATCCATCTATCTACAGCCTAAG







TAGCTCACTGTCACTCACTGAATAG







GAGACACCACTTTTCAAAGGACTTC







AGTTCTTTCCAGTGTTTTGTAGCTC







GGACTTCTTGGTTTCAGCATAACCT







GAGAAGCCTATACATTTAGCTGACA







TGCCCTATACTTCCTGTGACAATCA







CALD1 probes (SEQ ID NO: 334-344)



CTTCCCCCACTAAGGTTTGAGACAG







GACGCAGGACGAGCTCAGTTGTAGA







GACGTATCCAGCAAGCGGAACCTCT







TTCAATATCCCAGTAAACCCATGTA







AGCAGTGATACCAACCACATCTGAA







CTTGAGACCAGGAGACGTATCCAGC







ACTGATCATCATAACTCTGTATCTG







GAACCCAAGCTCAAGACGCAGGACG







GCAAGCGGAACCTCTGGGAAAAGCA







GCGGAATGTGTGCAGTATCTAGAAA







TCTGTGGATAAGGTCACTTCCCCCA







PDCD6 probes (SEQ ID NO: 345-355)



GGTTGGTGCAGCAGTCATTAAAAGT







GAGTCAAGGCCAGACTAGATCAGCC







TTCTCATGGAGCTTCCTTTCTAGAG







CAAAGGGGCGTGTCATGTGCCTCAT







CAAGGCCAGACTAGATCAGCCTAAG







CATGGAGCTTCCTTTCTAGAGGGGA







CTCTATTCTCATGGAGCTTCCTTTC







ATTTGAGTAGATTTGGCCTCTATTC







GACTTTCAAAGGGGCGTGTCATGTG







TTGGCCTCTATTCTCATGGAGCTTC







GATTCTAATAGGTTGGTGCAGCAGT







FAM38A probes (SEQ ID NO: 356-366)



GCTACGGCATCATGGGGCTGTACGT







ATCATGGGGCTGTACGTGTCCATCG







CATTATGTTCGAGGAGCTGCCGTGC







GCTGGCGCCCGAGAGGGAAGGAGCC







GCTGGTCATCGGCAAGTTCGTGCGC







GAGGAGTTGTACGCCAAGCTCATCT







CGCTCACCGGAGACCATGATCAAGT







GCGGATTCTTCAGCGAGATCTCGCA







TTCGTGCGCGGATTCTTCAGCGAGA







TCCCCCACGTGTACTGTAGAGTTTT







AGATCTCGCACTCCATTATGTTCGA







APOE probes (SEQ ID NO: 367-377)



GGCCCCTGGTGGAACAGGGCCGCGT







TGGTGGAAGACATGCAGCGCCAGTG







GAAGCGCCTGGCAGTGTACCAGGCC







AGCAGGCCCAGCAGATACGCCTGCA







GTGCCCAGCGACAATCACTGAACGC







TGGGGCCCCTGGTGGAACAGGGCCG







AAGCGCCTGGCAGTGTACCAGGCCG







GCCCAGCGACAATCACTGAACGCCG







GCGCGCGCGGATGGAGGAGATGGGC







GCGACAATCACTGAACGCCGAAGCC







CCCTGGTGGAACAGGGCCGCGTGCG







AQP1 probes (SEQ ID NO: 378-399)



CATAAGTCCTTTCAATTCCACCAGG







GCTAGACAATGATTTGGCCAGGCCT







CAGTGCATCACATCTGCACACTCTC







CTGACCTTGGAATCGTCCCTATATC







TGGAATCGTCCCTATATCAGGGCCT







GCAGCCCCTAAGTGCAAACACAGCA







TCTGCATATATGTCTCTTTGGAGTT







GAAGGCTGGATTCTATCTACATAAG







GCCCTTAACTATCACCAGTGCATCA







CACCACTGTGCACTTAGCCATGATG







ACCACGAGGCTGATTCCTCTCATTT







TGCAAAGTGGCAGGGACCGGCAGAG







GCAAACACAGCATGGGTCCAGAAGA







GCATATATGTCTCTTTGGAGTTGGA







AGACGTGGTCTAGACCAGGGCTGCT







ACTTACTGCCTGACCTTGGAATCGT







GGCCTAGTAACCAAGGCCCTGTCTC







GCATGGGTCCAGAAGACGTGGTCTA







GCATCTGTCTGCTCTGCATATATGT







TCTCAGTTTCTGCCTGGGCAATGGC







TTACTGCCTGACCTTGGAATCGTCC







GCAGGAACTTCTAGCTCATTTAACA







SNORA25 probes (SEQ ID NO: 400-405)



ACTCCTAATGTTTGGTGCTATGTTT







TGGTGCTATGTTTTCCTGAGGAGAT







GAAGCCATAGCACTCCTAATGTTTG







AAGAGGGCTTATGAGGCTGTGAAAC







CCCAGAGCTCTTAACGCTGTGACCA







GAGGCTGTGAAACCCAGAGCTCTTA







RGS5 probes (SEQ ID NO: 406-438)



TGCTCCATTGGAGTAGTCTCCCACC







GGTAGAGGCCTTCTAGGTGAGACAC







TACTTATCTACTGTCCGAAGGCCTT







CCTGCATTTCCCATTAATCTACATA







AATGCTGAGAAATTTGCCACTGGAG







TATACAGTTTAATAAGCCTCTTGCA







ATTTAAAATATTGATCCTTCCCTTG







ATCTCACTTGTTTTAGTTCTGATCC







ATTTGGGTCCAACTTCAATAATGTA







GACTGTGGGTCAAATGTTTCCATTT







AAATGAAACTGTTGCTCCATTGGAG







GTATCTGTAACCACAATCACACATA







GGACCACCTTCATGTTAGTTGGGTA







TTGCAAGTTACTTGTTCTCTCACCT







CTTTTTGCCCACACTGCTTTGGATA







AGATCACCCCTCTAATTATTTCTGA







TATTTCCTCCATAATAACCCTGCAT







GGGATGTTGCTTACTCTTTTTGCCC







GTACTATGTGACTCATGCTTCTGGA







GTTCTCTCACCTGAGGTATTTTTTT







GCCACTGGAGACAAGCAATCTGAAT







TCATCCTGTGAGTTATTTCCTCCAT







TGCAACTAGCAACTCATCTTCGGAA







CTGCCCATAGTCACCAAATTCTGTT







TGGAAAAGGATTCTCTGCCTCGCTT







GCTAATTGTCCTATGATGCTATTAT







TTCCTCTTCTCCCTTTGCAAGAGGA







ATGACATTTATCTTCAAAACACCAA







GAGTAGTCTCCCACCTAAATATCAA







TTCCCACAGCAGCTTTGCTCAGTGA







CTCGCTTTGTGCGCTCTGAGTTTTA







ATCCATTTGTAAGCATTTATCCCAT







ATGTATTTATGCTGCTAGACTGTGG







MTUS1 probes (SEQ ID NO: 439-449)



TCTTCACCACAGACACCTTCTTGTG







GAGCCTAACACTATCCTGTAATTCA







GTCCCTGTCTATACATTCTCTGTAT







TAACCTTTGTAATGTTCTTCACCAC







ACTCTGCTCAGCCCTGTAACAGGGT







TTTTACTTACCCATGTGAGCCTAAC







TTCATTGCCTTTTTCACCTAAGCAT







TTCTCTGTATCTTTTGGGGGTAACT







AGGAAGAGCTTTGACTTGTCCCTGT







GTTTTTCAGTGTTCAGCCATGTCAG







ATTATGATCATCTACCACCAACTCT







PHB probes (SEQ ID NO: 450-460)



GCAGGGGATGGCCTGATCGAGCTGC







TGAGCGACGACCTTACAGAGCGAGC







GACCTTCGGGAAGGAGTTCACAGAA







GAGTTCACAGAAGCGGTGGAAGCCA







CAGCCCCGATGATTCTTAACACAGC







GCAGGTGAGCGACGACCTTACAGAG







CAGGGGATGGCCTGATCGAGCTGCG







GAGCAACAGAAAAAGGCGGCCATCA







TCCTGGATGACGTGTCCTTGACACA







TCGGGAAGGAGTTCACAGAAGCGGT







TGGATGACGTGTCCTTGACACATCT







GINS1 probes (SEQ ID NO: 461-470)



TGTTGAACTTGTATCCTTCAGCCTT







TAATATTGAGTCTTCTGGCCTATAA







GGTCTGTCTTCCTAGGTATTAATGT







AGTTTTCAGTGTACAGGTCTACCAT







GCCTTGCTAAACTGTGAGTTCTCAT







GGCCTATAAACAAGGTCTGTCTTCC







GTAGTCACAGTTACACGGCAGGCTG







GTTGGGCACCTTGATTGAGATTGCA







AATTCTAACCACTTGTTGCTAGTAA







AGGTCTACCATGTCAGCATTTCATA







KIAA0101 probes (SEQ ID NO: 471-490)



AATGGTGCCATATTGTCACTCCTTC







ACCAGCCCAGGCAACATAGCGTAAA







GTGTTTGTTCCAATTAGCTTTGTTG







TAGGTTGTCCCCTAAAGATTCTGAA







TGCTTAGATTGTTGTACTGCTGCCA







TTAAACGGTTGATAATGCCTCTACA







TATTCTACCCTCTTTTTTGGCAAGG







CAAGTCATTGCATTGTGTTCTAATT







CATAGCGTAAACCCTATCTCTAAAA







AACCTTGGATGGATATCTTCTCTTT







ATTGTTGTACTGCTGCCATTTTTAT







CACAGTGGCTTCTCAGGAGGCTGAG







GGATAGAATCATGGTGGGCACAGTG







TCTCCTTGTTTACCCTGGTATTCTA







AAGTGTCTAGTTCTTGCTAAAATCA







TGGAGAATTCTTTAGGTTGTCCCCT







GGAGGGAGGTTTGCTTGAGTCCAGG







TGGCAAGGAGGACAAATACGCAATG







TCATCTTTGAATAACGTCTCCTTGT







GATAATGCCTCTACAACAACAAGAA







SCD probes (SEQ ID NO: 491-501)



TGAACTTGATACGTCCGTGTGTCCC







GGGCAGTTTTGAGGCATGACTAATG







AAAAGCGAGGTGGCCATGTTATGCT







TAACTATAAGGTGCCTCAGTTTTCC







AGATGCTGTCATTAGTCTATATGGT







GGAATTCTCAAGACCTGAGTATTTT







CTGACCTACCTCAAAGGGCAGTTTT







ACAACGCATTGCCACGGAAACATAC







AGCATTTTGGGATCCTTCAGCACAG







GAAGCTAATTGTACTAATCTGAGAT







ATGTCCACCATGAACTTGATACGTC







PTTG1 probes (SEQ ID NO: 502-512)



CATTCTGTCGACCCTGGATGTTGAA







TTGAGAGTTTTGACCTGCCTGAAGA







AATTGCCACCTGTTTGCTGTGACAT







GTGCCTCTCATGATCCTTGACGAGG







TGCAGTCTCCTTCAAGCATTCTGTC







CCTGCCTCAGATGATGCCTATCCAG







AAAACAGCCAAGCTTTTCTGCCAAA







GGGAATCCAATCTGTTGCAGTCTCC







TGAAGAGCACCAGATTGCGCACCTC







AAGCAAAAAGCTCTGTTCCTGCCTC







TTCCCTTCAATCCTCTAGACTTTGA







CENPF probes (SEQ ID NO: 513-523)



GGTCAAAGTTGCTCAGCGGAGCCCA







TGCACAGAAGTTAGCGCTATCCCCA







TACCCCTGGGAGGTGCCAGTCATTG







GTTTGGAAGCACTGATCACCTGTTA







GAAGGCACTTTGTGTGTCAGTACCC







GATCACCTGTTAGCATTGCCATTCC







GAGCCCAGTAGATTCAGGCACCATC







GTACTCTTTAGATCTCCCATGTGTA







TGAGGGTCAAGCGAGGCCGACTTGT







TTGCCATTCCTCTACTGCAATGTAA







CGAAATCCGTCCCAGTCAATAATCT







SMC4 probes (SEQ ID NO: 524-544)



GGACAGTGTTTCAACAAGCCTAGGC







GCATCTAAGGGACTTTGTTGAACTT







GATGGCCTCTGATTTACACTGGTTC







AGAAGTCTGCCCTAGCTGTTAAATT







GAGTTAATTGTTCCTTTCTTCAGTG







TAGACAGCTTGGATCCTTTCTCTGA







GGTTTACCAGGATGTAGTCCCACTG







GAAAACACTTAGTTCATTGGCTTTA







GCGTATTTTTACACTATTGGCTCAA







ATTTACACAGCTAGATTTGGAAGAT







GGATGAGATTGATGCAGCCCTTGAT







ATTGATGCAGCCCTTGATTTTAAAA







AGTTCATAATAATTTCTCTTCGAAA







GGAAGGACTTTCGGTATTGTATTAG







CCTTTCTTCAGTGGGCCATTGTTTT







TTAGTATTTGCTCTTCACCACTACA







GATACCTTGAGTAATGTTTGCCTAT







CACTCCCCTTTACTTCATGGATGAG







AAGCCTAGGCTATCTCGTAAGTTGA







GGACGCCGAACTCGAGCTTGTAGAC







AATATCCCACTATAGTTGCTTCATG







NCAPG probes (SEQ ID NO: 545-555)



CCCAATTTCTCAATGAAGATCTAAG







GATTATGTCCAGTTATTTGCTTTAA







GGTGGAATCCTTTAAGATTATGTCC







AAGACGATGGAGGTGGAATCCTTTA







GAGCCAAAACCGCAGCACTAGAAAA







GAGACTACCAAGACGAGCCAAAACC







TTCCAGAACCAGAATCAGAAATGAA







CCAAGACGAGCCAAAACCGCAGCAC







GGACGAACAGGAGGTGTCAGACTGC







GAAGATGAGACTACCAAGACGAGCC







GTGTCAGACTGCTGAAGCCGACTCT







PDLIM5 probes (SEQ ID NO: 556-566)



CTTGCTTTGTATGCTCAGTGTGTTG







GCCCTCTTTGGTACTATATGCCATG







ACACCTGGCATGACACTTGCTTTGT







GAATTTCCCATAGAAGCTGGTGACA







GAAGCTGGTGACATGTTCCTGGAAG







CAAGAAGGACAAGCCCCTGTGTAAG







TGACATGTTCCTGGAAGCTCTGGGC







ATATGCCATGGATGTGAATTTCCCA







GCCCCTGTGTAAGAAACATGCTCAT







GGAAGGTCAGACCTTTTTCTCCAAG







TTATTATGCCCTCTTTGGTACTATA







SOX9 probes (SEQ ID NO: 567-588)



GAGAGGACCAACCAGAATTCCCTTT







AAGCATGTGTCATCCATATTTCTCT







CTACCTGGAGGGGATCAGCCCACTG







AGTTGAACAGTGTGCCCTAGCTTTT







GGAGAATCGTGTGATCAGTGTGCTA







GTAGTGTATCACTGAGTCATTTGCA







TGGGCTGCCTTATATTGTGTGTGTG







TGTTTTCTGCCACAGACCTTTGGGC







TGTTCTCTCCGTGAAACTTACCTTT







AAATGCTCTTATTTTTCCAACAGCT







CCTAGCTTTTCTTGCAACCAGAGTA







GAATTCCCTTTGGACATTTGTGTTT







GCCAACCTTGGCTAAATGGAGCAGC







ATTACTGCTGTGGCTAGAGAGTTTG







TTGGAGTGAGGGAGGCTACCTGGAG







ATATGGCATCCTTCAATTTCTGTAT







CAGCCCACTGACAGACCTTAATCTT







ATCAGTGGCCAGGCCAACCTTGGCT







TTTTCCAACAGCTAAACTACTCTTA







GCAACTCGTACCCAAATTTCCAAGA







ACATGACCTATCCAAGCGCATTACC







GTAAAAGCTTTGGTTTGTGTTCGTG







PBX2 probes (SEQ ID NO: 589-599)



TAGTTCTCTCCTCACTTGTAAACTT







GTATATGTATCTTCCTCAATTTCCC







GGAGGCAGTGAAGGGCTTGCCCTGC







CATCTTCCCCTGTGAGTGACATGTC







AGGTTGGAAGTGTGATGGGTGGGGG







GGTATCTTTTTGTCACACCAAAATC







CCCCTCCCATTAAAGATCCGGGCAG







AAAGTAACATCAACACTGTCCCATC







GATCCCCTCAGACATTCTCAGGATT







GACTGTCAGAGTGGGGAACCCCTCC







GGGTTGGGGTGCTTGTATATGTATC







PLIN2 probes (SEQ ID NO: 600-610)



TATGTTCTCATTCTATGGCCATTGT







GAGTCTCAGAATGCTCAGGACCAAG







TGTGGCCAGACAGATGACACCTTTT







GTCTGCTCTGGTGTGATCTGAAAAG







GCTTTATCTCATGATGCTTGCTTGT







GGGGTAGAAACTGGTGTCTGCTCTG







CAGGAGACCCAGCGATCTGAGCATA







GAAAAGGCGTCTTCACTGCTTTATC







TATGGCCATTGTGTTGCCTCTGTTA







ATCACTAGTGCATGCTGTGGCCAGA







AACATCTTCATGTGGGCTGGGGTAG







LMO4 probes (SEQ ID NO: 611-621)



CCCTTCCCGCATTTATTGGTGTATT







ACCTTTGTAGCTAGCACCAGTGCCA







TTCATCTCAGATTTGTTCATCACAG







GTCTTCAGTAGACAAGTCACCTTTG







TTAAGGACTCCATGAACCTGGGCTA







TAATGTTGCTACTCCCATGGCAAAG







GTTTTTGTCCTAATGTTGCTACTCC







CAGAGGACATCTTGGGGAGGGGGAG







CACCTTCTTTAGTCTTGATTGCCCT







CCATTGCACCTTCTTTAGTCTTGAT







GATGTGGCTTTTGTGATATTCTATC







PIK3R1 probes (SEQ ID NO: 622-632)



CACGGTCAGTTGTAACTTTGCCTTC







GACTATCCAACTTAACATGAAACTT







GAGATAGCATTAGCTGCCCAGGATG







AATGGAGCTATGTCTTGTTTTAAGT







GAGAGGGAGGATGTCACGGTCAGTT







AGTTGGTCTTTTGACGAGAGGGAGG







GTGCCTCCTTGACATTTCGTTCAAG







GAAACTTGTCACCATGAGATAGCAT







AAAGCTACAATCTGTTCAATGTTTT







CTGCCCAGGATGCTGCTATATATAT







AAAAACTCATTTATACCTGTGTATT







DHX9 probes (SEQ ID NO: 633-643)



TGACCGAGCAGCAGAGTGTAACATC







GTAACATCGTAGTAACTCAGCCCAG







ATCAGTGCGGTTTCTGTGGCAGAGC







GACTTTATCCAGAATGACCGAGCAG







TAGAGGGGCTACTGGATGTGGGAAA







CTCAGCCCAGAAGAATCAGTGCGGT







GGCTTATCCTGAAGTTCGCATTGTT







TGGGAAAACCACACAGGTTCCCCAG







TGTACTGTAGGTGTGCTCCTGAGAA







GCGTGATGTTGTTCAGGCTTATCCT







GGAGGACTTACCCAGTTCAAGAATA







CD44 probes (SEQ ID NO: 644-654)



GGATGGCTTCTAACAAAAACTACAC







GTGTGCTATGGATGGCTTCTAACAA







TAGTTACACATCTTCAACAGACCCC







AGGGTGAAGCTATTTATCTGTAGTA







TTAGGGCCCAATTAATAATCAGCAA







CTTCCATAGCCTAATCCCTGGGCAT







CACATATGTATTCCTGATCGCCAAC







CAGACCCCCTCTAGAAATTTTTCAG







TTGAATGGGTCCATTTTGCCCTTCC







CAGGGTTAATAGGGCCTGGTCCCTG







TTAAACCCTGGATCAGTCCTTTGAT







RRM2 probes (SEQ ID NO: 655-671)



GTATTCAGTATTTGAACGTCGTCCT







GTCTTGCATTGTGAGGTACAGGCGG







TTTTACCTTGGATGCTGACTTCTAA







GTACAGGCGGAAGTTGGAATCAGGT







GACCCTTTAGTGAGCTTAGCACAGC







CCTGGCTGGCTGTGACTTACCATAG







GAACGTCGTCCTGTTTATTGTTAGT







CTCACAACCAGTCCTGTCTGTTTAT







GAAGTGTTACCAACTAGCCACACCA







ATGTGAGGATTAACTTCTGCCAGCT







CTAGCCACACCATGAATTGTCCGTA







CAGCCTCACTGCTTCAACGCAGATT







TTAGGATTCTGTCTCTCATTAGCTG







GTGCTGGTAGTATCACCTTTTGCCA







TATGGTCCTTATATGTGTACAACAT







GAAGATGTGCCCTTACTTGGCTGAT







TAAACAGTCCTTTAACCAGCACAGC







CDK1 probes (SEQ ID NO: 672-682)



TGAAGTATTTTTATGCTCTGAATGT







CAAAGATCAAGGGCTGTCCGCAACA







GATGAATATTTTTCTACTGGTATTT







GACATAGTGTTTATTAGCAGCCATC







GAAAGCTTTTTGTCTAAGTGAATTC







GTGAATTCTTATGCCTTGGTCAGAG







TGTTAACTATACAACCTGGCTAAAG







AAATGTTCTCATCAGTTTCTTGCCA







TGCTAAGTTCAAGTTTCGTAATGCT







AAGGGCTGTCCGCAACAGGGAAGAA







CTTATCTTGGCTTTCGAGTCTGAGT







HN1 probes (SEQ ID NO: 683-691)



GGCCTCTAATATCTTTGGGACACCT







GAAGGAACTCCTCTGAAGCAAGCTC







GACTTGGAGTCATCTGGACTGCAGA







AGAGTGAAGAGAAGCCCGTGCCTGC







GACCCCAACAGCAGGAATAGCTCCC







GTGGTGGATCCAATTTTTCATTAGG







CATCCAGAAGAAATCCCCCTGGCGG







ACAACCACCACCTTCAAGGGAGTCG







GCAAGCTCCGGAGACTTCTTAGATC







ZWINT probes (SEQ ID NO: 692-702)



GATGTACCTTTTTTGTCAACTCTTA







TAGTGATACCTTGATCTTTCCCACT







GTTTCATTGACCTCTAGTGATACCT







GTACAGCCTAGTGTTAACATTCTTG







GATTGGCTTTTGTCATCCACTATTG







AACATTTCTCGATCACTGGTTTCAG







TTTCCCACTTTCTGTTTTCGGATTG







GGCCTCCTATGATGCAGACATGGTG







TCTTGGTATCTTTTTGTGCCTTATC







AGGAGCTGGGACTGGTTTGAACACA







CAGATGGGGAGGGGGTACTGGCCTT







ASPM probes (SEQ ID NO: 703-713)



ATAGAGCCTCTGATGTACGAAGTAG







CAGTCTCTACAAACTTACAGCTCAT







AATCCCCTGCAAGCTATTCAAATGG







GAAGAAATCACAAATCCCCTGCAAG







GTGATGGATACGCTTGGCATTCCTT







GCATTCCTTTTATCCCAGAAACACC







GTTGTTTGTTGGCTATTTTACTGAA







GGAGCTTTTGCAGATATACCGAGAA







TCAGATATGCTGTGCAAGTCTTGCT







GTTGTTGACCGTATTTACAGTCTCT







GTTGTAATCGCAGTATTCCTTGTAT







SLC35E3 probes (SEQ ID NO: 714-723)



AGTAGCTCTCTGCTTGCTGATAGAT







ATTTTAGTTTAGCTTCCTGATTTAT







GATGGTTTCCCAGTGTGAGATTTGT







ATGGTTTGGTTGGTCCCAGCAAAGT







GTTCTCGGTGTTCAAACTCTTCTAA







TGCAAATATGCTGTGGGTTCTCGGT







AAGGAATTGCTGTTACTGTACTGCA







TAAATCTAGTGTTTCTATTTTAGTT







ATATACCATCCCATATATATGTGGG







CTGCTTGCTGATAGATGGTTTCCCA







RNASEH2A probes (SEQ ID NO: 724-731)



ATGCCAGAGACATACCAGGCGCAGC







GGAAGATCACATCCTACTTCCTCAA







ACTGATTATGGCTCAGGCTACCCCA







TGCAGCAAAGTTTTCCCGGGATTGA







CTCAATGAAGGGTCCCAAGCCCGTC







TCGTGGACACCGTAGGGATGCCAGA







GAGGACTCAGCATCCGAGAATCAGG







TCTTCCCACCGATATTTCCTGGAAC







TSC2 probes (SEQ ID NO: 732-738)



TGGCCTCACAGGTGCATCATAGCCG







GGGCAACGACTTTGTGTCCATTGTC







CCCTGATGCCCACCAAGGACGTGGA







AGCACCGCTGCGACAAGAAGCGCCA







TCAACTTTGTCCACGTGATCGTCAC







GTGAGGACTTCAAGCTTGGCACCAT







TGGACTACGAGTGCAACCTGGTGTC







FAM20B probes (SEQ ID NO: 739-749)



GTATTAATAAGGCATTGCCCCCTGT







TGTAAGGCTGCATTGTGGGTTTGGG







GTTTTGTAACACTGTCCTACTTTAT







AGTCGTTGCAGGGTTTGGATCAGCT







GGATCAGCTGTAAGTTAGGTATGCC







AAGGCTGTTACAATCAAGTCGTTGC







AGATGAGTCCTATACGTGGCAATTT







TCTGAAGCCAGCATTATCTTCCAAA







GCCCCCTGTTTGCACTCAGGGTTAA







CGTGGCAATTTTTCAATGTCATCTG







TCAGAACTCATGGCCATTTCCTGCC







WASL probes (SEQ ID NO: 750-760)



GAGATACTTGTCAAGTTGCTCTTAA







TGGGCCGTCGACAAAGGAAATCTGA







AATTTCGAAAAGCAGTTACAGACCT







AATCTACCCATGGCTACAGTTGATA







GTGGGAACAAGAGCTATACAATAAC







TAGTCCTAGAGGATATTTTCATACC







GATCCCCCAAATGGTCCTAATCTAC







AGAAAAGACGAGATCCCCCAAATGG







TACCCATGGCTACAGTTGATATAAA







TTCATACCTTTGCTGGAGATACTTG







ATACCTTTGCTGGAGATACTTGTCA







AIM1 probes (SEQ ID NO: 761-771)



GCCTGTGCTGAACTGATCTCTTAAA







AATACTGGTGCTCTTGTCACAGGTA







AAAATGCTGATCTTCTCTGGAGTCT







TGCTCTTCCAACAGTGGGTTCTAGC







ACAACTGACAAGACACCAGCCCATA







TTAGGCCTTTTGTGCATACCATTAC







GGTGCATGTACAACAGCATCCAACA







TCTTTTTGTCCTCATCACTCAATAC







GCAATCTTGGAATCCTCAACTGCAG







GGTAGAACAGCTTGTTTCTTTTCCA







GCATCCAACATATCTGTCTTGTTCC







MBNL2 probes (SEQ ID NO: 772-782)



TTCAGCCCTTTAATAATGGAGCATC







TTTACTATGATATCCATTTTCCAGA







GAGACTAACTCTCCACTTGTATGGG







GGGAACTACATTTCACTCTTGGTTT







ACCTGTAACCCCAAGCAAATATAGA







AACTCTCCACTTGTATGGGAACTAC







TCAGGATATAACAGCACTTCACCGA







ATTTCACTCTTGGTTTTCAGGATAT







TAACAGCACTTCACCGAAATATTCT







TATTAGCACACAACTATTTTCAGCC







AAGTTTGTTTATATTCAGAAGTCTG







PPIF probes (SEQ ID NO: 783-793)



GCTGAAGGCAGATGTCGTCCCAAAG







TGGCAAGCATGTTGTGTTCGGTCAC







GCCTGAAACGATACGTGTGCCCACT







TAATGCTGGTCCTAACACCAACGGC







CCGCTTTCCTGACGAGAACTTTACA







GTGGCCAGGGTGCTGGCATGGTGGC







AGAAGGGCTTCGGCTACAAAGGCTC







AAGTCCATCTACGGAAGCCGCTTTC







TGTCCTGTCCATGGCTAATGCTGGT







GTTCGGTCACGTCAAAGAGGGCATG







GACTGTGGCCAGTTGAGCTAATCTG







GINS2 probes (SEQ ID NO: 794-803)



TACAGCAGGAGTGGCCATGTGGTCC







GATGAGGTACTCGTGGTTCTGGAGC







GGCAGATGGTGCAGCCAACAATGCT







AACAATGCTGACCGGTGCTTATCCT







GGACATTCTTCAATTCCACATCTGT







GGGCTGAATTTAGACTCTCTCACAG







CAGCCTCTGGAGAGTACTCAGTCTC







AAATAAGTCATTCTCCCTAGCAGAG







CTCTAAGCCCTGATCCACAATAAAA







GGAGCTCTAGAAACACTTCTGATGC







UBE2C probes (SEQ ID NO: 804-814)



TATAAGCTCTCGCTAGAGTTCCCCA







GAGCTCTGGAAAAACCCCACAGCTT







GGAAAAGTGGTCTGCCCTGTATGAT







GGGTAACATATGCCTGGACATCCTG







CAATGCGCCCACAGTGAAGTTCCTC







GGGTAGGGACCATCCATGGAGCAGC







GCCTTCCCTGAATCAGACAACCTTT







TCCATCCAGAGCCTTCTAGGAGAAC







ATGATGTCAGGACCATTCTGCTCTC







AGATGGTCTGTCCTTTTTGTGATTT







TGATAGTCCCTTGAACACACATGCT







TYMS probes (SEQ ID NO: 815-825)



GAGGGTATCTGACAATGCTGAGGTT







GCATTTCAATCCCACGTACTTATAA







ATCTGTCCGTGACCTATCAGTTATT







GTACAATCCGCATCCAACTATTAAA







AAATGGCTGTTTAGGGTGCTTTCAA







TCACAAGCTATTCCCTCAAATCTGA







AACTGTGCCAGTTCTTTCCATAATA







GAGGGAGCTGAGTAACACCATCGAT







GGGGTTGGGCTGGATGCCGAGGTAA







AAAGCTCAGGATTCTTCGAAAAGTT







GAACTAGGTCAAAAATCTGTCCGTG







NEK2 probes (SEQ ID NO: 826-837)



ATTAATACCATGACATCTTGCTTAT







GCTGTAGTGTTGAATACTTGGCCCC







GCCATGCCTTTCTGTATAGTACACA







TGAGCTGTCTGTCATTTACCTACTT







GTAGCACTCACTGAATAGTTTTAAA







TTGGTTGGGCTTTTAATCCTGTGTG







CTCTGTAGTTCAAATCTGTTAGCTT







GATATTTCGGAATTGGTTTTACTGT







AAATATTCCATTGCTCTGTAGTTCA







TGAATACTTGGCCCCATGAGCCATG







GGTATGCTTACAATTGTCATGTCTA







TXNRD1 probes (SEQ ID NO: 839-844)



ACCTGTATTTCTCAGTTGCAGCACT







CCCATGCATCTGCCTGGCATTTAGG







GCAATTGAGGCAGTTGACCATATTC







TCCTCATCTCATTTGGCTGTGTAAA







CCTGCCAGCAGTTCTTGAAGCTTCT







TCCAAGTCCACCAGTCTCTGAAATT







TGGCATTTAGGCAGCAGAGCCCCTG







GGAGTGGAATGTTCTATCCCCACAA







MED24 probes (SEQ ID NO: 845-854)



CAGCCCAGGAGTAGTCTTACCTCTG







CTTACCTCTGAGGAACTTTCTAGAT







GGCTGCTAAAGCCATTGCTGCACTC







GATGAGGCATCGTGCCTCACATCCG







TAAAGCCATTGCTGCACTCTGAGGG







GTGAGGGAAATCTACCTTCGTTCAT







TGGTGCAAGAGCCTCTAGCGGCTTC







TTCTTCTTTCAAAATTTCCTCTCCA







AACTGGTGAAGGTGTCAGCCATGTC







CATCCGCTCCACATGGTGCAAGAGC







ELF1 probes (SEQ ID NO: 855-865)



AGATGACATGGTTGTTGCCCCAGTC







AAATATGCAGACTCACCGGGAGCCT







CATGTTCCTGGTGCTGATATTCTCA







TTATGCCGGTCTAGCCTGTGTGGAA







TCACCCATGTGTCCGTCACATTAGA







GATATTCTCAATAGTTATGCCGGTC







GATGAACGACAGCTTGGTGATCCAG







TTGGTGATCCAGCTATTTTTCCTGC







GGCACTCCTCAATATGGATTCCCCT







GCTGCCTGATACGTGAATCTTCTTG







GACATCACCCTTACAGTTGAAGCTT







APH1A probes (SEQ ID NO: 866-876)



GTGCATGTTTGGGAACTGGCATTAC







TTCTCAGTACTCCCTCAAGACTGGA







ACTCCAGAGCTGCAGTGCCACTGGA







GTGCCACTGGAGGAGTCAGACTACC







ATAGATGAGCTCTGAGTTTCTCAGT







GTCAGACTACCATGACATCGTAGGG







TCTTCTAACCTCCTTGGGCTATATT







CAGGCCTGAGGGGGAACCATTTTTG







GGACATCTTGGTCTTTTTCTCAGGC







TGCTGAGGGTGGAGTGTCCCATCCT







GAGGTATATTGGAACTCTTCTAACC







SLC11A2 probes (SEQ ID NO: 877-887)



ACTGACCATACATTTTTCTTAGCCC







GACATTTTTACATACCGAGCCTGAG







TTCATCTGAGCCCCCAAAAGCATTA







TTCTTAGCCCCTCAAGTAATATAGC







AAACTGGTCATAAAGGCACTCTGTG







CCTTCCAGAGTCCTGGCTGATTGGT







GGGACTGACATCTTAAGCTCTCACC







TACACTACTGTGTTTCACTGACCAT







TGATTGGTGTTCGCTGTTCATCTGA







GGACTTCTCATTTTTGGAGCTTTCC







GAATGACAATTCCCCTAACCATTCC







CCNB1 probes (SEQ ID NO: 888-898)



TTTGCACTTCCTTCGGAGAGCATCT







CTTCCAGTTATGCAGCACCTGGCTA







CAACATTACCTGTCATATACTGAAG







TGGACACCAACTCTACAACATTACC







TGGCACCATGTGCCATCTGTACATA







ACTATGACATGGTGCACTTTCCTCC







GGAACTAACTATGTTGGACTATGAC







GAATCTCTTCTTCCAGTTATGCAGC







GCACCTGGCTAAGAATGTAGTCATG







TCCTCCTTCTCAAATTGCAGCAGGA







AGATTGGAGAGGTTGATGTCGAGCA







TMEM97 probes (SEQ ID NO: 899-909)



ATTCCATCAGCTTTCTCTAAGTCTT







ATAGAGGGTCTCTTCACGTTGATGC







ATCCACTGTGTGCATAGAGGGTCTC







TCAGCTTTCTCTAAGTCTTTGCTCA







AAGTTCAACCTTAAAATGATGTTAG







TCTCTTCACGTTGATGCTTGGCATT







GCCAGGCATAACATATCCACTGTGT







TCACGTTGATGCTTGGCATTCCATC







GTGTGCATAGAGGGTCTCTTCACGT







TACAGCCAGGCATAACATATCCACT







CCATCAGCTTTCTCTAAGTCTTTGC







MLF1IP probes (SEQ ID NO: 910-920)



AGGCCATAATCATCTTTTCTGGTTA







AAATAGCATCAGTTTGTCCAATAGT







ATGTTGACACCTTAATCGGTCCCAG







GAAGCTCCTTGACCAGGGATGAGAA







CAACCATCAGTTAGAGAAGCTCCTT







AAGAACACTTCTGGGAGCCGAAAGC







GTGCCTATAGGAAGACTAGTCTCAT







GCCATCTGCGAAATATCAACCATCA







AAACGTATGATTCATCCAGCCTTCC







ATCGGTCCCAGGTATGAGCTATAAT







GGAGCCGAAAGCCATCTGCGAAATA







ECT2 probes (SEQ ID NO: 921-931)



AGACTGTTTGTACCCTTCATGAAAT







TCACAATAGCCTTTTTATAGTCAGT







GTAAATGACTCTTTGCTACATTTTA







GCATGTTCAACTTTTTATTGTGGTC







GGTAATTTTATCCACTAGCAAATCT







GCATAGATATGCGCATGTTCAACTT







GAAGTTGCCATCAGTTTTACTAATC







CAGTTTTACTAATCTTCTGTGAAAT







TAGCTGTTTCAGAGAGAGTACGGTA







TTCCTATTTCTTTAGGGAGTGCTAC







GTATGTGCCACTTCTGAGAGTAGTA







RAE1 probes (SEQ ID NO: 932-942)



AGCCTTGTTGGGTTGTCAGCCATGG







AACAGTTAGATCAGCCCATCTCAGC







ATGGCACCCTTGCAACTGTGGGATC







AGAAGTAGTGGCTGGAGACTCTGGC







CTGCCTCATCTCTGTACGAATTTGG







GATGGTAGATTCAGCTTCTGGGACA







CTCAGCTTGCTGTTTCAATCACAAT







AATGGAATCGCGTTCCATCCTGTTC







TGGATTTCAACCCCTGGAGAAAACG







ACGAATTTGGGTCCCAGCCTTGTTG







CATATTTGCATACGCTTCCAGCTAC







DONSON probes (SEQ ID NO: 943-949)



GGAAATCACATAGCAGTTACCCCAT







GTGGTGCTGAGAGACTACATTTATA







ATACCGTTACTTGGGAAATCATCTT







CAACTGCTGTATTTAACATCTGCCT







GATATCTATATCCCTACAACCTAAT







TAACTGTGGTTTGCACCCTAACACT







GTTACCCCATGGCATTGTGACTAAT







KIAA0776 probes (SEQ ID NO: 950-960)



CCACCTCATACACACACAATTCAGT







GTCATATAGTAAGCATTTTCCCCCA







CATATTTCAGGTTTGTTCTCTTTCC







AAAGTAGTCACTATACAACTCCCCT







AAGACCTTGTTCTCAAATCTAGGAA







GAAGGATTCATTTGTTGCGAGTGCC







GTTGCGAGTGCCAGTATACTTAAAT







TTTTCCAATCCATTTATCCCTGGGG







CCTGATTTTCACACAAATACTATAT







GTAAATTGGGTTCTTCATGGAAGTT







GGAAATCATCTGTGACGGAAGAGTA







DTL probe 1 (SEQ ID NO: 961)



ATGATTTTGTTTGTATCCCTACCCA







DTL probes (SEQ ID NO: 962-971)



GGAAGCCATAGAATTGCTCTGGTCA







AGCACACCATAGCCTTAACTGAATA







TGGGTGCCAAAGGTCAACTGTAATG







GACCAATATCTGCCAGTAACGCTGT







AATTGGGATACATTTGGCTGTCAGA







TAACGCTGTTTATCTCACTTGCTTT







GACAACTTTTTAATTCCTTTGATCT







GTCTACTGGGTATAACATGTCTCAC







GGAAATCTGCCTAATCTGCTTATAT







GCTCTGGTCAAAACCAAGCACACCA







SQLE probes (SEQ ID NO: 972-982)



GATTCCCTGCATCAACTAAGAAAAG







TTGGTGGCGAATGTGTTGCGGGTCC







GCGTGTTCTGTAATATTTCCTCTAA







TCTCCTAACCCTCTAGTTTTAATTG







TCTCAGTAGTGGTGCTGTATTGTAC







AATTGGACACTTCTTTGCTGTTGCA







CTTGGATTACAAAACCTCGAGCCCT







CTGTTGGGCTGCTTTCTGTATTGTC







ATCTATGCCGTGTATTTTTGCTTTA







TTGCTGTTGCAATCTATGCCGTGTA







ATAGCATAGTACCATACCACTTATA







ACBD3 probes (SEQ ID NO: 983-993)



GAACGCAGAGCGACTCGAGGTGTCC







GGCAAAGCATTTCATCCAACTTATG







GCCTGGAGGAGTTGTACGGCCTGGC







GCAGCAGCCGGAGATGGCGGCGGTG







TTAAATAGGTGTTGCCATCTCTTTT







GACACTTGTCCTGAGGTTGGATTCT







GGCAGCCCTGGGAAACATGTCTAAA







TCAACATATGTTGCGTCCCACAAAA







TGGCACTGCGCTTCTTCAAAGAAAA







TAAGCAAGTTCTTATGGGCCCATAT







GGCCCATATAATCCAGACACTTGTC







RMI1 probes (SEQ ID NO: 994-1004)



CCCTGAACATGCCTGAGCTTGTCAT







TCCCTTTCTGAATTAGCTGTACATA







GCATTTATCTATGTCTTTAGGTGTC







GGTAATTTCCTTCTAATATGTTGGT







TACCATTCTTCCACTGTGCTGTTAT







GTAGATCAGAACATCAGGCTTTCAG







ATGCCTGAGCTTGTCATAATATGTT







ATGTTGGTACTGTCTATGGCCATAC







TTAGGTGTCATTGTTCCCTTTCTGA







AGAGGGACTGTTTACCATTCTTCCA







GCTGTACATATAAGCCTTCCTTTGG







C14orf101 probes (SEQ ID NO: 1005-1015)



CAGAAAGCACCGAATGACCCACAGC







TTCCGTCTGTACTCTCAGAAAGCAC







GAAGTGCTGTTATCGGAAACCATCA







GACTTCCAGTCTTTCACCAGATGAC







GTATCATGGTCCAGCAGTACTGTTT







TATCCCGTGTTACCAAATTACCATT







GAAACCATCAGACATTTCCGTCTGT







ATGAAAAACCTGCTCATCGTTCAGC







TTAAAACTAAGTCATCTCCCAGATA







GCAAAATTCTGTTTATCCCGTGTTA







CTCATCGTTCAGCTTCCAAAATTCT







ZNF274 probes (SEQ ID NO: 1016-1026)



CCTTTTCAGCTTGACCCTGCAATAT







AATCTGCACTGATATTACATCCACA







ACCTCATAGCTCTCAAGCCAGTTGA







AGGAGACTGCCCAGCACATAATGAA







GATATTGTTTGTTCACTCATTTAGT







ACATGCACAGGCCTGCTTGTGAATC







GCCAGTTGAAGAAACCTTGCCTTTT







AAGATTTCCCATTCACTTGATATTG







TACATCCACAGTACCACAGTATTTA







GAAGAAACAGCCTACCTCATAGCTC







GAGCGCCCATATGCATGCAACAAAT







PTGES probes (SEQ ID NO: 1027-1048)



TGGATGTCTTTGCTGCAGTCTTCTC







GGTCTTGGGTTCCTGTATGGTGGAA







CAAAGGAACTTTCTGGTCCCTTCAG







TTGGCCACCAGACCATGGGCCAAGA







CAAAGGGCAGTGGGTGGAGGACCGG







TCTCCTAGACCCGTGACCTGAGATG







CGTGGCTATACCTGGGGACTTGATG







CAGCCACTCAAAGGAACTTTCTGGT







GGTTTGGAAACTGCAAATGTCCCCT







AGGTTTGAGTCCCTCCAAAGGGCAG







GGCCCACCGGAACGACATGGAGACC







TCTCTGGGCACAGTGGGCCTGTGTG







CTCTGGGCACAGTGGGCCTGTGTGT







TTTGGATGTCTTTGCTGCAGTCTTC







ACCTGGGGACTTGATGTTCCTTCCA







GGCTATACCTGGGGACTTGATGTTC







CTCCTAGACCCGTGACCTGAGATGT







TGGAGGACCGGGAGCTTTGGGTGAC







CACCAGACCATGGGCCAAGAGCCGC







GCAGTGGGTGGAGGACCGGGAGCTT







TTTCTGGTCCCTTCAGTATCTTCAA







CACCGGAACGACATGGAGACCATCT







FRG1 probes (SEQ ID NO: 1049-1053)



GGCTCGGAAAGATGGATTTTTGCAT







GCAGTTTTCGGCTGTCAAATTATCT







GGGCGTTCAGATGCAATTGGACCAA







TAGTCCTCCAGAGCAGTTTTCGGCT







ATTGCCCTGAAGTCTGGCTATGGAA







C19orf60 probes (SEQ ID NO: 1054-1069)



GCACGGTGGCCCTGCTGCAGTTGAT







GGCGATCAGCGAGGTTCTCCAGGAC







GGACCTTAGGTTTGATGCGGAATCT







GGTTCTCCAGGACCTTAGGTTTGAT







CAGCGAGGTTCTCCAGGACCTTAGG







AATTAAAACCATGGAGGCGATCAGC







CCTTAGGTTTGATGCGGAATCTGCC







CGCTGTACGGATGCAGCAGCTGAAA







TCTGCCGAGTGATGGCGGCTCCCCA







CATGGAGGCGATCAGCGAGGTTCTC







GTTTGATGCGGAATCTGCCGAGTGA







ACTGCGCTGCTGACCTTCCTGCAGT







ATTAAAACCATGGAGGCGATCAGCG







CCAGGACCTTAGGTTTGATGCGGAA







AGTGCACGGGGTGACCCAGGCCTTC







ATCTGCCGAGTGATGGCGGCTCCCC







LPCAT1 probes (SEQ ID NO: 1070-1080)



TGTGTGTGAGACAGGACGCAGCGGG







CAGACCCGTGGGCAGGTGGGGCATG







GTTGAGTTAAACCCCTTGTGTGTGA







TCCCTTCCGCAGGTCTGCAGATGAA







AATTTCAGGGCTCTTGGCGTGTTGG







TGAAATGCCACTGCGCATTTTCAGA







TCTTTTCTCTTCGTGGCGACTTAGA







GCCTTTGGTAGCTAACAGTCACTGA







AGAAATCCTAGTGCAGCCTTTGGTA







TGAATGGATGTTTGTTCCTCCTGAT







GAGTTGGCGGATATTCGGAACTGTG







ISYNA1 probes (SEQ ID NO: 1081-1091)



TACCTCGGAGCTGATGCTGGGCGGA







ACCAATGGCTGCACCGGTGATGCCA







CAACACGTGTGAGGACTCGCTGCTG







GCCTCAAGCGAGTTGGACCCGTGGC







GCCACCTACCCTATGTTGAACAAGA







GGAACCAACACACTGGTGCTGCACA







GTGAGCTTCTGCACTGACATGGACC







AACCACATGCTCCTGGAACACAAAA







GGGCATCTGCAAGAGGAGCCCCCAA







CAAAATGGAGCGCCCAGGGCCCAGC







CCAGCGCAGCTGCATCGAGAACATC







SKP2 probes (SEQ ID NO: 1092-1102)



AAATTGATGACTTGTTCGTATGTTC







GAAGTGCCTTTATCTGCTTAGACCT







TGCCCTCAAACATACAGAACTTCCA







CTCTGACATCGGATGCCCTCAAACA







AGCTATTTTGCCAACATGTCAGAGT







AGAACTTCCAAACTCAAGTCCAGCC







AAGTCCAGCCATAAGCTATTTTGCC







AGAGCTGGGGTTAGGATCCGGTTGG







TAGGATCCGGTTGGACTCTGACATC







AAAGCTAACACCAGTCATTTATATT







GATGATGCTTCAATTTCTTAATAGT







DPP3 probes (SEQ ID NO: 1103-1113)



AAACGTTCTCACCAAATCCAATGCT







ATACGAGGCGTCAGCTGCTGGCCTC







AGGAGCTTGGACCTTGGTACTACCT







GATGCCCGATTCTGGAAGGGCCCCA







CAGACCAAGGCTGCAAGTGGCCCTC







GCTTACCATCCTGTCTACCAGATGA







CTCTGTGATCTCATTTCATCTGCAC







GTGGCACGTGACAGCTAGGGTTCAA







TGAGCGTTTCCCAGAGGATGGACCC







TGAGGGTGGTGACACAACCCCTTCC







TCATCTGCACTGCCATACGTGGAGT







TYMP probes (SEQ ID NO: 1114-1124)



CCTGTGCTCGGGAAGTCCCGCAGAA







CCTTGGCCGCTTCGAGCGGATGCTG







CCGCTTCGAGCGGATGCTGGCGGCG







TGGCCCGAGCCCTGTGCTCGGGAAG







CCGAGCCCTGTGCTCGGGAAGTCCC







TGCTCGGGAAGTCCCGCAGAACGCC







CTGCTGGTCGACGTGGGTCAGAGGC







CTGGTCGACGTGGGTCAGAGGCTGC







GCCCGCCAGACTTAAGGGACCTGGT







CAGGCCCGCCAGACTTAAGGGACCT







ACTTAAGGGACCTGGTCACCACGCT







SNRPA1 probes (SEQ ID NO: 1125-1135)



GGTTGCTGCAGTCTGGTCAGATCCC







AGCTGACGGCGGAGCTGATCGAGCA







GTGGGCCATCTCCAGGGGATGTAGA







GTCAGATCCCTGGCAGAGAACGCAG







TAGCAAATGCTTCAACTCTGGCTGA







TGATCGAGCAGGCGGCGCAGTACAC







AAGGTTCCGCAAGTCAGAGTACTGG







TGGCATCTCTCAAATCGCTGACTTA







TCCAGGTGCTGGTTTGCCAACTGAC







TCCGGGGGTGATCTGAACCCTCTGG







AACGCAGATCAGGGCCCACTGATGA







DHCR7 probes (SEQ ID NO: 1136-1146)



TCTCCAGCGAGGAGGTCTCAGTCCC







GCGTGCACGGTGTTGAACTGGGACA







CTATGCTCCGAGTAGAGTTCATCTT







CTCCTTGGTAGCGTGCACGGTGTTG







TGACTGTGCAGACTCTGGCTCGAGC







AGGTGTAGGCAGGTGGGCTCTGCTT







GAAAGGGGCTTTCATGTCGTTTCCT







TCTTCCTCATCCCTAGGGTGTTGTG







GAACTCTTTTTAAACTCTATGCTCC







GTCTGCAGACCTCAGAGAGGTCCCA







GAACTGGGACACTGGGGAGAAAGGG







TFPT probes (SEQ ID NO: 1147-1157)



AAAGTACCAGGCACTAGGTCGGCGC







GCGCTGCCGGGAGATCGAGCAGGTG







TGGCCCCGGTGCAGATTAAGGTTGA







CAGCCAGTTCACCATTGTGCTGGAG







GCCGAGCAGGAAATGCGCTGACTCC







CCTGGATTCCAGTTGGGTTTCTCGG







GCTGGACTCCTACGGGGATGACTAC







AACGAGCGGGTCCTGAACAGGCTCC







TCGGGGTCCAGACAAACTGCTGCCC







GGTTCCTCATGAGAGTGCTGGACTC







GGCGGCGCCAGCGGGAATTAAATCG







CTTN probes (SEQ ID NO: 1158-1174)



TGTGTCTTTCCAGAAGGTCACGTGG







CAAAGATGGGGTGCCAAGACGGTGC







TCGCCCAGGATGACGCGGGGGCCGA







GTGGAAATGTCTCGGGACTTGGGTC







CGTGAACAGCCTTTTATCTCCAAGC







GAAACTCATCTCCTTCCTGAGGAGC







GAATTTCGTGAACAGCCTTTTATCT







CCAGGACACCGCTGTCCTGGCATTT







CAGCCTTTTATCTCCAAGCGGAAAG







TTCCTCATTGGATTACTGTGTTTTA







GAAGGTCACGTGGAAATGTCTCGGG







CTGGGAGACCGACCCTGATTTTGTG







AATCAGTCCCCAATGCCTGGAAATT







GCCTGGAAATTCCTCATTGGATTAC







CCTGAGGTGCATTTTCTCATCATCC







CATCCTTGCTTTACCACAATGAGCA







ATTTGTGGCCACTCACTTTGTAGGA







MCM5 probes (SEQ ID NO: 1175-1185)



GCATCGCATGCAGCGCAAGGTTCTC







GAGGAAGGAGCTGTAGTGTCCTGCT







CTGGGAAGTGTGCTTTTGGCATCCG







CGGCGAGATCCAGCATCGCATGCAG







CCAGCATCGCATGCAGCGCAAGGTT







CTGCCTGCCATTGACAATGTTGCTG







GCGAGATCCAGCATCGCATGCAGCG







GTTCTGGGAAGTGTGCTTTTGGCAT







GAAGGAGCTGTAGTGTCCTGCTGCC







TCGCATGCAGCGCAAGGTTCTCTAC







TTGACAATGTTGCTGGGACCTCTGC

















APPENDIX 4





Probe sequences for 17-gene and 8-gene panel of Tables 1 and 2.

















CCNB2 probes (SEQ ID NO: 1-9)



ATGGAGCTGACTCTCATCGACTATG







ATATGGTGCATTATCATCCTTCTAA







AGTCCTCTGGTCTATCTCATGAAAC







CTTGCCTCCCCACTGATAGGAAGGT







CAAAAGCCGTCAAAGACCTTGCCTC







GATTTTGTACATAGTCCTCTGGTCT







GCCACTACACTTCTTAAGGCGAGCA







GATAGGAAGGTCCTAGGCTGCCGTG







ATCCTTCTAAGGTAGCAGCAGCTGC







TOP2A probes (SEQ ID NO: 10-17 and SEQ ID NO: 19-20)



ACTCCGTAACAGATTCTGGACCAAC







GACCAACCTTCAACTATCTTCTTGA







GAAAGATGAACTCTGCAGGCTAAGA







ACAAGATGAACAAGTCGGACTTCCT







TGGCTCCTAGGAATGCTTGGTGCTG







GATATGATTCGGATCCTGTGAAGGC







AAAGAAAGAGTCCATCAGATTTGTG







GAATAATCAGGCTCGCTTTATCTTA







AAGAACAAGAGCTGGACACATTAAA







GAGACTTTTTTGAACTCAGACTTAA







RACGAP1 probes (SEQ ID NO: 21-25)



GTACAACTCGTATTTATCTCTGATG







GAATGTTTGACTTCGTATTGACCCT







GGATGCTGAAATTTTTCCCATGGAA







ACTTCGTATTGACCCTTATCTGTAA







CAATATATCATCCTTTGGCATCCCA







CKS2 probes (SEQ ID NO: 26-28)



CGCTCTCGTTTCATTTTCTGCAGCG







TATTCTTCTCTTTAGACGACCTCTT







TCTCTTTAGACGACCTCTTCCAAAA







AURKA probes (SEQ ID NO: 29-39)



CTACCTCCATTTAGGGATTTGCTTG







GTGTCTCAGAGCTGTTAAGGGCTTA







CCCTCAATCTAGAACGCTACACAAG







GAGGCCATGTGTCTCAGAGCTGTTA







TTAGGGATTTGCTTGGGATACAGAA







GTGCTCTACCTCCATTTAGGGATTT







AAATAGGAACACGTGCTCTACCTCC







GGGATACAGAAGAGGCCATGTGTCT







GAAGAGGCCATGTGTCTCAGAGCTG







CAGAGCTGTTAAGGGCTTATTTTTT







CATTGGAGTCATAGCATGTGTGTAA







FEN1 probes (SEQ ID NO: 40-50)



GAACTTGCTATGTAATTTGTGTCTA







GATGGTGATGTTCACCTGGCAATCA







GAGCCACCAGGAAGGCGCATCTTAG







TTGACCCACCTTGAGAGAGAGCCAC







GGACACTAAGTCCATTGTTACATGA







GAAATGATTTCCTGGCTGGCCAACT







ACACTGGTTTTCATGCGCTGTTTTT







ACTGATTACTGGCTGTGTCTTGGGT







TGGACCTAGACTGTGCTTTTCTGTC







TTGGGTGGGCAGAAACTCGAACTTG







ACCTGGCAATCAGCTGAGTTGAGAC







EBP probes (SEQ ID NO: 51-71)



GAAGGCACTGCTGGGAGCCATTAGA







CAGGCTCATGGGCAGGCACAAGAAG







GTCTTAGTCGTGACCACATGGCTGT







CACAGATACAAGAGAAGCCAGGAGG







AAGGGGCTGTGTGAAGGCACTGCTG







AGAAGAACTGAGGAGTGGTGGACCA







GCCAGGAGGTCTATGATGGTGACGA







CCCACCTGGCATATACTGGCTGGCC







ACATGGCTGTTGTCAGGTCGTGCTG







TCTATGGGGATGTGCTCTACTTCCT







GCATGGAAACCATCACAGCTTGCCT







GAGTGGTGGACCAGGCTCGAACACT







TTGGAGGGACAAAGCTAATTGATCT







GATGCCAAGGCCACAAAAGCCAAGA







CCAGGCTCGAACACTGGCCGAGGAG







TGACAGAGCACCGCGACGGATTCCA







GGGAGCCATTAGAACACAGATACAA







TTTGTCTTCATGAATGCCCTGTGGC







GGAGACCAAGCCTTCTTATCTCAAC







TGCAGTGTGTGGGTTCATTCACCTG







CTCCGCTTCATTCTACAGCTTGTGG







TXNIP probes (SEQ ID NO: 72-102)



TGTGTCAGAGCACTGAGCTCCACCC







TACAAGTTCGGCTTTGAGCTTCCTC







AAAGGATGCGGACTCATCCTCAGCC







ACTTTGTTCACTGTCCTGTGTCAGA







GAAAGGGTTGCTGCTGTCAGCCTTG







AGATAGGGATATTGGCCCCTCACTG







GGCAATCTCCTGGGCCTTAAAGGAT







CTTAGCCTCTGACTTCCTAATGTAG







GCAAAGGGGTTTCCTCGATTTGGAG







AAATGGCCTCCTGGCGTAAGCTTTT







AAACCAACTCAGTTCCATCATGGTG







TTCCACCGTCATTTCTAACTCTTAA







GGTTTTCTCTTCATGTAAGTCCTTG







CGGAGTACCTGCGCTATGAAGACAC







CCCTGCATCCTCAACAACAATGTGC







GTGTTCTCCTACTGCAAATATTTTC







AATTGAGGCCTTTTCGATAGTTTCG







GGAGGTGGTCAGCAGGCAATCTCCT







CCAGCGCCCATGTTGTGATACAGGG







GAAAAACTCAGGCCCATCCATTTTC







TGAGGTGGTCTTTAACGACCCTGAA







TGTTCTTAGCACTTTAATTCCTGTC







AGCTCCACCCTTTTCTGAGAGTTAT







CACTCTCAGCCATAGCACTTTGTTC







GAAGCAGCTTTACCTACTTGTTTCT







GAAGTTACTCGTGTCAAAGCCGTTA







GGTGGATGTCAATACCCCTGATTTA







CCGAGCCAGCCAACTCAAGAGACAA







TGGATGCAGGGATCCCAGCAGTGCA







GATCCTGGCTTGCGGAGTGGCTAAA







GCTGAAACTGGTCTACTGTGTCTCT







SYNE2 probes (SEQ ID NO: 103-113)



TTTCTAAGACTTTTTCACATCCAAA







GTTTTACTCCAATCAGCTGGCAATT







GGCACCCTTAGCTGATGGAAACAAT







ATTTTGAGCTGCCGGTTATACACCA







TGTTCTGTTCAGTACCTAGCTCTGC







GTAAATGCCAAACTACCGACTTGAT







TACGCTTAGAATCAGTTTTACTCCA







GTTCAGAAACTCATAGGCACCCTTA







TGAGCAGTGGTGTCCATCACATATA







ATGTACAACTCAGATGTTTCTCATT







GCTCTGCTCTTTTATATTGCTTTAA







DICER1 probes (SEQ ID NO: 114-142)



AATTTCTTACTATACTTTTCATAAT







ATTTCACCTACCAAAGCTGTGCTGT







ACTAGCTCATTATTTCCATCTTTGG







AAATGATTTTTCACAACTAACTTGT







TTGCAGTCTGCACCTTATGGATCAC







TGATACATCTGTGATTTAGGTCATT







GGAGACGCCAATAGCAATATCTAGG







CTGATGCCACATAGTCTTGCATAAA







AGCTGTGCTGTTAATGCCGTGAAAG







GAAGTGCGCCAATGTTGTCTTTTCT







GTGAAACCTTCATGGATAGTCTTTA







TTTACTAAAGTCCTCCTGCCAGGTA







GGACATCAACCACAGACAATTTAAA







TGTTGCATGCATATTTCACCTACCA







ATAAACCTTAGACATATCACACCTA







TAGTCTTTAATCTCTGATCTTTTTG







GAGACAGCGTGATACTTACAACTCA







GACCATTGTATTTTCCACTAGCAGT







CTGCAGCAGCAGGTTACATAGCAAA







GCCGTGAAAGTTTAACGTTTGCGAT







AACTGCCGTAATTTTGATACATCTG







TATTTACCATCACATGCTGCAGCTG







AACGTTTGCGATAAACTGCCGTAAT







GGAAATTTGCATTGAGACCATTGTA







GCACCTTATGGATCACAATTACCTT







AGAAGCAAAACACAGCACCTTTACC







CCCTTAGTCTCCTCACATAAATTTC







TGTGTAAGGTGATGTTCCCGGTCGC







CTGCCAGGTAGTTCCCACTGATGGA







AP1AR probes (SEQ ID NO: 143-153)



GCCTTCCTTTACCTTGTAGTACAAG







TTTTTCCTCTTGCAACAATGACGGT







GTCAATTTACAAGGCCAGGGATAGA







TTCCACTTCATTTTACATGCCACTA







GTGCTAGACAATTACTGTTCTTTTC







AATATCTATAACTGCATTTTGTGCT







GATAGAAAACACTCCATAATTGCTT







CATTGATTTTATTAAGCCTTCCTTT







TACATGCCACTATATTGACTTTAAT







TCTGGTATGAAAGGCTCCATTGATT







GCTTTCCTTGATTTTGCTGAGGATT







NUP107 probes (SEQ ID NO: 154-163)



GGATATCAGCGTTTCTCTGTGTGCT







GAAAGCTTTGTCTGCCAATGTTGTG







CAGAGAGTCCTCTCTAATGCTCCTA







GATATTGCACAGTACTGGTCAGTAT







GACCAGGGACTTGACCCATTAGGGT







AGATATGGTATCCTCTGAGCGCCAC







AATGCTCCTAGACCAGGGACTTGAC







ATCGTGACACTTTCAACATGTAGGG







TTGGATGCCCTAACTGCTGATGTGA







GTGTTTTCTGCTTCATACGATATTG







APOC1 probes (SEQ ID NO: 164-174)



AAGGGTGACATCCAGGAGGGGCCTC







CAGGAGGGGCCTCTGAAATTTCCCA







GATGCGGGAGTGGTTTTCAGAGACA







CAGCAAGGATTCAGGAGTGCCCCTC







GTGAACTTTCTGCCAAGATGCGGGA







CAAGGCTCGGGAACTCATCAGCCGC







AACACACTGGAGGACAAGGCTCGGG







GACGTCTCCAGTGCCTTGGATAAGC







CCAAGCCCTCCAGCAAGGATTCAGG







TCATCAGCCGCATCAAACAGAGTGA







GTTCTGTCGATCGTCTTGGAAGGCC







DTX4 probes (SEQ ID NO: 175-180)



ATCGCCACCTGGTGCTCATGAGGTG







ACTCGTCTTGGTATTGCACTGTTGT







ATTCTCTTCCCATTTTTGTACATTT







TGCTCCGTGAAAGGACATCGCCACC







GGAGACAAACCTCGTCAGATGCTCA







TGAAGTCTTTGGTGTTGCTCCGTGA







FMOD probes (SEQ ID NO: 192-202)



GCTGGGGAGCACTTAATTCTTCCCA







GGAGCTCCGATGTGAGGGGCAAGGC







TCTGGCTGGGGTCCGTGAAGCCCAG







GCCAAACCAGCTCATTTCAACAAAG







ATGTGAACACCATCATGCCTTTATA







TGCCATCACATCCCTGATACTGTGT







TTTGGACTACGTTCTTGGCTCCAGA







GCAGCCAAATCTTGCCTGTGCTGGG







GCTTTGAAGCACCTTCCCTGAGAAG







TCTGCTTTCACATCTCTGAGCTATA







TAATGTTGCCTGGGGCTTAACCCAC







MAPKAPK2 probes (SEQ ID NO: 203-213)



GCTGAAGAGGCGGAAGAAAGCTCGG







CTCCTGCCCACGGGAGGACAAGCAA







CCTGCCCACGGGAGGACAAGCAATA







GGACAAGCAATAACTCTCTACAGGA







AACTCTCTACAGGAATATATTTTTT







GTTGACTACGAGCAGATCAAGATAA







AATGCGCGTTGACTACGAGCAGATC







CACAATGCGCGTTGACTACGAGCAG







GCGCGTTGACTACGAGCAGATCAAG







AAGCAATAACTCTCTACAGGAATAT







AGACAGAACTGTCCACATCTGCCTC







SUPT4H1 probes (SEQ ID NO: 236-246)



TACCCTCCAATTCAGACTCAGCTGA







CAGAACTTCAAATACTTCCTACCCT







CCTGCCCCAAGGAATCGTGCGGGAG







GACAGCTGGGTCTCCAAGTGGCAGC







ATCTTCTTTGGACTACAGGTGGGGT







TAGGATGCTGATTTTCCTACCCGTG







GTATATGACTGCACTAGCTCTTCCT







GAGAGCAGCACATCATTTTATCATT







GTCGAGGAGTGGCCTACAAATCCAG







TGCAAGGCTGCCAGCATCTTTGCTC







ATATGCGGTGTCAGTCACTGGTCGC

















APPENDIX 5





Probe sequences for top 25 reference probesets (set #1) and top 15 reference probesets (set #2). Overlapping probesets listed only once.

















MYL12B probes (SEQ ID NO: 1186-1189)



GTTACATTGTCTTACTCTCTTTTAC







GTTACATTGTCTTACTCTCTTTTAC







GAGGCCCCAGGGCCAATCAATTTCA







GTACCATTCAGGAAGATTACCTAAG







SFRS3 probes (SEQ ID NO: 1190-1200)



GAAACACAGGCCATCAGGGAAAACG







GAAAAATCCAACTCTCATCCTGGGC







CATCCTGGGCAGAGGTTGCCTAGTT







GATACATGGCTGTTCGTGACATTCT







AATGTCCTGCCAGTTTAAGGGTACA







GGGTACATTGTAGAGCCGAACTTTG







GAGCCGAACTTTGAGTTACTGTGCA







TACTTTACAATGTTCCCTTAAGCAA







GATAATAAACCTCTAAACCTGCCCA







AACCTGCCCAGCGGAAGTGTGTTTT







TACTTTTTTTTCCATAGCTGGGATA







CLTA probes: (SEQ ID NO: 1201-1211)



CAAGAGTAGCCTCAACCTGTGCTTC







CAGGGTGGCAGATGAAGCTTTCTAC







ACAACCCTTCGCTGACGTGATTGGT







ACCATCCTTGCTACAGCCTAGAACA







TGACATTGACGAGTCGTCCCCAGGC







TAACCCCAAGTCTAGCAAGCAGGCC







GCAAGCAGGCCAAAGATGTCTCCCG







GCCACCCTGTGGAAACACTACATCT







ATCTGCAATATCTTAATCCTACTCA







GAAGCTCTTCACAGTCATTGGATTA







TGTTTGTGATTGCATGTTTCCTTCC







TRA2B probes: (SEQ ID NO: 1212-1222)



TACTTTTCTTTCTAACATATCAATG







ATACCATACTTATATACCTGCAACT







ATGCTCTGTAACTCTGTACTGCTAG







AATACAGCCAGTGCTTAATGCTTAT







AATGTGGATTTGTCGGCTTTTATGT







GCAAGTGACAATACATTCCACCACA







AATACACTCTTGTTCTTCTAGCTTT







AAACCGGGTGCTTCAAAGTACATGA







GGAACACTATACCTGTCATGGATGA







GGATGAACTGAAGACTTTGCCTGTT







GGAGGCCCAATTTCACTCAAATGTT







MTCH1 probes: (SEQ ID NO: 1223-1232)



GTTTTTCTCAACACTACTTTTCTGA







GCTCAGCTGGGAGCATCATTCTCCT







GCTCAGCTGGGAGCATCATTCTCCT







GAGAATGGCTTATGGGGGCCCAGGT







GTTTAATGGTGATGCCTCGCGTACA







TCTCTAGTCCTACCCAGTTTTAAAG







GCCTCGCGTACAGGATCTGGTTACC







GTTGGGCAGATCAGTGTCTCTAGTC







CACCATCATGTCTAGGCCTATGCTA







GACCTCATCTCCCGCAAATAAATGT







HDLBP probes (SEQ ID NO: 1233-1243)



AACGCCCGCAGCACAACGAAGAGGC







CCGCAGCACAACGAAGAGGCCAATG







AAGAGGCCAATGGGCACTCTTCCAG







CACTCTTCCAGAGGCTTTGTGGTGC







TCCAGAGGCTTTGTGGTGCGGGACC







ACCTGCTCCACTGTTTAACACTAAA







AACCAAGGTCATGAGCATTCGTGCT







TAAGATAACAGACTCCAGCTCCTGG







TAGGATTCCACTTCCTGTGTCATGA







CCACTTCCTGTGTCATGACCTCAGG







GACCTCAGGAAATAAACGTCCTTGA







CYFIP1 probes: (SEQ ID NO: 1244-1254)



TGGGATGTTCTGGCAGCTGTGTCAT







TTGTTGCCATCACGTTCCTACAAAA







GCCTTTCTCTCCGTAAACTATTTAG







AATAGTGAACTTGATTCCCCTGCTT







ATGCTGCTGGGTTCATTCATTCATT







CTGCTTCCACTAAATCCAGTTGTGA







GCACTCCGTAACTCAACATGGCATG







GAGAATATTGGCTGCTGATTGTTGC







GTTTAGGGATCTTTCTGATGGTCTT







TTTTCAGTATCTCTGTACCTGTTAA







CTTAGTTCTAAGTCATTGTTCCCAT







SUMO1 probes: (SEQ ID NO: 1255-1265)



AAATCTTGTCAGAAGATCCCAGAAA







AAAGTTCTAATTTTCATTAGCAATT







ATTTGTACTTTTTGGCCTGGGATAT







GCCTGGGATATGGGTTTTAAATGGA







AATGGACATTGTCTGTACCAGCTTC







CATTGTCTGTACCAGCTTCATTAAA







AATGACCTTTCCTTAACTTGAAGCT







GACCTTTCCTTAACTTGAAGCTACT







GAGGGTCTGGACCAAAAGAAGAGGA







AGGTGAGAGTAATGACTAACTCCAA







CTAACTCCAAAGATGGCTTCACTGA







DHX15 probes: (SEQ ID NO: 1266-1276)



AAGTTCGAGTTGTGCTCTTCACGTT







TGTGCTCTTCACGTTGGTTCGATAA







CACGTTGGTTCGATAATGGCCTTTA







GTAAATATTCCATTCTGATTTCATA







ATTAAACATTTATGCCTCCCTTTTG







CCTCCCTTTTGTGTTGACACTGTAG







GTTGACACTGTAGCTCATACTGGAA







GTGATTATCGACCATGGTATGCATG







GGTATGCATGATCGTTGTAATTGTT







TTTTTTGTTTCAGTACCAGAGGCAC







GTACCAGAGGCACTGACTTCAATAA







HNRNPC probes: (SEQ ID NO: 1277-1287)



AATAATCTCTTGTTATGCAGGGAGT







TATGCAGGGAGTACAGTTCTTTTCA







TCTTTTCATTCATACATAAGTTCAG







TAAGTTCAGTAGTTGCTTCCCTAAC







GTTGCTTCCCTAACTGCAAAGGCAA







ACTGCAAAGGCAATCTCATTTAGTT







GAGTAGCTCTTGAAAGCAGCTTTGA







AGAAGTATGTGTGTTACACCCTCAC







TGCTGTGTGGGGCAGTTCAACACAA







GTTGGCATGTCAAATGCATCCTCTA







ACAGCCTGATGTTTGGGACCTTTTT







UBE2D3 probes: (SEQ ID NO: 1288-1298)



GTTGGGATTTGCTTCATTGTTTGAC







TGCACAGTCTGTTACAGGTTGACAC







TTGACACATTGCTTGACCTGATTTA







TAGTGTAGCTTTAATGTGCTGCACA







GTGCTGCACATGATACTGGCAGCCC







ACTGGCAGCCCTAGAGTTCATAGAT







GTTCATAGATGGACTTTTGGGACCC







GGGACCCAGCAGTTTTGAAATGTGT







GCAGCCCCTGTCTAACTGAAATTTC







CTAACTGAAATTTCTCTTCACCTTG







CTTCACCTTGTACACTTGACAGCTG







DAZAP2 probes: (SEQ ID NO: 1299-1324)



AGAGTGTCTGATGCGGCCACTCATT







TGAAGCCGCCCTAAGGATTTTCCTT







GGGGAACTTCTTCATGGGTGGTTCA







TTGTGTGTTCTGTACATGTGATGTT







CTCCCAATGCTGCTCAGCTTGCAGT







GAGGAGGATGCATTTCAAAAGCTTG







GATGTCGTGCAAACTGTACTGTGAA







ATAGGTTGTCTCTGCATACACGAAC







GATTCTTTACTTAGCTTGTTTTTAG







ATTTATATCCCATCTAGAATTCAGC







TGCAGTCATGCAGGGAGCCAACGTC







GTGGTGCACTTAACTTGTGGAATTT







GTTTGACTGTACCATTGACTGTTAT







GATGAAGTTGCATTACACCTCACTG







AAGTTCAGCGTTGTATGTCTCTCTC







AATACTGTACCATACTGGTCTTTGC







TCTTTCTGGTGCCCAAACTTTCAGG







TACACGAACCTAACCCAAATTTGCT







GAATTCAGCTAGGTGCTGCTGCTGC







ACGTCCTCGTAACTCAGCGGAAGGG







CTCTCTCTACACTGTGGTGCACTTA







AATGACTTGAGTCCAGTGAAATCTC







TAGCAGTACCTCCCTAAAGCATTTT







ACACCTCACTGCAAGGATTCTTTAC







GGCTCCCCAGAATTCCTAGACTGGG







CTCTGTTTCCTTTGATGACGCTTTG







SNRNP200 probes: (SEQ ID NO: 1325-1335)



GAAGTCACAGGCCCTGTCATTGCGC







GATGCCAAGTCCAATAGCCTCATCT







CCCACAACTACACTCTGTACTTCAT







GGAGTACAAATTCAGCGTGGATGTG







GATTCAGATTGAGTCCTGAGGCATT







GTAGGAATCCTGGTTGTGGGGACCA







ACTCTGGATCCAGTGACAGCAGGTG







ACAGCAGGTGTCATGGGTCAAGCAT







AATCATATATAGCATTTTCAGGCAT







GGCATGTTCCTGGTAGTTCTTTTGA







CTGGTAGTTCTTTTGAGTCTGACAT







YTHDC1 probes: (SEQ ID NO: 1336-1346)



GTATGATGGTTTGACTGTATGGCAG







GGATCTTGATTGATAACTGCCATGA







GTGTGTTCATCCTAGAGTTATTTTT







CCTTCCCCTCCAAATTGTATACATT







TGTTGTAGCAGCCTCTTGTTTTTTT







GCGTGGCAGCGGAAGACGATTCCCA







AATTCCTATGTTCAGTAGCGTGGTT







AACTGCCATGATATTTTGCTTTGAT







ACATGTAGTTGCACACGGTTCAGTA







TTTCCTCAGTCTTCAATGACGAGAG







ATGTTCACAACTTGCGTGCGTGGCA







COPB1 probes: (SEQ ID NO: 1347-1368)



AGTCCTTGAAGCTTTACAGTTAATT







ACCTTTATGCTCGTTCCATATTTGG







TATGGCAGCCAACCTTTATGCTCGT







GTTTCATGTACCAAGACCCTTTTCA







GTTTGTCTTTTGTCTTAACAGTTCT







GAATGCTGTCCTCAAAGTATATAAT







TGCTGTCCTCAAAGTATATAATGTT







GATGCACTTGCAAATGTCAGCATTG







ACCAAGACCCTTTTCACAGTACAAT







GAATACTTTTCAGCCAATAATTTAT







GACCCTTTTCACAGTACAATAAACA







TAATGTTTCATGTACCAAGACCCTT







CTGCTGTTACCGGCCATATAAGAAT







CATGTACCAAGACCCTTTTCACAGT







AAATGACTACTTACAGCACATATTA







GGTATGGGCTTACTGGACTCCAACA







TAACAGTTCTGAATGCTGTCCTCAA







GTCTTTTGTCTTAACAGTTCTGAAT







GAAGCCAATTCACCAGGGACCAGAT







ACTCCAACATCTTTTGTACTCTTTC







AGTTCTGAATGCTGTCCTCAAAGTA







GCCCTTTCTGGTTACTGTGGCTTTA







NDUFB8 probes: (SEQ ID NO: 1369-1385)



GAGATCTGAGGAGGCTTCGTGGGCT







GCACTGGCACCTAGACATGTACAAC







CGGGTGGTTCACTATGAGATCTGAG







TGGTATAGCTGGGACCAGCCGGGCC







TGGGTCCTCTAACTAGGACTCCCTC







CCCTGACCGCTCACAGCATGAGAGA







GCCGGGCCTGAGGTTGAACTGGGGT







CGGTGATCCCTCCAAAGAACCAGAG







TACGAACCTTACCCGGATGATGGCA







ACACCTGTTTCTTGGCATGTCATGT







AACCGTGTGGATACATCCCCCACAC







TCATGTGCTGGGTGGGGGACGTGTA







TCTAACTAGGACTCCCTCATTCCTA







GGACTCCCTCATTCCTAGAAATTTA







CATGACCAAGGACATGTTCCCGGGG







GATCCCTCCAAAGAACCAGAGCGGG







CCAGAGCGGGTGGTTCACTATGAGA







SET probes: (SEQ ID NO: 1386-1401)



ATTGGCCTTTTACCTGGATATAAAT







ACCATCCAACAGACCTGGTGCTCTA







CCATCCAACAGACCTGGTGCTCTAA







TGCTCTAATGCCAAGTTATACACGG







ATAGGCTCTCAGTAAGAAGTCTGAT







GGTATAAAGCTCTCAAATGTGACCA







AAGCTCTCAAATGTGACCATGTGAA







TAATGGACTCAGCTCTGTCTGCTCA







AATGCCATTGTGCAGAGAAGCACCC







GAAGCACCCTAATGCATAAGCTTTT







CTAATGCATAAGCTTTTTAATGCTG







AATTAAATGCCACTTTTTCAGAGGT







CCACTTTTTCAGAGGTGAATTAATG







TAAATGGAACTATTCCATCAATAGG







CACTGTATACCGATCAGGAATCTTG







ATACCGATCAGGAATCTTGCTCCAA







CELF1 probes: (SEQ ID NO: 1402-1412)



TTGCCACTATGACCAAACGCACAGT







AAACGCACAGTCTGTTCTGCAGCAA







CTGCAGCAACAACGGGATTCAATCA







TCAACTCAGTCGTGATTCAGCCGTA







TCAGCCGTAGAAATGCTTTTCCTTT







TTATCTTGTTTGAGCTTTTCCTTTC







GAACTTGTGTTGTACTCTGTAGAAA







GTCCCAATGGGGAACCTAAATCTGT







GTTTTAATTGCACAGACACATGGAC







AAAGTCATTTTGTATCTGCCAAGTG







ATCTGCCAAGTGTGGTACCTTCCTT







XPO1 probes: (SEQ ID NO: 1413-1423)



TAGGGAGCATTTTCCTTCTAGTCTA







GCATTGTCTGAAGTTAGCACCTCTT







GCACCTCTTGGACTGAATCGTTTGT







GAATCGTTTGTCTAGACTACATGTA







GATCATGTGCATATCATCCCATTGT







ATCATCCCATTGTAAAGCGACTTCA







GTGTGTGCTGTCGCTTGTCGACAAC







GTCGCTTGTCGACAACAGCTTTTTG







ATTTGTGAGCCTTCATTAACTCGAA







GTTAGAATAGGCTGCATCTTTTTAA







ACAACTCTGGCTTTTGAGATGACTT







PTBP1 probes: (SEQ ID NO: 1424-1434)



TTCACCTGCAGTCGCCTAGAAAACT







AAACTTGCTCTCAAACTTCAGGGTT







AAGTCTCATTTCTGTGTTTTGCCTG







CCTCTGATGCTGGGACCCGGAAGGC







ATACCTGTTGTGAGACCCGAGGGGC







CGGCGCGGTTTTTTATGGTGACACA







TCCAGGCTCAGTATTGTGACCGCGG







TGCCTTACCCGATGGCTTGTGACGC







TGTTCGCTGTGGACGCTGTAGAGGC







GTTGGCCAGTCTGTACCTGGACTTC







GAATAAATCTTCTGTATCCTCAAAA







SF3B1 probes: (SEQ ID NO: 1435-1445)



GTTTACAGGGTCTGTTTCACCCAGC







TTCACCCAGCCCGGAAAGTCAGAGA







CAACTCCATCTACATTGGTTCCCAG







CTCATAGCACATTACCCAAGAATCT







GAACACCTATATTCGTTATGAACTT







TTAATGCACAGCTACTTCACACCTT







CACACCTTAAACTTGCTTTGATTTG







AATAACCTGTCTTTGTTTTTGATGT







GTAAATGCCAGTAGTGACCAAGAAC







TACACTATACTGGAGGGATTTCATT







GATTTAGAACTCATTCCTTGTGTTT







ARPC2 probes: (SEQ ID NO: 1446-1468)



ACTGGATAATCGTAGCTTTTAATGT







GTAGCTTTTAATGTTGCGCCTCTTC







GTGACAACATTGGCTACATTACCTT







TGCGCCTCTTCAGGTTCTTAAGGGA







GCTGTGCTTGCAAAGACTTCATAGT







ATCTTCCGGCATCCAAGGATTCCAT







GAGCTGAAAGACACAGACGCCGCTG







AAAGAAGGACGCAGAGCCAGCCACA







ATCTGCAGAAACGAGCTGTGCTTGC







GTCTCTTTGCTATATGACCTTGAAA







GAGGAAGCGGCTGGCAACTGAAGGC







CTCTTTTCCAAGCTGTTTCGCTTTG







CGTTTTCATCCCGCTAATCTTGGGA







GTTTCGCTTTGCAATATATTACTGG







GGAACACTTGCTACTGGATAATCGT







GAAGCGAAATTGTTTTGCCTCTGTC







GAGTCACAGTAGTCTTCAGCACAGT







GGAAGCGGCTGGCAACTGAAGGCTG







TGCAGTCATAACTTGTTTTCTCCTA







TCCTCTTTAGCCACAGGGAACCTCC







GACCTTGAAAATCTTCCGGCATCCA







GGATTCCATTGTGCATCAAGCTGGC







TTCATCCCGCTAATCTTGGGAATAA







VAMP3 probes: (SEQ ID NO: 1469-1479)



GAGACTCAACATCAGGATCCACAGC







ACAGACTTTATCGCTCTGTGGCTCA







AAGCAGCAACAGCTGAGGCGCACCA







GCTTCCATTTCTTTAACGTCTGTTC







TCTGTTCCCTTAACATCGCTGAAAT







GAAGAGATGCCTTGCGGTGTGGCCA







GACTCAGAAACCTTGGTACTCGCCC







ACTGGCTCCTGCATTAACCCAGAAA







TAACCCAGAAATACCTCGCTTCTAT







CTCGCTTCTATCTGTGCACTTAGCT







GGGAACTTACCCACTGTAATCACCT







STARD7 probes: (SEQ ID NO: 1480-1490)



TTGTGCCAAGGAAGTAGCTGCCCCA







CCTTCTCCGCGTCATTGTTGGAAGA







AGGAGAGATGCATCGAGCAGTCCCA







GCTGCTTTTCATTTATTACTTCTTC







CTTCTTCTTTCCAGGACCTGACAGA







TTATGTCCAAACTTAGCACCTGCAA







TGTGCGTCTGCGAGCGCACACACAT







AGGAGTTGCGGTTGCTCCATGTTCT







GCTCCATGTTCTGACTTAGGGCAAT







CTGCACTTGGGGTCTGTCTGTACAG







GTCTGTACAGTTACTCATGTCATTG







SEC31A probes: (SEQ ID NO: 1491-1511)



TTCAGTGAGACCTCTGCTTTCATGC







AGCATGTTTGCATAGCAACCAGTCA







GTTGCCAGTGATGATTTTCCTATTC







ATTTCTGCTGATATACTCACCTTAG







GCTGCCTTTCTTCAGCAACAGACCC







AGCAACCAGTCAAGAGCATTTACAC







TGAAGGTGCCCCAGGGGCTCCTATT







AGTATGGTTTCCTGAAGTATTCTGA







TAAAACTAAATTTCTTTCATGTCCT







TGCTCAGAACCCTGGTGCTTTATTT







GCGTACAGCAACCTCTTGGTCAAAC







TTCTCTTCCACTCAATATTGCCATT







GGACTAGTCCTCATTAGCATGTTTG







CATACCCACATAGTTAGCACCAGCA







GAGCATTTACACTATTTCTGCTGAT







AGAAGGATTGACCATGCATACCCAC







GCACCAGCAACTTCAGTGAGACCTC







TTGAGGATCTTATTCAGCGCTGCCT







GCCAGTTCTCAAAGTTGTTCTCACC







TACTCACCTTAGAACTGCTCAGAAC







GTATTTCCTGGATTACACATAGTAT







MFN2 probes: (SEQ ID NO: 1512-1532)



GTCTATGAGCGTCTGACCTGGACCA







GTTACTCCTGTATCATTGCTCATAA







AGCCTCTGTGCACTGTTTGGTGGCC







TGTATTTAAAGCCCTCAGTCTGTCC







GCCTGAATGGACAGGGGCCACTTCA







ATCACTGTCACACAATTCCAATGGA







GACCTTTGCTCATCTGTGTCAGCAA







CCCGGCGTGTGCCGGGCCTGAATGG







GCTGGAGCGCAAGACGTGCTGACAC







AGGTGATGTCCTGTTCACATACCTG







GCCTTCAAGCGCCAGTTTGTGGAGC







CCTCCATGGGCATTCTTGTTGTTGG







TCATGGTTTCCATGGTTACCGGCCT







CCCAGCCATCACTCATCTTTGAGGA







GCGAAGTGATGGACTCTGCCAGGTG







GCCACTTCACAGCATGTCAGGGAAA







GTCCTGTTGTGTGGGGCGAAGTGAT







GTGCTGACACAGTGAGTTTTCTCTG







GCAGCTTGTCATCAGCTACACTGGC







GTGTCAGCAAGTTGACGTCACCCGG







CTGCCAGGTGGACATGCTGTGGGTG







WIPI2 probes: (SEQ ID NO: 1533-1564)



CAATGAGATCTTGGACTCTGCCTCT







AGATCCCGCGGTTGTTGGTGGGTGC







TCTACTTCACTCTTCCTGTTGAAAA







CAGACTCTGCATTCCAAACCAAGGC







GACTGACTGAACTTGACCTGTGACC







AGCAACAGAGAGTAGGCGGCTGGGC







TGCCTGGACTCGCTGGAGCAAAGGA







CCGCCCATGATTCTTCGGACTGACT







GCCCCTTAGTCACTCAGACATACGG







CGCCGACGGGTACCTGTACATGTAC







GTCATGTGCCTTTCTATTTTCATCT







TAGGGGAGCTAGAAGCCACTTTCCA







GGGCTTCCTACCTGTGTGAGAGGTC







TCGCTTCCCTTTTCATATTTACAGA







ACTTGAAAGGTTGCCTGGACTCGCT







GCATGAACGTGCCAAGCCAGCATAG







CCCTGCGCCTGGATGAGGACAGCGA







CAGAACTCAAGTGTGGTGGCCGTCT







AATTGGATCGCTCTGGGATTTCTTC







GGACAGCGAGGTTCTTTCTGATACT







CCCACCAGGTGTGCTGGGCAGACTT







AAATGATCTGTTCTTCTACTTCACT







AAACAACCTCAAGTACCTCAGACTC







GGGCAGACTTCAGCTGGGACAGAAG







TTCGGGAAAGTGCTCATGGCCTCCA







CAAGCTTCAGTATTTGCCTCGCTTC







GCGTAAGGAAACCGTGGCGTCGCGC







CCACAAAAACATCTGCTCGCTAGCC







TCTGCTTGTCAAGGCCAGTTCTGCA







GCGAGTGTGCCCTGATGAAGCAGCA







GAGGTCGTAGCGGGAGACAGCAACA







TGGGACAGAAGTCCGATCTCCCTAG







PFDN1 probes: (SEQ ID NO: 1565-1575)



GGCAGTCTGCCTAAAGATTCCTTTC







GCCTTCTCCCATACATTCCAAAAGG







GTTCAACAGTAAGCAGCACCTCCAA







TCTCCTTTCGGCCAGTATCATAAGA







TGGACGCCATAATCCTGAGGCTCCT







GGCTCCTAGAGGCTGAGGGGGCAAC







TGAGGGGGCAACGGTGTGATCCAGC







GCAAGCCAGTTGTCAAACACAGCCA







GTGAGAGAGGCAGTGGCCGTCCTCC







TTCCTGTACCTTTGACTAACGCTCA







CTTCCGGGCCTGCATGCAGTAGACA







UBE3A probes: (SEQ ID NO: 1576-1621)



ATCAGCCATTTTATCGAGGCACGTG







TAGCTAATGTGCTGAGCTTGTGCCT







TAGACCACGTAACCTTCAAGTATGT







GAACTACTCTCCCAAGGAAAATATT







TAAGGAAGCGCGGGTCCCGCATGAG







GCCATCATCTTGTTGAATCAGCCAT







TACAACGGGCACAGACAGAGCACCT







TTTACTTCCGGAATACTCAAGCAAA







TATGGTGACCAATGAATCTCCCTTA







GATTGTTTTAACTGATTACTGTAGA







TTCCTAGTCTTCTGTGTATGTGATG







AGGATGTCTTTCAGGATTATTTTAA







TAATTACTTACTTATTACCTAGATT







GACTACAGGAGACGACGGGGCCTTT







GACAGAACTGTTTGTTATGTACCAT







ACTGTGCCTTGTGTTACTTAATCAT







GCGACGAACGCCGGGATTTCGGCGG







GTATAGCCCCACAGATTAAATTTAA







TTGCCACCATTTGTAGACCACGTAA







GAAGACAATGCTTTCCATATTGTGA







GCTTTAATGTGCTTTTACTTCCGGA







ATTTTTTTGCGTGAAAGTGTTACAT







CGGATAAGGAAGCGCGGGTCCCGCA







CTGGGCTCGGGGTGACTACAGGAGA







AAAGATGGCTACTGTGCCTTGTGTT







AAGGCCATCACGTATGCCAAAGGAT







GACTCTTCTTGCAGTTTACAACGGG







GAGACATTGATATATCCTTTTGCTA







GATTACTGTAGATCAACCTGATGAT







GGCTCGGGGTGACTACAGGAGACGA







GCCTCGTTTTCCGGATAAGGAAGCG







CCATTTGTAGACCACGTAACCTTCA







TTCGTGTTGCCATCATCTTGTTGAA







TTACCTACATCTCATACTTGCTTTA







GAGCTTGTGCCTTGGTGATTGATTG







CAAGGCTTTTCGGAGAGGTTTTCAT







GGGTGACTACAGGAGACGACGGGGC







TGTTACATATTCTTTCACTTGTATG







GATAAGGTAACATGGGGTTTTTCTG







GAATTACATTGTATAGCCCCACAGA







GATATATCCTTTTGCTACAAGCTAT







TGACGGTGGCTATACCAGGGACTCT







GAAACTATTACTCCTAAGAATTACA







GCTGGCGACGAACGCCGGGATTTCG







ATGCAGCTTTCAAATCATTGGGGGG







GAGGCACGTGATCAGTGTTGCAACA







GTF3C2 probes: (SEQ ID NO: 1622-1653)



GGGCAGGAGCCTCGCAATATGTGGC







GGCTCCTCAGCCTAAGACTATGGCT







AGAAACACTCAGGCCTGACCTAGGC







TAACCATCATGTATGCCCACGAGGG







TAACCATCATGTATGCCCACGAGGG







GACCCCTCTGAGTGTGGTCAGTGCC







TCCCTGTGATTGCCCTGTTAAGTAT







TCCCTGTGATTGCCCTGTTAAGTAT







TGCTCCTGCTTACGAAGTATTCCCA







TGCTCCTGCTTACGAAGTATTCCCA







GATTGCTTGTGACAACGGCTGCATC







GGCTCCTGTCTGACTATTCCAGGAT







CCCTACCGATAGAACAGTGGCTCAG







GCATGAAGGCTCCTGTCTGACTATT







TGCATCTGGGACCTCAAGTTCTGCC







CCACCAACACCTAGCTGCTGGATAT







CACAGACACCCTACCGATAGAACAG







TGGCCTGCTCAGACGGGAAAGTACT







GTATCTGCATGAAGGCTCCTGTCTG







CAAGGAATACCACAGACACCCTACC







AACCATAGCTATCATGTGTTTCCCA







ATAGCTATCATGTGTTTCCCAAATC







ACAGGGCCCACTTTGTCTATGGGAT







TTCCCAATCACTGGTCATCTGACCC







TTCCCAATCACTGGTCATCTGACCC







GGAAATCTAGTCATCTTCCCTGTGA







GGAAATCTAGTCATCTTCCCTGTGA







GACATGAATGAGACACACCCACTGA







GCAACTCTGCAGGTGGGGTCTATGC







AAGTACTGCTATTCAGTCTACCCCA







TATTACTGCCTTCTGAAACTTCCTC







TATTACTGCCTTCTGAAACTTCCTC







KHDRBS1 probes: (SEQ ID NO: 1654-1674)



GTTACTGATTTCTTGTATCTCCCAG







GCTACATGTGTAAGTCTGCCTAAAT







AATCTAGCCCCAGACATACTGTGTT







CCTCCCATTTTGTTCTCGGAAGATT







GTCCATTTGAGATTCTGCACTCCAT







CCCCTCCTGCTAGGCCAGTGAAGGG







TAATTGGATTTGTACCGTCCTCCCA







GTCAAGTATGTCTCAACACTAGCAT







GAAAAGTTCACTTGGACGCTGGGGC







TTGTCAATATATCGAACTGTTCCCA







TAACTCTGCATTCTGGCTTCTGTAT







TGTCTAAGTGTTTTTCTTCGTGGTC







AGGCCTCCTGAATTGAGTTTGATGC







GACTGGAATGGGACCAGGCCGTCGC







GATGCAGAGCTTTTTAGCCATGAAG







ATACAGAGAGCACCCATATGGACGT







GTAGATGCTTTTTTCTTTGTTGTTT







TGACTTTTTCATTACGTGGGTTTTG







GTATCTCCCAGGATTCCTGTTGCTT







TTGCTTTACCCACAACAGACAAGTA







CCTTATTCCATTCTTAACTCTGCAT







RARS probes: (SEQ ID NO: 1675-1685)



GTTGAATGACTACATCTTCTCCTTT







AGCTGCTTACTTGTTGTATGCCTTC







GTATGCCTTCACTAGAATCAGGTCT







AATCAGGTCTATTGCACGTCTGGCC







GGAAACTAGGCCGGTGCATTTTACG







GAGCTGGCAACTGCTTTCACAGAGT







GAACATGTGGCGTATCTTGTGTGAA







CTGGCCCAAGGGTGTAATCCCTCAC







AATCCCTCACAGGTTTGAACCCTGT







TTTTCCCAAGTGGCCATTGGCCCTG







GCTTTTTTTCAATCTTGTGGGCACA







MYL12A probes: (SEQ ID NO: 1686-1696)



GCAACTGGCACCATACAGGAAGATT







CAAATTCCAGCCAACGTCCTTGTTG







AACGTCCTTGTTGCACTTTGGGTAT







GCACTTTGGGTATTCTGAGATTTTC







TCTTGCCATTCCCTTAGGCTTTAGC







GGCTTTAGCAGCTTTGCATTTCCTG







TTGCATTTCCTGTTGTATTTATTCT







TATTCTCAGCCATTTTGGGCATATG







CAGACTGGAAACGGGACTTTCTATT







CTTCTCCCCCAATAACTGTGGGTCT







TCAGAGAAAGTTAGTTCGGCTCGAT







HNRNPD probes: (SEQ ID NO: 1697-1728)



GATAGTTAATGTTTTATGCTTCCAT







TATTCCATTTGCAACTTATCCCCAA







GCAAAAGTACCCCTTTGCACAGATA







GAAATGCGGCTAGTTCAGAGAGATT







AATTTTTTGTATCAAGTCCCTGAAT







GACAGGCTTGCCGAAATTGAGGACA







AACAGCCAAGGTTACGGTGGTTATG







GTGTCCTCCCTGTCCAAATTGGGAA







GATTCATTTGAAGGTGGCTCCTGCC







ATAATACTTCCTTATGTAGCCATTA







AATGTCAATTTGTTTGTTGGTTGTT







GAGTGGTTATGGGAAGGTATCCAGG







GACTACACTGGTTACAACAACTACT







GGTGGTCATCAAAATAGCTACAAAC







AAGTTTGGAAGACAGGCTTGCCGAA







GGAACCAGGGATATAGTAACTATTG







GAGCTGTGGTGGACTTCATAGATGA







AGTCCCTGAATGGAAGTATGACGTT







AAAAGCCCAGTGTGACAGTGTCATG







GAAGTTTAATTCTGAGTTCTCATTA







GGAGGATATGACTACACTGGTTACA







GAGAGATTTTTAGAGCTGTGGTGGA







TTATTCCATTTGCAACTTATCCCCA







AGGTTACGGTGGTTATGGAGGATAT







ATTTGCTTTCATTGTTTTATTTCTT







AGAAATTTGCTTTCATTGTTTTATT







CCTTTCCCCCAGTATTGTAGAGCAA







GGTATCCAGGCGAGGTGGTCATCAA







GTATGACGTTGGGTCCCTCTGAAGT







GTATGACGTTGGGTCCCTCTGAAGT







AACAACTACTATGGATATGGTGATT







TGTGCTTTTTAGAACAAATCTGGAT







TARDBP probes: (SEQ ID NO: 1729-1739)



GAGAGCGCGTGCAGAGACTTGGTGG







TGGCGAGATGTGTCTCTCAATCCTG







TCTCTCAATCCTGTGGCTTTGGTGA







GTTTTTGTTCTTAGATAACCCACAT







TGAAATGATACTTGTACTCCCCCTA







CTTTGTCAACTGCTGTGAATGCTGT







GAATGCTGTATGGTGTGTGTTCTCT







GGACTGAGCTTGTGGTGTGCTTTGC







GCAGAGTTCACCAGTGAGCTCAGGT







GTTCTAATGTCTGTTAGCTACCCAT







AAGAATGCTGTTTGCTGCAGTTCTG







HNRNPR probes: (SEQ ID NO: 1740-1760)



AAGCTAGTGCTTTGTCTTAGTAGTT







GGGGCAATCGTGGGGGCAATGTAGG







TCGTTTCAGGCTTCATTTTAGCTTC







TCACACCTTTTTGAAATCTGCCCTA







ACCCTCCAGATTACTACGGCTATGA







ATTGTTATAACTTCACACCTTTTTG







TGGATATGGCTACCCTCCAGATTAC







AAACAAGCTGGGCACACTGTTAAAT







GCTCTTGGACATTATTGGGCTTGCA







CATGATTTTGCAGAACCTTTGGTTT







CAATGCTTTTATCGTTTCAGGCTTC







GTTCCCGTGGATCTCGGGGCAATCG







GATTCCAAGCGTCGTCAGACCAACA







TCAACAGCAGAGAGGCCGTGGTTCC







GGCTATGAAGATCCCTACTACGGCT







AAAGCCGTGACAATTTGTTCTTTGA







TCACAGAGGGGGGCACCTTTGGGAC







ACCTTTGGGACCACCAAGAGGCTCT







GTATTTCCAATTTCTTGTTCATGTA







GTCGTCAGACCAACAACCAACAGAA







TGGGCTTGCAGAGTTCCCTTATTCT









Claims
  • 1. A method of evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer, the method comprising: providing a sample comprising breast tumor tissue from the patient; detecting the levels of expression of the 17 genes, or one or more corresponding alternates thereof, identified in Table 1; or of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2; in the sample; and correlating the levels of expression with the likelihood of a relapse.
  • 2. The method of claim 1, wherein the detecting step comprises detecting the levels of expression of the 17 genes, or one or more corresponding alternates thereof, identified in Table 1.
  • 3. The method of claim 1, wherein the detecting step comprises detecting the levels of expression of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2.
  • 4. The method of claim 1, further comprising detecting the level of expression of at least one reference gene identified in Table 3.
  • 5. The method of claim 1, wherein the detecting step comprises detecting the level of expression of RNA.
  • 6. The method of claim 5, wherein detecting the level of expression of RNA comprises a quantitative PCR reaction.
  • 7. The method of claim 5, wherein detecting the level of expression of RNA comprises hybridizing a nucleic acid obtained from the sample to an array that comprises probes to the 17 genes set forth in Table 1, and/or one or more corresponding alternates thereof; or hybridizing a nucleic acid obtained from the sample to an array that comprises probes to the 8 genes set forth in Table 2, and/or one or more corresponding alternates thereof.
  • 8. The method of claim 1, wherein the detecting step comprises detecting the level of protein expression.
  • 9. A kit comprising a microarray comprising probes to the 17 genes, or one or more corresponding alternates thereof, identified in Table 1; or probes to the 8 genes, or one or more corresponding alternates thereof, identified in Table 2; or comprising primers and probes for detecting expression of the 17 genes or one or more corresponding alternates thereof, identified in Table 1; or primers and probes for detecting expression of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2.
  • 10. The kit of claim 9, wherein the microarray further comprises a probe to at least one reference gene identified in Table 3.
  • 11. The kit of claim 9, wherein the kit comprises primers and probes for detecting expression of the 17 genes, or one or more corresponding alternates thereof, identified in Table 1; or primers and probes for detecting expression of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2.
  • 12. The kit of claim 11, further comprising primers and probes for detecting expression of at least one reference gene identified in Table 3.
  • 13. A computer-implemented method for evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer, the method comprising: receiving, at one or more computer systems, information describing the level of expression of the 17 genes, or one or more corresponding alternates thereof, identified in Table 1 in a breast tumor tissue sample obtained from the patient; performing, with one or more processors associated with the computer system, a random forest analysis in which the level of expression of each gene in the analysis is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or no “relapse”; generating, with the one or more processors associated with the one or more computer systems, a random forest relapse score (RFRS), wherein if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group and if less than 0.333 the patient is assigned to low risk group.
  • 14. The computer-implemented method of claim 13, further comprising generating, with the one or more processors associated with the one or more computer systems, a likelihood of relapse by comparison of the RFRS for the patient to a loess fit of RFRS versus likelihood of relapse for a training dataset.
  • 15. A non-transitory computer-readable medium storing program code for evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer in accordance with the method of claim 13, the computer-readable medium comprising: code for receiving information describing the level of expression of the 17 genes, or one or more corresponding alternates thereof, identified in Table 1 in a breast tumor tissue sample obtained from the patient; code for performing a random forest analysis in which the level of expression of each gene in the analysis is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or “no relapse”; and code for generating a random forest relapse score (RFRS), wherein if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group, and if less than 0.333 the patient is assigned to a low risk group.
  • 16. The computer-readable medium of claim 15, further comprising code for generating a likelihood of relapse by comparison of the RFRS for the patient to a loess fit of RFRS versus likelihood of relapse for a training dataset.
  • 17. A computer-implemented method for evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer, the method comprising: receiving, at one or more computer systems, information describing the level of expression of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2 in a breast tumor tissue sample obtained from the patient; performing, with one or more processors associated with the one or more computer systems, a random forest analysis in which the level of expression of each gene in the analysis is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or “no relapse”; and generating, with the one or more processors associated with the one or more computer systems, a random forest relapse score (RFRS), wherein if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group, and if less than 0.333 the patient is assigned to a low risk group.
  • 18. The computer-implemented method of claim 17, further comprising generating, with the one or more processors associated with the one or more computer systems, a likelihood of relapse by comparison of the RFRS for the patient to a loess fit of RFRS versus likelihood of relapse for a training dataset.
  • 19. A non-transitory computer-readable medium storing program code for evaluating the likelihood of a relapse for a patient that has a lymph node-negative, estrogen receptor-positive, HER2-negative breast cancer in accordance with the method of claim 17, the computer-readable medium comprising: code for receiving information describing the level of expression of the 8 genes, or one or more corresponding alternates thereof, identified in Table 2 in a breast tumor tissue sample obtained from the patient; code for performing a random forest analysis in which the level of expression of each gene in the analysis is assigned to a terminal leaf of each decision tree, representing a vote for either “relapse” or “no relapse”; and code for generating a random forest relapse score (RFRS), wherein if the RFRS is greater than or equal to 0.606 the patient is assigned to a high risk group, if greater than or equal to 0.333 and less than 0.606 the patient is assigned to an intermediate risk group, and if less than 0.333 the patient is assigned to a low risk group.
  • 20. The computer-readable medium of claim 19, further comprising code for generating a likelihood of relapse by comparison of the RFRS for the patient to a loess fit of RFRS versus likelihood of relapse for a training dataset.
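
The risk-group assignment recited in claims 13, 15, 17 and 19 can be illustrated with a minimal Python sketch. The cutoffs 0.606 and 0.333 are taken from the claims. Everything else is an assumption made for illustration: the sketch assumes the RFRS is the fraction of decision trees voting “relapse”, and the names rfrs_from_votes and risk_group are hypothetical and are not part of the claimed subject matter. The sketch does not train a random forest and does not perform the loess-based likelihood estimate of claims 14, 16, 18 and 20.

```python
from typing import Sequence

# Cutoffs as recited in claims 13, 15, 17 and 19.
HIGH_CUTOFF = 0.606
LOW_CUTOFF = 0.333


def rfrs_from_votes(votes: Sequence[str]) -> float:
    """Return a relapse score from per-tree votes.

    ASSUMPTION: the score is the fraction of trees whose terminal leaf
    votes "relapse". Any other label is counted as a "no relapse" vote.
    """
    if not votes:
        raise ValueError("at least one tree vote is required")
    relapse_votes = sum(1 for vote in votes if vote == "relapse")
    return relapse_votes / len(votes)


def risk_group(rfrs: float) -> str:
    """Map an RFRS in [0, 1] to the risk group named in the claims."""
    if not 0.0 <= rfrs <= 1.0:
        raise ValueError("RFRS must lie between 0 and 1")
    if rfrs >= HIGH_CUTOFF:
        return "high"
    if rfrs >= LOW_CUTOFF:
        return "intermediate"
    return "low"


if __name__ == "__main__":
    # Hypothetical forest of 10 trees, 7 of which vote "relapse".
    example_votes = ["relapse"] * 7 + ["no relapse"] * 3
    score = rfrs_from_votes(example_votes)
    print(score, risk_group(score))
```

Both boundaries are inclusive on the upper group, matching the "greater than or equal to" wording of the claims: a score of exactly 0.606 falls in the high risk group and a score of exactly 0.333 falls in the intermediate risk group.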
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority benefit of U.S. provisional application No. 61/789,071, filed Mar. 15, 2013 and U.S. provisional application No. 61/620,907, filed Apr. 5, 2012, which applications are herein incorporated by reference.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

This invention was made with government support under Contract No. DE-AC02-05CH11231 awarded by the U.S. Department of Energy. The government has certain rights in this invention.

Provisional Applications (2)
Number Date Country
61620907 Apr 2012 US
61789071 Mar 2013 US