Claims
- 1. A method for classifying a cell sample as ER(+) or ER(−) comprising detecting a difference in the expression by said cell sample of a first plurality of genes relative to a control, said first plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 1.
- 2. The method of claim 1, wherein said plurality consists of at least 50 of the genes corresponding to the markers listed in Table 1.
- 3. The method of claim 1, wherein said plurality consists of at least 100 of the genes corresponding to the markers listed in Table 1.
- 4. The method of claim 1, wherein said plurality consists of at least 200 of the genes corresponding to the markers listed in Table 1.
- 5. The method of claim 1, wherein said plurality consists of at least 500 of the genes corresponding to the markers listed in Table 1.
- 6. The method of claim 1, wherein said plurality consists of at least 1000 of the genes corresponding to the markers listed in Table 1.
- 7. The method of claim 1, wherein said plurality consists of each of the genes corresponding to the 2,460 markers listed in Table 1.
- 8. The method of claim 1, wherein said plurality consists of the 550 gene markers listed in Table 2.
- 9. The method of claim 1, wherein said control comprises nucleic acids derived from a pool of tumors from individual sporadic patients.
- 10. The method of claim 1, wherein said detecting comprises the steps of:
(a) generating an ER(+) template by hybridization of nucleic acids derived from a plurality of ER(+) patients within a plurality of sporadic patients against nucleic acids derived from a pool of tumors from individual sporadic patients; (b) generating an ER(−) template by hybridization of nucleic acids derived from a plurality of ER(−) patients within said plurality of sporadic patients against nucleic acids derived from said pool of tumors from individual sporadic patients within said plurality; (c) hybridizing nucleic acids derived from an individual sample against said pool; and (d) determining the similarity of marker gene expression in the individual sample to the ER(+) template and the ER(−) template, wherein if said expression is more similar to the ER(+) template, the sample is classified as ER(+), and if said expression is more similar to the ER(−) template, the sample is classified as ER(−).
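The template comparison in step (d), which claims 18, 26 and 41 repeat for other phenotypes, can be sketched as a nearest-template classifier. This is a minimal sketch under stated assumptions, not the patented implementation: each profile is assumed to be a list of per-marker log ratios against the reference pool, each template is taken as the marker-wise mean of its class, and cosine similarity stands in as the similarity measure (an assumption, since the formula of claim 35 is not reproduced here). The names `mean_profile`, `cosine` and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Template: marker-by-marker mean of the log ratios of one class of samples."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two marker profiles (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(sample: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template most similar to the sample profile."""
    return max(templates, key=lambda label: cosine(sample, templates[label]))

# Toy log ratios (sample vs. reference pool) for three hypothetical markers.
er_pos = mean_profile([[1.0, -0.5, 0.8], [0.9, -0.7, 1.1]])
er_neg = mean_profile([[-0.8, 0.6, -1.0], [-1.2, 0.4, -0.9]])
print(classify([0.7, -0.2, 0.9], {"ER(+)": er_pos, "ER(-)": er_neg}))  # ER(+)
```

Because `classify` takes a dictionary of templates, the same function covers the BRCA1/sporadic and good/poor prognosis templates, or more than two phenotypes as in claim 41.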
- 11. A method for classifying a cell sample as BRCA1-related or sporadic, comprising detecting a difference in the expression of a first plurality of genes relative to a control, said first plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 3.
- 12. The method of claim 11, wherein said plurality consists of at least 50 of the genes corresponding to the markers listed in Table 3.
- 13. The method of claim 11, wherein said plurality consists of at least 100 of the genes corresponding to the markers listed in Table 3.
- 14. The method of claim 11, wherein said plurality consists of at least 200 of the genes corresponding to the markers listed in Table 3.
- 15. The method of claim 11, wherein said plurality consists of each of the genes corresponding to the 430 markers listed in Table 3.
- 16. The method of claim 11, wherein said plurality consists of each of the genes corresponding to the 100 markers listed in Table 4.
- 17. The method of claim 11, wherein said control comprises nucleic acids derived from a pool of tumors from individual sporadic patients.
- 18. The method of claim 11, wherein said detecting comprises the steps of:
(a) generating a BRCA1 template by hybridization of nucleic acids derived from a plurality of BRCA1 patients within a plurality of ER(−) patients against nucleic acids derived from a pool of tumors; (b) generating a sporadic template by hybridization of nucleic acids derived from a plurality of sporadic patients within said plurality of ER(−) patients against nucleic acids derived from said pool of tumors; (c) hybridizing nucleic acids derived from an individual sample against said pool; and (d) determining the similarity of marker gene expression in the individual sample to the BRCA1 template and the sporadic template, wherein if said expression is more similar to the BRCA1 template, the sample is classified as BRCA1, and if said expression is more similar to the sporadic template, the sample is classified as sporadic.
- 19. A method for classifying an individual as having a good prognosis (no distant metastases within five years of initial diagnosis) or a poor prognosis (distant metastases within five years of initial diagnosis), comprising detecting a difference in the expression of a first plurality of genes in a cell sample taken from the individual relative to a control, said first plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 5.
- 20. The method of claim 19, wherein said plurality consists of at least 20 of the genes corresponding to the markers listed in Table 5.
- 21. The method of claim 19, wherein said plurality consists of at least 100 of the genes corresponding to the markers listed in Table 5.
- 22. The method of claim 19, wherein said plurality consists of at least 150 of the genes corresponding to the markers listed in Table 5.
- 23. The method of claim 19, wherein said plurality consists of each of the genes corresponding to the 231 markers listed in Table 5.
- 24. The method of claim 19, wherein said plurality consists of the 70 gene markers listed in Table 6.
- 25. The method of claim 19, wherein said control comprises nucleic acids derived from a pool of tumors from individual sporadic patients.
- 26. The method of claim 19, wherein said detecting comprises the steps of:
(a) generating a good prognosis template by hybridization of nucleic acids derived from a plurality of good prognosis patients against nucleic acids derived from a pool of tumors from individual patients; (b) generating a poor prognosis template by hybridization of nucleic acids derived from a plurality of poor prognosis patients against nucleic acids derived from said pool of tumors from said plurality of individual patients; (c) hybridizing nucleic acids derived from an individual sample against said pool; and (d) determining the similarity of marker gene expression in the individual sample to the good prognosis template and the poor prognosis template, wherein if said expression is more similar to the good prognosis template, the sample is classified as having a good prognosis, and if said expression is more similar to the poor prognosis template, the sample is classified as having a poor prognosis.
- 27. The method of claim 1, wherein the cell sample is additionally classified as BRCA1-related or sporadic by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 3 or Table 4.
- 28. The method of claim 1, wherein the cell sample is additionally classified as taken from a patient with a good prognosis or a poor prognosis by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 5.
- 29. The method of claim 11, wherein the cell sample is additionally classified as taken from a patient with a good prognosis or a poor prognosis by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 20 of the genes corresponding to the markers listed in Table 5.
- 30. The method of claim 11, wherein the cell sample is additionally classified as ER(+) or ER(−) by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 1.
- 31. The method of claim 19, wherein the cell sample is additionally classified as ER(+) or ER(−) by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 1.
- 32. The method of claim 19, wherein the cell sample is additionally classified as BRCA1 or sporadic by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 3.
- 33. A method for classifying a sample as ER(+) or ER(−) by calculating the similarity between the expression of at least 5 of the markers listed in Table 1 in the sample to the expression of the same markers in an ER(−) nucleic acid pool and an ER(+) nucleic acid pool, comprising the steps of:
(a) labeling nucleic acids derived from a sample with a first fluorophore to obtain a first pool of fluorophore-labeled nucleic acids; (b) labeling with a second fluorophore a first pool of nucleic acids derived from two or more ER(+) samples, and a second pool of nucleic acids derived from two or more ER(−) samples; (c) contacting said first fluorophore-labeled nucleic acid and said first pool of second fluorophore-labeled nucleic acid with a first microarray under conditions such that hybridization can occur, and contacting said first fluorophore-labeled nucleic acid and said second pool of second fluorophore-labeled nucleic acid with a second microarray under conditions such that hybridization can occur, wherein said first microarray and said second microarray are similar to each other, exact replicas of each other, or identical; detecting at each of a plurality of discrete loci on the first microarray a first fluorescent emission signal from said first fluorophore-labeled nucleic acid and a second fluorescent emission signal from said first pool of second fluorophore-labeled nucleic acid that is bound to said first microarray under said conditions, and detecting at each of the marker loci on said second microarray said first fluorescent emission signal from said first fluorophore-labeled nucleic acid and a third fluorescent emission signal from said second pool of second fluorophore-labeled nucleic acid; (d) determining the similarity of the sample to the ER(−) and ER(+) pools by comparing said first fluorescent emission signals with said second fluorescent emission signals, and said first fluorescent emission signals with said third fluorescent emission signals; and (e) classifying the sample as ER(+) where the first fluorescent emission signals are more similar to said second fluorescent emission signals than to said third fluorescent emission signals, and classifying the sample as ER(−) where the first fluorescent emission signals are more similar to said third fluorescent emission signals than to said second fluorescent emission signals.
- 34. The method of claim 33, wherein said similarity is calculated by determining a first sum of the differences of expression levels for each marker between said first fluorophore-labeled nucleic acid and said first pool of second fluorophore-labeled nucleic acid, and a second sum of the differences of expression levels for each marker between said first fluorophore-labeled nucleic acid and said second pool of second fluorophore-labeled nucleic acid, wherein if said first sum is greater than said second sum, the sample is classified as ER(−), and if said second sum is greater than said first sum, the sample is classified as ER(+).
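The decision rule of claim 34 compares two aggregate distances. A minimal sketch follows; it assumes the "differences of expression levels" are taken as absolute values (signed differences could cancel out across markers) and that expression levels are already on a common scale such as log intensity. The function names are hypothetical, and the tie case, which the claim does not address, is returned as unclassified.

```python
from typing import Sequence

def aggregate_difference(sample: Sequence[float], pool: Sequence[float]) -> float:
    """Sum over markers of |sample level - pool level| (absolute value assumed)."""
    return sum(abs(s - p) for s, p in zip(sample, pool))

def classify_by_differences(sample: Sequence[float],
                            er_pos_pool: Sequence[float],
                            er_neg_pool: Sequence[float]) -> str:
    first_sum = aggregate_difference(sample, er_pos_pool)   # distance to the ER(+) pool
    second_sum = aggregate_difference(sample, er_neg_pool)  # distance to the ER(-) pool
    if first_sum > second_sum:
        return "ER(-)"  # farther from the ER(+) pool, so closer to ER(-)
    if second_sum > first_sum:
        return "ER(+)"
    return "unclassified"  # tie: not covered by the claim (assumption)

print(classify_by_differences([2.0, 1.0], [2.1, 0.9], [0.5, 2.0]))  # ER(+)
```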
- 35. The method of claim 33, wherein said similarity is calculated by computing a first classifier parameter P1 between an ER(+) template and the expression of said markers in said sample, and a second classifier parameter P2 between an ER(−) template and the expression of said markers in said sample, wherein said P1 and P2 are calculated according to the formula:
- 36. A method for determining a set of marker genes whose expression is associated with a particular phenotype, comprising the steps of:
(a) selecting a phenotype having two or more phenotype categories; (b) identifying a plurality of genes wherein the expression of said genes is correlated or anticorrelated with one of the phenotype categories, and wherein the correlation coefficient for each gene is calculated according to the equation $\rho = (\vec{c} \cdot \vec{r}) / (\lVert \vec{c} \rVert \cdot \lVert \vec{r} \rVert)$, wherein $\vec{c}$ is a vector of numbers representing said phenotype category of each sample and $\vec{r}$ is the vector of logarithmic expression ratios of the individual gene across all the samples, wherein if the correlation coefficient has an absolute value of 0.3 or greater, said expression of said gene is associated with the phenotype category, and wherein said plurality of genes is a set of marker genes whose expression is associated with a particular phenotype.
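The correlation test of claim 36 can be sketched directly from its equation. This sketch assumes the phenotype categories are encoded as +1 and −1 (a hypothetical choice; the claim only says "a number") and that each gene's vector holds its log expression ratios in the same sample order. The formula as written is an uncentered (cosine) correlation, so the choice of encoding matters. Gene names and values are illustrative.

```python
import math
from typing import Dict, List, Sequence

def correlation(c: Sequence[float], r: Sequence[float]) -> float:
    """rho = (c . r) / (||c|| * ||r||), the uncentered form written in claim 36."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    norm = math.sqrt(sum(ci * ci for ci in c)) * math.sqrt(sum(ri * ri for ri in r))
    return dot / norm if norm else 0.0

def select_markers(categories: Sequence[float],
                   log_ratios: Dict[str, Sequence[float]],
                   cutoff: float = 0.3) -> List[str]:
    """Keep genes whose |rho| with the phenotype vector is at least the cutoff."""
    return [gene for gene, r in log_ratios.items()
            if abs(correlation(categories, r)) >= cutoff]

categories = [1, 1, -1, -1]  # +1 / -1 per sample (hypothetical encoding)
log_ratios = {
    "geneA": [0.5, 0.4, -0.6, -0.3],   # correlated
    "geneB": [0.1, -0.1, 0.1, -0.1],   # unrelated (rho = 0)
    "geneC": [-0.5, -0.5, 0.5, 0.5],   # anticorrelated
}
print(select_markers(categories, log_ratios))  # ['geneA', 'geneC']
```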
- 37. The method of claim 36, wherein said set of marker genes is validated by:
(a) using a statistical method to randomize the association between said marker genes and said phenotype category, thereby creating a control correlation coefficient for each marker gene; (b) repeating step (a) one hundred or more times to develop a frequency distribution of said control correlation coefficients for each marker gene; (c) determining the number of marker genes having a control correlation coefficient of 0.3 or above, thereby creating a control marker gene set; and (d) comparing the number of control marker genes so identified to the number of marker genes, wherein if the p value of the difference between the number of marker genes and the number of control genes is less than a threshold, said set of marker genes is validated.
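The validation of claim 37 is a permutation test. One way to sketch it assumes label shuffling as the "statistical method" of step (a), uses the absolute correlation threshold of claim 36, and takes as the p value the fraction of shuffles that yield at least as many markers as the observed set. These choices, the fixed seed and all names are illustrative assumptions.

```python
import math
import random
from typing import Dict, Sequence, Tuple

def correlation(c: Sequence[float], r: Sequence[float]) -> float:
    """Uncentered correlation from claim 36."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    norm = math.sqrt(sum(ci * ci for ci in c)) * math.sqrt(sum(ri * ri for ri in r))
    return dot / norm if norm else 0.0

def count_markers(categories: Sequence[float],
                  log_ratios: Dict[str, Sequence[float]],
                  cutoff: float = 0.3) -> int:
    return sum(1 for r in log_ratios.values() if abs(correlation(categories, r)) >= cutoff)

def permutation_test(categories: Sequence[float],
                     log_ratios: Dict[str, Sequence[float]],
                     n_rounds: int = 100,
                     cutoff: float = 0.3,
                     seed: int = 0) -> Tuple[int, float]:
    """Return (observed marker count, empirical p value) from label shuffling."""
    rng = random.Random(seed)
    observed = count_markers(categories, log_ratios, cutoff)
    shuffled = list(categories)
    at_least_as_many = 0
    for _ in range(n_rounds):  # claim 37 asks for one hundred or more rounds
        rng.shuffle(shuffled)  # randomize the gene/phenotype association
        if count_markers(shuffled, log_ratios, cutoff) >= observed:
            at_least_as_many += 1
    return observed, at_least_as_many / n_rounds

categories = [1, 1, -1, -1]
log_ratios = {"geneA": [0.5, 0.4, -0.6, -0.3],
              "geneB": [0.1, -0.1, 0.1, -0.1],
              "geneC": [-0.5, -0.5, 0.5, 0.5]}
observed, p_value = permutation_test(categories, log_ratios, n_rounds=200)
print(observed, p_value)
```

With only four samples the p value is necessarily coarse; the sketch is meant to show the mechanics rather than a meaningful test.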
- 38. The method of claim 36, wherein said set of marker genes is optimized by the method comprising:
(a) rank-ordering the genes by amplitude of correlation or by significance of the correlation coefficients to create a rank-ordered list, and (b) selecting an arbitrary number n of marker genes from the top of the rank-ordered list.
- 39. The method of claim 38, wherein said set of marker genes is further optimized by the method comprising:
(a) calculating an error rate for said arbitrary number n of marker genes; (b) increasing by 1 the number of genes selected from the top of the rank-ordered list; (c) calculating an error rate for said number of genes selected from the top of the rank-ordered list; (d) repeating steps (b) and (c) until said number of genes selected from the top of the rank-ordered list includes all genes included in said rank ordered list, and (e) identifying said number of genes selected from the top of the rank-ordered list for which the error rate is smallest, wherein said set of marker genes is optimized when the error rate is the smallest.
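The rank-and-grow optimization of claims 38 and 39 can be sketched as below. The `error_rate` callable is a hypothetical stand-in for whatever classification error estimate is used (for example, leave-one-out cross-validation of a template classifier), and the toy error values are made up purely to exercise the loop.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

def rank_genes(correlations: Dict[str, float]) -> List[str]:
    """Claim 38(a): order genes by the amplitude |rho| of their correlation."""
    return sorted(correlations, key=lambda gene: abs(correlations[gene]), reverse=True)

def optimize_gene_count(ranked: Sequence[str],
                        error_rate: Callable[[Sequence[str]], float],
                        n_start: int = 1) -> Tuple[Optional[int], float]:
    """Claim 39: grow the top-n set one gene at a time and keep the n with the lowest error."""
    best_n: Optional[int] = None
    best_error = float("inf")
    for n in range(n_start, len(ranked) + 1):
        error = error_rate(ranked[:n])
        if error < best_error:  # strict '<' keeps the smallest n on ties (an assumption)
            best_n, best_error = n, error
    return best_n, best_error

ranked = rank_genes({"geneA": 0.97, "geneB": 0.0, "geneC": -1.0})
toy_errors = {1: 0.30, 2: 0.10, 3: 0.20}  # hypothetical error rates per set size
print(ranked)  # ['geneC', 'geneA', 'geneB']
print(optimize_gene_count(ranked, lambda genes: toy_errors[len(genes)]))  # (2, 0.1)
```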
- 40. A method for assigning a person to one of a plurality of categories in a clinical trial, comprising determining for each said person the level of expression of at least five of the prognosis markers listed in Table 6, determining therefrom whether the person has an expression pattern that correlates with a good prognosis or a poor prognosis, and assigning said person to one category in a clinical trial if said person is determined to have a good prognosis, and a different category if that person is determined to have a poor prognosis.
- 41. A method of classifying a first cell or organism as having one of at least two different phenotypes, said at least two different phenotypes comprising a first phenotype and a second phenotype, said method comprising:
(a) comparing the level of expression of each of a plurality of genes in a first sample from the first cell or organism to the level of expression of each of said genes, respectively, in a pooled sample from a plurality of cells or organisms, said plurality of cells or organisms comprising different cells or organisms exhibiting said at least two different phenotypes, respectively, to produce a first compared value; (b) comparing said first compared value to a second compared value, wherein said second compared value is the product of a method comprising comparing the level of expression of each of said genes in a sample from a cell or organism characterized as having said first phenotype to the level of expression of each of said genes, respectively, in said pooled sample; (c) comparing said first compared value to a third compared value, wherein said third compared value is the product of a method comprising comparing the level of expression of each of said genes in a sample from a cell or organism characterized as having said second phenotype to the level of expression of each of said genes, respectively, in said pooled sample; (d) optionally carrying out one or more times a step of comparing said first compared value to one or more additional compared values, respectively, each additional compared value being the product of a method comprising comparing the level of expression of each of said genes in a sample from a cell or organism characterized as having a phenotype different from said first and second phenotypes but included among said at least two different phenotypes, to the level of expression of each of said genes, respectively, in said pooled sample; and (e) determining to which of said second, third and, if present, one or more additional compared values, said first compared value is most similar; wherein said first cell or organism is determined to have the phenotype of the cell or organism used to produce said compared value most similar to said first compared value.
- 42. The method of claim 41, wherein said compared values are each ratios of the levels of expression of each of said genes.
- 43. The method of claim 41, wherein each of said levels of expression of each of said genes in said pooled sample is normalized prior to any of said comparing steps.
- 44. The method of claim 43, wherein normalizing said levels of expression is carried out by dividing each of said levels of expression by the median or mean level of expression of each of said genes or dividing by the mean or median level of expression of one or more housekeeping genes in said pooled sample.
- 45. The method of claim 43, wherein said normalized levels of expression are subjected to a log transform and said comparing steps comprise subtracting said log transform from the log of said levels of expression of each of said genes in said sample from said cell or organism.
- 46. The method of claim 41, wherein said at least two different phenotypes are different stages of a disease or disorder.
- 47. The method of claim 41, wherein said at least two different phenotypes are different prognoses of a disease or disorder.
- 48. The method of claim 41, wherein said levels of expression of each of said genes, respectively, in said pooled sample or said levels of expression of each of said genes in a sample from said cell or organism characterized as having said first phenotype, said second phenotype, or said phenotype different from said first and second phenotypes, respectively, are stored on a computer.
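Claims 43 to 45 describe normalizing the pooled reference and comparing on a log scale. A minimal sketch, assuming base-10 logarithms, normalization by the mean (or median) of named housekeeping genes or, when none are given, of all genes, and that the sample is normalized the same way so both sides share a scale (the claims only require normalizing the pool). Gene names and intensities are illustrative.

```python
import math
from statistics import mean, median
from typing import Dict, Iterable

def normalize(levels: Dict[str, float],
              housekeeping: Iterable[str] = (),
              use_median: bool = False) -> Dict[str, float]:
    """Divide each level by the mean/median of the housekeeping genes (or of all genes)."""
    keys = list(housekeeping) or list(levels)
    reference = [levels[k] for k in keys]
    scale = median(reference) if use_median else mean(reference)
    return {gene: level / scale for gene, level in levels.items()}

def log_ratios(sample: Dict[str, float], pool: Dict[str, float]) -> Dict[str, float]:
    """Compared value per gene: log10(sample level) - log10(pool level)."""
    return {gene: math.log10(sample[gene]) - math.log10(pool[gene]) for gene in pool}

pool = normalize({"G1": 200.0, "G2": 50.0, "HK": 100.0}, housekeeping=["HK"])
sample = normalize({"G1": 400.0, "G2": 50.0, "HK": 100.0}, housekeeping=["HK"])
print(log_ratios(sample, pool))  # G1 about 0.301 (two-fold up), G2 and HK 0.0
```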
- 49. A microarray comprising at least 5 markers derived from any one of Tables 1-6, wherein at least 50% of the probes on the microarray are present in any one of Tables 1-6.
- 50. The microarray of claim 49, wherein at least 70% of the probes on the microarray are present in any one of Tables 1-6.
- 51. The microarray of claim 49, wherein at least 80% of the probes on the microarray are present in any one of Tables 1-6.
- 52. The microarray of claim 49, wherein at least 90% of the probes on the microarray are present in any one of Tables 1-6.
- 53. The microarray of claim 49, wherein at least 95% of the probes on the microarray are present in any one of Tables 1-6.
- 54. The microarray of claim 49, wherein at least 98% of the probes on the microarray are present in any one of Tables 1-6.
- 55. A microarray for distinguishing ER(+) and ER(−) cell samples comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the genes corresponding to the markers listed in Table 1 or Table 2, wherein at least 50% of the probes on the microarray are present in Table 1 or Table 2.
- 56. A microarray for distinguishing BRCA1-related and sporadic cell samples comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the genes corresponding to the markers listed in Table 3 or Table 4, wherein at least 50% of the probes on the microarray are present in Table 3 or Table 4.
- 57. A microarray for distinguishing cell samples from individuals having a good prognosis and cell samples from individuals having a poor prognosis, comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the genes corresponding to the markers listed in Table 5 or Table 6, wherein at least 50% of the probes on the microarray are present in Table 5 or Table 6.
- 58. A kit for determining whether a sample contains a BRCA1 or sporadic mutation, comprising at least one microarray comprising probes to at least 20 of the genes corresponding to the markers listed in Table 3, and a computer readable medium having recorded thereon one or more programs for determining the similarity of the level of nucleic acid derived from the markers listed in Table 3 in a sample to that in a BRCA1 pool and a sporadic tumor pool, wherein the one or more programs cause a computer to perform a method comprising computing the aggregate differences in expression of each marker between the sample and the BRCA1 pool and the aggregate differences in expression of each marker between the sample and the sporadic pool, or a method comprising determining the correlation of expression of the markers in the sample to the expression in the BRCA1 and sporadic pools, said correlation calculated according to Equation (3).
- 59. A kit for determining the ER-status of a sample, comprising at least one microarray comprising probes to at least 20 of the genes corresponding to the markers listed in Table 1, and a computer readable medium having recorded thereon one or more programs for determining the similarity of the level of nucleic acid derived from the markers listed in Table 1 in a sample to that in an ER(−) pool and an ER(+) pool, wherein the one or more programs cause a computer to perform a method comprising computing the aggregate differences in expression of each marker between the sample and ER(−) pool and the aggregate differences in expression of each marker between the sample and ER(+) pool, or a method comprising determining the correlation of expression of the markers in the sample to the expression in the ER(−) and ER(+) pools, said correlation calculated according to Equation (3).
- 60. A kit for determining whether a sample is derived from a patient having a good prognosis or a poor prognosis, comprising at least one microarray comprising probes to at least 20 of the genes corresponding to the markers listed in Table 5, and a computer readable medium having recorded thereon one or more programs for determining the similarity of the level of nucleic acid derived from the markers listed in Table 5 in a sample to that in a pool of samples derived from individuals having a good prognosis and a pool of samples derived from individuals having a poor prognosis, wherein the one or more programs cause a computer to perform a method comprising computing the aggregate differences in expression of each marker between the sample and the good prognosis pool and the aggregate differences in expression of each marker between the sample and the poor prognosis pool, or a method comprising determining the correlation of expression of the markers in the sample to the expression in the good prognosis and poor prognosis pools, said correlation calculated according to Equation (3).
- 61. A method for classifying a breast cancer patient according to prognosis, comprising:
(a) comparing the respective levels of expression of at least five genes for which markers are listed in Table 5 in a cell sample taken from said breast cancer patient to respective control levels of expression of said at least five genes; and (b) classifying said breast cancer patient according to prognosis of his or her breast cancer based on the similarity between said levels of expression in said cell sample and said control levels.
- 62. The method according to claim 61, wherein step (b) comprises determining whether said similarity exceeds one or more predetermined threshold values of similarity.
- 63. A method for classifying a breast cancer patient according to prognosis, comprising:
(a) determining the similarity between the level of expression of each of at least five genes for which markers are listed in Table 5 in a cell sample taken from said breast cancer patient, to control levels of expression for each respective said at least five genes to obtain a patient similarity value; (b) providing selected first and second threshold values of similarity of said level of expression of each of said at least five genes to said control levels of expression to obtain first and second similarity threshold values, respectively, wherein said second similarity threshold indicates greater similarity to said control than does said first similarity threshold; and (c) classifying said breast cancer patient as having a first prognosis if said patient similarity value exceeds said first and said second similarity threshold values, a second prognosis if said patient similarity value exceeds said first similarity threshold value but does not exceed said second similarity threshold value, and a third prognosis if said patient similarity value exceeds neither said first similarity threshold value nor said second similarity threshold value.
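The three-way decision of claim 63(c) reduces to two comparisons. This sketch assumes the patient similarity value has already been computed (for example, as a correlation to the good-prognosis mean profile, per claim 71) and uses strict "exceeds" comparisons as the claim words them; the function name is hypothetical.

```python
def classify_prognosis(similarity: float, first_threshold: float, second_threshold: float) -> str:
    """Claim 63(c): three prognosis classes from two similarity thresholds."""
    if second_threshold <= first_threshold:
        raise ValueError("the second threshold must indicate greater similarity than the first")
    if similarity > second_threshold:
        return "first prognosis"   # exceeds both thresholds
    if similarity > first_threshold:
        return "second prognosis"  # exceeds only the first threshold
    return "third prognosis"       # exceeds neither threshold

# 0.4 and 0.636 are the correlation thresholds recited in claims 74 and 75.
for value in (0.7, 0.5, 0.1):
    print(value, classify_prognosis(value, 0.4, 0.636))
```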
- 64. The method of claim 63, further comprising determining prior to step (a) said level of expression of said at least five genes.
- 65. The method of claim 61, wherein said control levels are the mean levels of expression of each of said at least five genes in a pool of tumor samples obtained from a plurality of breast cancer patients who have no distant metastases within five years of initial diagnosis.
- 66. The method of claim 61, wherein said control levels comprise the expression levels of said genes in breast cancer patients who have had no distant metastases within five years of initial diagnosis.
- 67. The method of claim 61, wherein said control levels comprise, for each of said at least five genes, mean log intensity values stored on a computer.
- 68. The method of claim 61, wherein said control levels comprise, for each of said at least five genes, the mean log intensity values that are listed in Table 7.
- 69. The method of claim 63, wherein said determining in step (a) is carried out by a method comprising determining the degree of similarity between the level of expression of each of said at least five genes in a sample taken from said breast cancer patient to the level of expression of each of said at least five genes in a plurality of breast cancer patients who have had no relapse of breast cancer within five years of initial diagnosis.
- 70. The method of claim 63, wherein said determining in step (a) is carried out by a method comprising determining the difference between the absolute expression level of each of said at least five genes and the average expression level of each of said at least five genes in a pool of tumor samples obtained from a plurality of breast cancer patients who have had no relapse of breast cancer within five years of initial diagnosis.
- 71. The method of claim 63, wherein said first threshold value and said second threshold value are coefficients of correlation to the mean expression level of each of said at least five genes in a pool of tumor samples obtained from a plurality of breast cancer patients who have had no relapse of breast cancer within five years of initial diagnosis.
- 72. The method of claim 71, wherein said first threshold similarity value and said second threshold similarity value are selected by a method comprising:
(a) rank ordering in descending order said tumor samples that compose said pool of tumor samples by the degree of similarity between the level of expression of each said at least five genes in each of said tumor samples to the mean level of expression of said at least five genes of the remaining tumor samples that compose said pool to obtain a rank-ordered list, said degree of similarity being expressed as a similarity value; (b) determining an acceptable number of false negatives in said classifying step, wherein a false negative is a breast cancer patient for whom the expression levels of said at least five genes in said cell sample predicts that said breast cancer patient will have no distant metastases within the first five years after initial diagnosis, but who has had a distant metastasis within the first five years after initial diagnosis; (c) determining a similarity value above which in said rank ordered list fewer than said acceptable number of tumor samples are false negatives; (d) selecting said similarity value determined in step (c) as said first threshold similarity value; and (e) selecting a second similarity value, greater than said first similarity value, as said second threshold similarity value.
- 73. The method of claim 72, wherein said second threshold similarity value is selected in step (e) by a method comprising determining which of said tumor samples, taken from said breast cancer patients having a distant metastasis within the first five years after initial diagnosis, in said rank ordered list has the greatest similarity value, and selecting said greatest similarity value as said second threshold similarity value.
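Claims 72 and 73 can be sketched as a scan over the rank-ordered similarity values. This sketch assumes the leave-one-out similarity values of step (a) are already computed and that each entry is tagged with whether that patient developed a distant metastasis within five years; the data are invented for illustration. With a small acceptable number of false negatives the two thresholds can coincide, so a fuller implementation would need to enforce the "greater than" requirement of step (e).

```python
from typing import List, Optional, Tuple

def first_threshold(ranked: List[Tuple[float, bool]], acceptable_fn: int) -> Optional[float]:
    """Claim 72(c)-(d): lowest similarity value above which fewer than
    `acceptable_fn` samples are false negatives (metastatic yet predicted good)."""
    for t in sorted({s for s, _ in ranked}):  # ascending, so the first hit is the lowest
        false_negatives = sum(1 for s, metastasis in ranked if metastasis and s > t)
        if false_negatives < acceptable_fn:
            return t
    return None  # no candidate qualifies (e.g. acceptable_fn == 0)

def second_threshold(ranked: List[Tuple[float, bool]]) -> float:
    """Claim 73: the greatest similarity value among samples with a distant metastasis."""
    return max(s for s, metastasis in ranked if metastasis)

# (similarity, distant metastasis within five years), in descending order of similarity
ranked = [(0.80, False), (0.70, True), (0.65, False), (0.50, False),
          (0.45, True), (0.30, True), (0.20, False)]
print(first_threshold(ranked, acceptable_fn=2), second_threshold(ranked))  # 0.45 0.7
```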
- 74. The method of claim 72, wherein said first and second threshold similarity values are correlation coefficients, and said first threshold similarity value is 0.4 and said second threshold similarity value is greater than 0.4.
- 75. The method of claim 72, wherein said first and second threshold similarity values are correlation coefficients, and said second threshold similarity value is 0.636.
- 76. The method of claim 61, wherein said comparing step (a) comprises comparing the respective levels of expression of at least ten of said genes for which markers are listed in Table 5 in said cell sample to said respective control levels of said at least ten of said genes, wherein said control levels of expression of said at least ten genes are the average expression levels of each of said at least ten genes in a pool of tumor samples obtained from breast cancer patients who have had no distant metastases within five years of initial diagnosis.
- 77. The method of claim 61, wherein said comparing step (a) comprises comparing the respective levels of expression of at least 25 of said genes for which markers are listed in Table 5 in said cell sample to said respective control levels of expression of said at least 25 genes, wherein said control levels of expression of said at least 25 genes are the average expression levels of each of said at least 25 genes in a pool of tumor samples obtained from breast cancer patients who have had no distant metastases within five years of initial diagnosis.
- 78. The method of claim 61, wherein said comparing step (a) comprises comparing the respective levels of expression of each of said genes for which markers are listed in Table 6 in said cell sample to said respective control levels of expression of each of said genes for which markers are listed in Table 6, wherein said control levels of expression of each of said genes for which markers are listed in Table 6 are the average expression levels of each of said genes in a pool of tumor samples obtained from breast cancer patients who have had no distant metastases within five years of initial diagnosis.
- 79. A method of assigning a therapeutic regimen to a breast cancer patient, comprising:
(a) classifying said patient as having a “poor prognosis,” “intermediate prognosis,” or “very good prognosis” on the basis of the levels of expression of at least five genes for which markers are listed in Table 5; and (b) assigning said patient a therapeutic regimen, said therapeutic regimen (i) comprising no adjuvant chemotherapy if the patient is lymph node negative and is classified as having a very good prognosis or an intermediate prognosis, or (ii) comprising chemotherapy if said patient has any other combination of lymph node status and expression profile.
- 80. A method of assigning a therapeutic regimen to a breast cancer patient, comprising:
(a) determining the lymph node status for said patient; (b) determining the level of expression of at least five genes for which markers are listed in Table 5 in a cell sample from said patient, thereby generating an expression profile; (c) classifying said patient as having a “poor prognosis,” “intermediate prognosis,” or “very good prognosis” on the basis of said expression profile; and (d) assigning said patient a therapeutic regimen, said therapeutic regimen comprising no adjuvant chemotherapy if the patient is lymph node negative and is classified as having a very good prognosis or an intermediate prognosis, or comprising chemotherapy if said patient has any other combination of lymph node status and classification.
- 81. The method of claim 80 in which said therapeutic regimen assigned to lymph node negative patients classified as having an “intermediate prognosis” additionally comprises adjuvant hormonal therapy.
- 82. The method of claim 80, wherein said classifying step (c) is carried out by a method comprising:
(a) rank ordering in descending order a plurality of breast cancer tumor samples that compose a pool of breast cancer tumor samples by the degree of similarity between the level of expression of said at least five genes in each of said tumor samples and the level of expression of said at least five genes across all remaining tumor samples that compose said pool, said degree of similarity being expressed as a similarity value; (b) determining an acceptable number of false negatives in said classifying step, wherein a false negative is a breast cancer patient for whom the expression levels of said at least five genes in said cell sample predict that said breast cancer patient will have no distant metastases within the first five years after initial diagnosis, but who has had a distant metastasis within the first five years after initial diagnosis; (c) determining a similarity value above which in said rank ordered list said acceptable number of tumor samples or fewer are false negatives; (d) selecting said similarity value determined in step (c) as a first threshold similarity value; (e) selecting a second similarity value, greater than said first similarity value, as a second threshold similarity value; and (f) determining the similarity between the level of expression of each of said at least five genes in a breast cancer tumor sample from the breast cancer patient and the level of expression of each of said respective at least five genes in said pool, to obtain a patient similarity value, wherein if said patient similarity value equals or exceeds said second threshold similarity value, said patient is classified as having a “very good prognosis”; if said patient similarity value equals or exceeds said first threshold similarity value, but is less than said second threshold similarity value, said patient is classified as having an “intermediate prognosis”; and if said patient similarity value is less than said first threshold similarity value, said patient is classified as having a “poor prognosis.”
- 83. The method of claim 80 which further comprises determining the estrogen receptor (ER) status of said patient, wherein if said patient is ER positive and lymph node negative, said therapeutic regimen assigned to said patient additionally comprises adjuvant hormonal therapy.
- 84. The method of claim 80, wherein said patient is 52 years of age or younger.
- 85. The method of claim 80 or 84, wherein said patient has stage I or stage II breast cancer.
- 86. The method of claim 80, wherein said patient is premenopausal.
- 87. A method of classifying a breast cancer patient according to prognosis comprising the steps of:
(a) contacting first nucleic acids derived from a tumor sample taken from said breast cancer patient, and second nucleic acids derived from two or more tumor samples from breast cancer patients who have had no distant metastases within five years of initial diagnosis, with an array under conditions such that hybridization can occur, said array comprising a positionally-addressable ordered array of polynucleotide probes bound to a solid support, said polynucleotide probes being complementary and hybridizable to at least five of the genes respectively for which markers are listed in Table 5, or the RNA encoded by said genes, and wherein at least 50% of the probes on said array are hybridizable to genes respectively for which markers are listed in Table 5, or to the RNA encoded by said genes; (b) detecting at each of a plurality of discrete loci on said array a first fluorescent emission signal from said first nucleic acids and a second fluorescent emission signal from said second nucleic acids that are bound to said array under said conditions; (c) calculating the similarity between said first fluorescent emission signals and said second fluorescent emission signals across said at least five genes respectively for which markers are listed in Table 5; and (d) classifying said breast cancer patient according to prognosis of his or her breast cancer based on the similarity between said first fluorescent emission signals and said second fluorescent emission signals across said at least five genes respectively for which markers are listed in Table 5.
- 88. A computer program product for classifying a breast cancer patient according to prognosis, the computer program product for use in conjunction with a computer having a memory and a processor, the computer program product comprising a computer readable storage medium having a computer program encoded thereon, wherein said computer program product can be loaded into the one or more memory units of a computer and causes the one or more processor units of the computer to execute the steps of:
(a) receiving a first data structure comprising the respective levels of expression of each of at least five genes for which markers are listed in Table 5 in a cell sample taken from said patient; (b) determining the similarity of the level of expression of each of said at least five genes to respective control levels of expression of said at least five genes to obtain a patient similarity value; (c) comparing said patient similarity value to selected first and second threshold values of similarity of said respective levels of expression of each of said at least five genes to said respective control levels of expression of said at least five genes, wherein said second threshold value of similarity indicates greater similarity to said respective control levels of expression of said at least five genes than does said first threshold value of similarity; and (d) classifying said patient as having a first prognosis if said patient similarity value exceeds said first and said second threshold similarity values; a second prognosis if said patient similarity value exceeds said first threshold similarity value but does not exceed said second threshold similarity value; and a third prognosis if said patient similarity value does not exceed said first threshold similarity value or said second threshold similarity value.
- 89. The computer program product of claim 88, wherein said first threshold value of similarity and said second threshold value of similarity are values stored in said computer.
- 90. The computer program product of claim 88, wherein said respective control levels of expression of said at least five genes is stored in said computer.
- 91. The computer program product of claim 88 wherein said first prognosis is a “very good prognosis”; said second prognosis is an “intermediate prognosis”; and said third prognosis is a “poor prognosis”; wherein said computer program may be loaded into the memory and further causes said one or more processor units of said computer to execute the step of assigning said breast cancer patient a therapeutic regimen comprising no adjuvant chemotherapy if the patient is lymph node negative and is classified as having a very good prognosis or an intermediate prognosis, or comprising chemotherapy if said patient has any other combination of lymph node status and expression profile.
- 92. The computer program product of claim 93 wherein said clinical data includes the lymph node and estrogen receptor (ER) status of said breast cancer patient.
- 93. The computer program product of claim 88 wherein said computer program may be loaded into the memory and further causes said one or more processor units of the computer to execute the steps of receiving a data structure comprising clinical data specific to said breast cancer patient.
- 94. The computer program product of claim 88 wherein said respective control levels of expression of said at least five genes comprises a set of single-channel mean hybridization intensity values for each of said at least five genes, stored on said computer readable storage medium.
- 95. The computer program product of claim 94 wherein said single-channel mean hybridization intensity values are log transformed.
- 96. The computer program product of claim 88 wherein said computer program product causes said processing unit to perform said comparing step (c) by calculating the difference between the level of expression of each of said at least five genes in said cell sample taken from said breast cancer patient and said respective control levels of expression of said at least five genes.
- 97. The computer program product of claim 88 wherein said computer program product causes said processing unit to perform said comparing step (c) by calculating the mean log level of expression of each of said at least five genes in said control to obtain a control mean log expression level for each gene, calculating the log expression level for each of said at least five genes in a breast cancer sample from said patient to obtain a patient log expression level, and calculating the difference between the patient log expression level and the control mean log expression level for each of said at least five genes.
- 98. The computer program product of claim 88 wherein said computer program product causes said processing unit to perform said comparing step (c) by calculating similarity between the level of expression of each of said at least five genes in said cell sample taken from said patient and said respective control levels of expression of said at least five genes, wherein said similarity is expressed as a similarity value.
- 99. The computer program product of claim 98 wherein said similarity value is a correlation coefficient.
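Claims 82, 88, 94, 95, 97 and 99 together describe a two-threshold classifier: log-transform the expression levels, compare the patient's profile with the control (good-outcome pool) profile using a correlation coefficient, and place the resulting similarity value into one of three bands. The Python sketch below is a hedged illustration of that logic under stated assumptions, not the claimed implementation. It assumes base-10 logarithms, a Pearson correlation between the patient's log expression levels and the per-gene control mean log levels, the "equals or exceeds" comparisons of claim 82 (claim 88 uses a strict "exceeds"), and default thresholds of 0.4 and 0.636 taken from claims 74 and 75. All function names (`mean_log_expression`, `log_differences`, `pearson`, `classify_prognosis`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names; log base 10 is an assumption (the claims only say "log").


def mean_log_expression(control: Sequence[Sequence[float]]) -> List[float]:
    """Per-gene mean of log10 intensities across control samples.

    control[s][g] is the (positive) single-channel intensity of gene g in
    control sample s.
    """
    n_genes = len(control[0])
    return [sum(math.log10(sample[g]) for sample in control) / len(control)
            for g in range(n_genes)]


def log_differences(patient: Sequence[float],
                    control_mean_log: Sequence[float]) -> List[float]:
    """Patient log expression minus control mean log expression, gene by gene."""
    return [math.log10(x) - m for x, m in zip(patient, control_mean_log)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two entries")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def classify_prognosis(patient: Sequence[float],
                       control: Sequence[Sequence[float]],
                       first_threshold: float = 0.4,
                       second_threshold: float = 0.636) -> Tuple[float, str]:
    """Return (patient similarity value, prognosis label)."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must exceed the first")
    control_mean_log = mean_log_expression(control)
    patient_log = [math.log10(x) for x in patient]
    similarity = pearson(patient_log, control_mean_log)
    if similarity >= second_threshold:
        label = "very good prognosis"
    elif similarity >= first_threshold:
        label = "intermediate prognosis"
    else:
        label = "poor prognosis"
    return similarity, label
```

For example, if the control genes have intensities 10, 100, 1000, 10000 and 100000 (log levels 1 to 5) and a patient's log levels are 3, 1, 2, 5, 4, the correlation is 0.6. That value lies between the two thresholds, so the sketch returns "intermediate prognosis".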
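Claims 72, 73 and 82 derive the two thresholds from a training pool. Each pooled sample is scored against the mean of the remaining samples, the pool is rank-ordered by that score, and the first threshold is placed as low as possible while the number of patients with a distant metastasis above it stays within an acceptable false-negative count. The second threshold is the highest score of any metastatic patient. The Python sketch below illustrates one reading of those steps. It assumes Pearson correlation as the similarity value, follows claim 82's "acceptable number or fewer" wording (claim 72 says "fewer than"), and does not treat tied scores specially. The names `leave_one_out_similarities` and `select_thresholds` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_similarities(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Similarity of each pooled sample to the mean profile of the other samples."""
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two pooled samples")
    n_genes = len(profiles[0])
    totals = [sum(p[g] for p in profiles) for g in range(n_genes)]
    sims = []
    for p in profiles:
        rest_mean = [(totals[g] - p[g]) / (n - 1) for g in range(n_genes)]
        sims.append(pearson(p, rest_mean))
    return sims


def select_thresholds(similarities: Sequence[float],
                      metastasis: Sequence[bool],
                      max_false_negatives: int) -> Tuple[float, float]:
    """Return (first_threshold, second_threshold).

    metastasis[i] is True if pooled patient i had a distant metastasis within
    five years, i.e. would count as a false negative above the threshold.
    """
    ranked = sorted(zip(similarities, metastasis), key=lambda t: t[0], reverse=True)
    first = None
    false_negatives = 0
    # Walk down the rank-ordered list and stop just before the sample that
    # would push the false-negative count past the acceptable number.
    for sim, metastatic in ranked:
        if metastatic:
            false_negatives += 1
            if false_negatives > max_false_negatives:
                break
        first = sim
    if first is None:
        raise ValueError("no threshold satisfies the false-negative limit")
    metastatic_sims = [s for s, m in ranked if m]
    if not metastatic_sims:
        raise ValueError("the pool contains no patients with distant metastases")
    # Greatest similarity value among metastatic samples (claim 73).
    second = max(metastatic_sims)
    if second <= first:
        raise ValueError("second threshold must exceed the first")
    return first, second
```

With scores 0.9, 0.8, 0.7, 0.6, 0.5 and 0.2, where the samples scoring 0.7, 0.5 and 0.2 are metastatic, and at most two false negatives allowed, the sketch returns a first threshold of 0.5 and a second threshold of 0.7.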
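The regimen-assignment step of claims 79 to 83 is a small decision table keyed on lymph node status, prognosis class and, optionally, ER status. The Python sketch below encodes that table as an illustration of the claim logic only. It treats the node-negative branch as covering the "very good prognosis" and "intermediate prognosis" classes. It also applies the hormonal-therapy additions of claims 81 and 83 together, even though they are separate dependent claims. The function `assign_regimen` and its string labels are hypothetical.

```python
from typing import FrozenSet, Optional

VERY_GOOD = "very good prognosis"
INTERMEDIATE = "intermediate prognosis"
POOR = "poor prognosis"


def assign_regimen(prognosis: str,
                   node_negative: bool,
                   er_positive: Optional[bool] = None) -> FrozenSet[str]:
    """Return the regimen components implied by the claim logic (illustrative only)."""
    if prognosis not in (VERY_GOOD, INTERMEDIATE, POOR):
        raise ValueError(f"unknown prognosis: {prognosis!r}")
    regimen = set()
    if node_negative and prognosis in (VERY_GOOD, INTERMEDIATE):
        # Node-negative with a very good or intermediate prognosis.
        regimen.add("no adjuvant chemotherapy")
        if prognosis == INTERMEDIATE:
            # Claim 81: node-negative intermediate prognosis adds hormonal therapy.
            regimen.add("adjuvant hormonal therapy")
    else:
        # Any other combination of lymph node status and classification.
        regimen.add("chemotherapy")
    if node_negative and er_positive:
        # Claim 83: ER-positive, node-negative patients add hormonal therapy.
        regimen.add("adjuvant hormonal therapy")
    return frozenset(regimen)
```

Returning a set rather than a single label keeps the additive structure of the dependent claims visible: the base chemotherapy decision comes from claims 79 and 80, and claims 81 and 83 each contribute an extra component.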
Parent Case Info
[0001] This application is a continuation-in-part of U.S. application Ser. No. 10/172,118, filed Jun. 14, 2002, which in turn claims benefit of both U.S. Provisional Application No. 60/298,918 filed Jun. 18, 2001 and U.S. Provisional Application No. 60/380,710 filed May 14, 2002, each of which is incorporated by reference herein in its entirety.
[0002] This application includes a Sequence Listing submitted on compact disc, recorded on two compact discs, including one duplicate, containing Filename 9301188999.txt, of size 6,634,550 bytes, created Jan. 13, 2003. The sequence listing on the compact discs is incorporated by reference herein in its entirety.
Provisional Applications (2)
| Number | Date | Country |
|---|---|---|
| 60298918 | Jun 2001 | US |
| 60380710 | May 2002 | US |
Continuation in Parts (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10172118 | Jun 2002 | US |
| Child | 10342887 | Jan 2003 | US |