Claims
- 1. A method for classifying a cell sample as ER(+) or ER(−) comprising detecting a difference in the expression by said cell sample of a first plurality of genes relative to a control, said first plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 1.
- 2. The method of claim 1, wherein said plurality consists of at least 50 of the genes corresponding to the markers listed in Table 1.
- 3. The method of claim 1, wherein said plurality consists of at least 100 of the genes corresponding to the markers listed in Table 1.
- 4. The method of claim 1, wherein said plurality consists of at least 200 of the genes corresponding to the markers listed in Table 1.
- 5. The method of claim 1, wherein said plurality consists of at least 500 of the genes corresponding to the markers listed in Table 1.
- 6. The method of claim 1, wherein said plurality consists of at least 1000 of the genes corresponding to the markers listed in Table 1.
- 7. The method of claim 1, wherein said plurality consists of each of the genes corresponding to the 2,460 markers listed in Table 1.
- 8. The method of claim 1, wherein said plurality consists of the 550 gene markers listed in Table 2.
- 9. The method of claim 1, wherein said control comprises nucleic acids derived from a pool of tumors from individual sporadic patients.
- 10. The method of claim 1, wherein said detecting comprises the steps of
(a) generating an ER(+) template by hybridization of nucleic acids derived from a plurality of ER(+) patients within a plurality of sporadic patients against nucleic acids derived from a pool of tumors from individual sporadic patients; (b) generating an ER(−) template by hybridization of nucleic acids derived from a plurality of ER(−) patients within said plurality of sporadic patients against nucleic acids derived from said pool of tumors from individual sporadic patients within said plurality; (c) hybridizing nucleic acids derived from an individual sample against said pool; and (d) determining the similarity of marker gene expression in the individual sample to the ER(+) template and the ER(−) template, wherein if said expression is more similar to the ER(+) template, the sample is classified as ER(+), and if said expression is more similar to the ER(−) template, the sample is classified as ER(−).
- 11. A method for classifying a cell sample as BRCA1-related or sporadic, comprising detecting a difference in the expression of a first plurality of genes relative to a control, said first plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 3.
- 12. The method of claim 11, wherein said plurality consists of at least 50 of the genes corresponding to the markers listed in Table 3.
- 13. The method of claim 11, wherein said plurality consists of at least 100 of the genes corresponding to the markers listed in Table 3.
- 14. The method of claim 11, wherein said plurality consists of at least 200 of the genes corresponding to the markers listed in Table 3.
- 15. The method of claim 11, wherein said plurality consists of each of the genes corresponding to the 430 markers listed in Table 3.
- 16. The method of claim 11, wherein said plurality consists of each of the genes corresponding to the 100 markers listed in Table 4.
- 17. The method of claim 11, wherein said control comprises nucleic acids derived from a pool of tumors from individual sporadic patients.
- 18. The method of claim 11, wherein said detecting comprises the steps of
(a) generating a BRCA1 template by hybridization of nucleic acids derived from a plurality of BRCA1 patients within a plurality of ER(−) patients against nucleic acids derived from a pool of tumors; (b) generating a sporadic template by hybridization of nucleic acids derived from a plurality of sporadic patients within said plurality of ER(−) patients against nucleic acids derived from said pool of tumors; (c) hybridizing nucleic acids derived from an individual sample against said pool; and (d) determining the similarity of marker gene expression in the individual sample to the BRCA1 template and the sporadic template, wherein if said expression is more similar to the BRCA1 template, the sample is classified as BRCA1, and if said expression is more similar to the sporadic template, the sample is classified as sporadic.
- 19. A method for classifying an individual as having a good prognosis (no distant metastases within five years of initial diagnosis) or a poor prognosis (distant metastases within five years of initial diagnosis), comprising detecting a difference in the expression of a first plurality of genes in a cell sample taken from the individual relative to a control, said first plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 5.
- 20. The method of claim 19, wherein said plurality consists of at least 20 of the genes corresponding to the markers listed in Table 5.
- 21. The method of claim 19, wherein said plurality consists of at least 100 of the genes corresponding to the markers listed in Table 5.
- 22. The method of claim 19, wherein said plurality consists of at least 150 of the genes corresponding to the markers listed in Table 5.
- 23. The method of claim 19, wherein said plurality consists of each of the genes corresponding to the 231 markers listed in Table 5.
- 24. The method of claim 19, wherein said plurality consists of the 70 gene markers listed in Table 6.
- 25. The method of claim 19, wherein said control comprises nucleic acids derived from a pool of tumors from individual sporadic patients.
- 26. The method of claim 19, wherein said detecting comprises the steps of:
(a) generating a good prognosis template by hybridization of nucleic acids derived from a plurality of good prognosis patients against nucleic acids derived from a pool of tumors from individual patients; (b) generating a poor prognosis template by hybridization of nucleic acids derived from a plurality of poor prognosis patients against nucleic acids derived from said pool of tumors from said plurality of individual patients; (c) hybridizing nucleic acids derived from an individual sample against said pool; and (d) determining the similarity of marker gene expression in the individual sample to the good prognosis template and the poor prognosis template, wherein if said expression is more similar to the good prognosis template, the sample is classified as having a good prognosis, and if said expression is more similar to the poor prognosis template, the sample is classified as having a poor prognosis.
- 27. The method of claim 1, wherein the cell sample is additionally classified as BRCA1-related or sporadic by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 3 or Table 4.
- 28. The method of claim 1, wherein the cell sample is additionally classified as taken from a patient with a good prognosis or a poor prognosis by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 5.
- 29. The method of claim 11, wherein the cell sample is additionally classified as taken from a patient with a good prognosis or a poor prognosis by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 20 of the genes corresponding to the markers listed in Table 5.
- 30. The method of claim 11, wherein the cell sample is additionally classified as ER(+) or ER(−) by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 1.
- 31. The method of claim 19, wherein the cell sample is additionally classified as ER(+) or ER(−) by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 1.
- 32. The method of claim 19, wherein the cell sample is additionally classified as BRCA1 or sporadic by detecting a difference in the expression of a second plurality of genes in a cell sample taken from the individual relative to a control, said second plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in Table 3.
- 33. A method for classifying a sample as ER(+) or ER(−) by calculating the similarity between the expression of at least 5 of the markers listed in Table 1 in the sample to the expression of the same markers in an ER(−) nucleic acid pool and an ER(+) nucleic acid pool, comprising the steps of:
(a) labeling nucleic acids derived from a sample with a first fluorophore to obtain a first pool of fluorophore-labeled nucleic acids; (b) labeling with a second fluorophore a first pool of nucleic acids derived from two or more ER(+) samples, and a second pool of nucleic acids derived from two or more ER(−) samples;
(c) contacting said first fluorophore-labeled nucleic acid and said first pool of second fluorophore-labeled nucleic acid with a first microarray under conditions such that hybridization can occur, and contacting said first fluorophore-labeled nucleic acid and said second pool of second fluorophore-labeled nucleic acid with a second microarray under conditions such that hybridization can occur, wherein said first microarray and said second microarray are similar to each other, exact replicas of each other, or are identical, detecting at each of a plurality of discrete loci on the first microarray a first fluorescent emission signal from said first fluorophore-labeled nucleic acid and a second fluorescent emission signal from said first pool of second fluorophore-labeled nucleic acid that is bound to said first microarray under said conditions, and detecting at each of the marker loci on said second microarray said first fluorescent emission signal from said first fluorophore-labeled nucleic acid and a third fluorescent emission signal from said second pool of second fluorophore-labeled nucleic acid; (d) determining the similarity of the sample to the ER(−) and ER(+) pools by comparing said first fluorescent emission signals and said second fluorescent emission signals, and said first fluorescent emission signals and said third fluorescent emission signals; and (e) classifying the sample as ER(+) where the first fluorescent emission signals are more similar to said second fluorescent emission signals than to said third fluorescent emission signals, and classifying the sample as ER(−) where the first fluorescent emission signals are more similar to said third fluorescent emission signals than to said second fluorescent emission signals.
- 34. The method of claim 33, wherein said similarity is calculated by determining a first sum of the differences of expression levels for each marker between said first fluorophore-labeled nucleic acid and said first pool of second fluorophore-labeled nucleic acid, and a second sum of the differences of expression levels for each marker between said first fluorophore-labeled nucleic acid and said second pool of second fluorophore-labeled nucleic acid, wherein if said first sum is greater than said second sum, the sample is classified as ER(−), and if said second sum is greater than said first sum, the sample is classified as ER(+).
- 35. The method of claim 33, wherein said similarity is calculated by computing a first classifier parameter P1 between an ER(+) template and the expression of said markers in said sample, and a second classifier parameter P2 between an ER(−) template and the expression of said markers in said sample, wherein said P1 and P2 are calculated according to the formula:
- 36. A method for determining a set of marker genes whose expression is associated with a particular phenotype, comprising the steps of:
(a) selecting a phenotype having two or more phenotype categories; (b) identifying a plurality of genes wherein the expression of said genes is correlated or anticorrelated with one of the phenotype categories, and wherein the correlation coefficient for each gene is calculated according to the equation $\rho = (\vec{c} \cdot \vec{r}) / (\|\vec{c}\| \cdot \|\vec{r}\|)$, wherein $\vec{c}$ is a number representing said phenotype category and $\vec{r}$ is the logarithmic expression ratio across all the samples for each individual gene, wherein if the correlation coefficient has an absolute value of 0.3 or greater, said expression of said gene is associated with the phenotype category, wherein said plurality of genes is a set of marker genes whose expression is associated with a particular phenotype.
- 37. The method of claim 36, wherein said set of marker genes is validated by:
(a) using a statistical method to randomize the association between said marker genes and said phenotype category, thereby creating a control correlation coefficient for each marker gene; (b) repeating step (a) one hundred or more times to develop a frequency distribution of said control correlation coefficients for each marker gene; (c) determining the number of marker genes having a control correlation coefficient of 0.3 or above, thereby creating a control marker gene set; and (d) comparing the number of control marker genes so identified to the number of marker genes, wherein if the p value of the difference between the number of marker genes and the number of control genes is less than a threshold, said set of marker genes is validated.
- 38. The method of claim 36, wherein said set of marker genes is optimized by the method comprising:
(a) rank-ordering the genes by amplitude of correlation or by significance of the correlation coefficients to create a rank-ordered list, and (b) selecting an arbitrary number n of marker genes from the top of the rank-ordered list.
- 39. The method of claim 38, wherein said set of marker genes is further optimized by the method comprising:
(a) calculating an error rate for said arbitrary number n of marker genes; (b) increasing by 1 the number of genes selected from the top of the rank-ordered list; (c) calculating an error rate for said number of genes selected from the top of the rank-ordered list; (d) repeating steps (b) and (c) until said number of genes selected from the top of the rank-ordered list includes all genes included in said rank ordered list, and (e) identifying said number of genes selected from the top of the rank-ordered list for which the error rate is smallest, wherein said set of marker genes is optimized when the error rate is the smallest.
- 40. A method for assigning a person to one of a plurality of categories in a clinical trial, comprising determining for each said person the level of expression of at least five of the prognosis markers listed in Table 6, determining therefrom whether the person has an expression pattern that correlates with a good prognosis or a poor prognosis, and assigning said person to one category in a clinical trial if said person is determined to have a good prognosis, and a different category if that person is determined to have a poor prognosis.
- 41. A method of classifying a first cell or organism as having one of at least two different phenotypes, said at least two different phenotypes comprising a first phenotype and a second phenotype, said method comprising:
(a) comparing the level of expression of each of a plurality of genes in a first sample from the first cell or organism to the level of expression of each of said genes, respectively, in a pooled sample from a plurality of cells or organisms, said plurality of cells or organisms comprising different cells or organisms exhibiting said at least two different phenotypes, respectively, to produce a first compared value; (b) comparing said first compared value to a second compared value, wherein said second compared value is the product of a method comprising comparing the level of expression of each of said genes in a sample from a cell or organism characterized as having said first phenotype to the level of expression of each of said genes, respectively, in said pooled sample; (c) comparing said first compared value to a third compared value, wherein said third compared value is the product of a method comprising comparing the level of expression of each of said genes in a sample from a cell or organism characterized as having said second phenotype to the level of expression of each of said genes, respectively, in said pooled sample; (d) optionally carrying out one or more times a step of comparing said first compared value to one or more additional compared values, respectively, each additional compared value being the product of a method comprising comparing the level of expression of each of said genes in a sample from a cell or organism characterized as having a phenotype different from said first and second phenotypes but included among said at least two different phenotypes, to the level of expression of each of said genes, respectively, in said pooled sample; and (e) determining to which of said second, third and, if present, one or more additional compared values, said first compared value is most similar; wherein said first cell or organism is determined to have the phenotype of the cell or organism used to produce said compared value most similar to said first compared value.
- 42. The method of claim 41, wherein said compared values are each ratios of the levels of expression of each of said genes.
- 43. The method of claim 41, wherein each of said levels of expression of each of said genes in said pooled sample are normalized prior to any of said comparing steps.
- 44. The method of claim 43, wherein normalizing said levels of expression is carried out by dividing each of said levels of expression by the median or mean level of expression of each of said genes or dividing by the mean or median level of expression of one or more housekeeping genes in said pooled sample.
- 45. The method of claim 43, wherein said normalized levels of expression are subjected to a log transform and said comparing steps comprise subtracting said log transform from the log of said levels of expression of each of said genes in said sample from said cell or organism.
- 46. The method of claim 41, wherein said at least two different phenotypes are different stages of a disease or disorder.
- 47. The method of claim 41, wherein said at least two different phenotypes are different prognoses of a disease or disorder.
- 48. The method of claim 41, wherein said levels of expression of each of said genes, respectively, in said pooled sample or said levels of expression of each of said genes in a sample from said cell or organism characterized as having said first phenotype, said second phenotype, or said phenotype different from said first and second phenotypes, respectively, are stored on a computer.
- 49. A microarray comprising at least 5 markers derived from any one of Tables 1-6, wherein at least 50% of the probes on the microarray are present in any one of Tables 1-6.
- 50. The microarray of claim 49, wherein at least 70% of the probes on the microarray are present in any one of Tables 1-6.
- 51. The microarray of claim 49, wherein at least 80% of the probes on the microarray are present in any one of Tables 1-6.
- 52. The microarray of claim 49, wherein at least 90% of the probes on the microarray are present in any one of Tables 1-6.
- 53. The microarray of claim 49, wherein at least 95% of the probes on the microarray are present in any one of Tables 1-6.
- 54. The microarray of claim 49, wherein at least 98% of the probes on the microarray are present in any one of Tables 1-6.
- 55. A microarray for distinguishing ER(+) and ER(−) cell samples comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the genes corresponding to the markers listed in Table 1 or Table 2, wherein at least 50% of the probes on the microarray are present in Table 1 or Table 2.
- 56. A microarray for distinguishing BRCA1-related and sporadic cell samples comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the genes corresponding to the markers listed in Table 3 or Table 4, wherein at least 50% of the probes on the microarray are present in Table 3 or Table 4.
- 57. A microarray for distinguishing cell samples from individuals having a good prognosis and cell samples from individuals having a poor prognosis, comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the genes corresponding to the markers listed in Table 5 or Table 6, wherein at least 50% of the probes on the microarray are present in Table 5 or Table 6.
- 58. A kit for determining whether a sample contains a BRCA1 or sporadic mutation, comprising at least one microarray comprising probes to at least 20 of the genes corresponding to the markers listed in Table 3, and a computer readable medium having recorded thereon one or more programs for determining the similarity of the level of nucleic acid derived from the markers listed in Table 3 in a sample to that in a BRCA1 pool and a sporadic tumor pool, wherein the one or more programs cause a computer to perform a method comprising computing the aggregate differences in expression of each marker between the sample and the BRCA1 pool and the aggregate differences in expression of each marker between the sample and the sporadic pool, or a method comprising determining the correlation of expression of the markers in the sample to the expression in the BRCA1 and sporadic pools, said correlation calculated according to Equation (3).
- 59. A kit for determining the ER-status of a sample, comprising at least one microarray comprising probes to at least 20 of the genes corresponding to the markers listed in Table 1, and a computer readable medium having recorded thereon one or more programs for determining the similarity of the level of nucleic acid derived from the markers listed in Table 1 in a sample to that in an ER(−) pool and an ER(+) pool, wherein the one or more programs cause a computer to perform a method comprising computing the aggregate differences in expression of each marker between the sample and ER(−) pool and the aggregate differences in expression of each marker between the sample and ER(+) pool, or a method comprising determining the correlation of expression of the markers in the sample to the expression in the ER(−) and ER(+) pools, said correlation calculated according to Equation (3).
- 60. A kit for determining whether a sample is derived from a patient having a good prognosis or a poor prognosis, comprising at least one microarray comprising probes to at least 20 of the genes corresponding to the markers listed in Table 5, and a computer readable medium having recorded thereon one or more programs for determining the similarity of the level of nucleic acid derived from the markers listed in Table 5 in a sample to that in a pool of samples derived from individuals having a good prognosis and a pool of samples derived from individuals having a poor prognosis, wherein the one or more programs cause a computer to perform a method comprising computing the aggregate differences in expression of each marker between the sample and the good prognosis pool and the aggregate differences in expression of each marker between the sample and the poor prognosis pool, or a method comprising determining the correlation of expression of the markers in the sample to the expression in the good prognosis and poor prognosis pools, said correlation calculated according to Equation (3).
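By way of illustration only, the marker selection of claim 36 and the randomization control of claim 37 might be sketched as follows. The function names, the numeric coding of the phenotype category (for example +1 and −1), and the empirical p-value computation are assumptions of this sketch and are not specified by the claims; the coefficient is computed exactly as written in claim 36, without mean-centering.

```python
import numpy as np

def marker_correlations(log_ratios, categories):
    """rho = (c . r) / (|c| * |r|) for each gene, as written in claim 36.

    log_ratios: shape (n_genes, n_samples), log expression ratios.
    categories: shape (n_samples,), numeric phenotype code per sample
                (the coding scheme is an assumption of this sketch).
    """
    r = np.asarray(log_ratios, dtype=float)
    c = np.asarray(categories, dtype=float)
    denom = np.linalg.norm(c) * np.linalg.norm(r, axis=1)
    denom = np.where(denom == 0, np.nan, denom)  # avoid division by zero
    return (r @ c) / denom

def select_markers(log_ratios, categories, threshold=0.3):
    """Indices of genes whose |rho| meets the threshold (0.3 in claim 36)."""
    rho = marker_correlations(log_ratios, categories)
    return np.flatnonzero(np.abs(rho) >= threshold)

def permutation_marker_counts(log_ratios, categories, threshold=0.3,
                              n_perm=100, seed=0):
    """Number of genes passing the threshold under shuffled phenotype labels."""
    rng = np.random.default_rng(seed)
    counts = np.empty(n_perm, dtype=int)
    for i in range(n_perm):
        shuffled = rng.permutation(categories)
        counts[i] = select_markers(log_ratios, shuffled, threshold).size
    return counts

def empirical_p_value(observed_count, control_counts):
    """Hypothetical p-value: share of controls at least as large as observed."""
    control_counts = np.asarray(control_counts)
    return (np.sum(control_counts >= observed_count) + 1) / (control_counts.size + 1)
```

Under these assumptions, a marker set would be treated as validated when `empirical_p_value(select_markers(...).size, permutation_marker_counts(...))` falls below the chosen threshold.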
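The rank-ordering and error-rate optimization of claims 38 and 39 might likewise be sketched as below. The claims do not fix how the error rate is computed, so `error_fn` is a hypothetical caller-supplied function (for example, a cross-validated misclassification rate); `n_start` stands in for the "arbitrary number n" of claim 38.

```python
import numpy as np

def rank_markers(rho):
    """Gene indices ordered by decreasing amplitude of correlation |rho|."""
    rho = np.asarray(rho, dtype=float)
    return np.argsort(-np.abs(rho), kind="stable")

def optimize_marker_count(ranked, error_fn, n_start=5):
    """Grow the marker set one gene at a time and keep the best size.

    ranked:   gene indices from rank_markers().
    error_fn: hypothetical callable mapping an index array to an error rate.
    Returns (best_n, errors), where errors maps each tried n to its error rate.
    Ties are resolved in favor of the smallest n.
    """
    first = min(n_start, len(ranked))
    errors = {n: error_fn(ranked[:n]) for n in range(first, len(ranked) + 1)}
    best_n = min(errors, key=errors.get)
    return best_n, errors
```

The optimized marker set is then `ranked[:best_n]`.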
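The sum-of-differences comparison of claim 34 might be sketched as follows. The claim speaks only of "the differences of expression levels for each marker"; taking absolute differences, returning a third label on a tie, and the function name are all assumptions of this sketch.

```python
import numpy as np

def classify_by_sum_of_differences(sample, er_pos_pool, er_neg_pool):
    """Classify a sample by its aggregate distance to each pool (claim 34).

    All three arguments are per-marker expression levels in the same order.
    Absolute differences are an assumption; the claim does not say how
    signed differences are to be combined.
    """
    sample = np.asarray(sample, dtype=float)
    first_sum = np.sum(np.abs(sample - np.asarray(er_pos_pool, dtype=float)))
    second_sum = np.sum(np.abs(sample - np.asarray(er_neg_pool, dtype=float)))
    if first_sum > second_sum:
        return "ER(-)"   # farther from the ER(+) pool
    if second_sum > first_sum:
        return "ER(+)"   # farther from the ER(-) pool
    return "undetermined"  # tie: not addressed by the claim
```

The same pattern would apply to the BRCA1/sporadic and good/poor prognosis pools recited in the kit claims, with the pool arguments swapped accordingly.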
Parent Case Info
[0001] This application claims benefit of U.S. Provisional Application No. 60/298,918, filed Jun. 18, 2001, and U.S. Provisional Application No. 60/380,710, filed on May 14, 2002, each of which is incorporated by reference herein in its entirety.
[0002] This application includes a Sequence Listing submitted on compact disc, recorded on two compact discs, including one duplicate, containing Filename 9301175999.txt, of size 6,766,592 bytes, created Jun. 13, 2002. The sequence listing on the compact discs is incorporated by reference herein in its entirety.
Provisional Applications (2)
| Number | Date | Country |
|---|---|---|
| 60380710 | May 2002 | US |
| 60298918 | Jun 2001 | US |