Claims
- 1. A method of data analysis, comprising,
(a) employing data comprising, a plurality of records, each of said records having an associated plurality of attributes, said plurality of records being divisible into at least two categories, (b) assigning as a class of one-dimensional vectors one of, said records and said attributes, (c) selecting an integer value K, where K is a maximum number of uncorrelated vectors to be identified from said class of one-dimensional vectors, (d) selecting a threshold value, (e) choosing a first vector from said class of one-dimensional vectors as a member of an uncorrelated set of vectors, and (f) performing iteratively until substantially all vectors in said class of one-dimensional vectors have been analyzed,
(1) selecting an additional vector from said class of vectors, (2) computing a correlation parameter using said first vector and said additional vector, (3) comparing said correlation parameter to said threshold value, and (4) adding said additional vector to said uncorrelated set of vectors if said correlation parameter is not greater than said threshold value.
- 2. The method of claim 1 further comprising, determining, from said uncorrelated set of vectors, a result-effective subset of attributes that is sufficient to divide said records into said at least two categories.
- 3. The method of claim 1, further comprising,
(g) determining whether there are more than K vectors in said set of uncorrelated vectors, (h) if there are more than K vectors in said set,
(1) repeating an integer N number of times steps (d), (e) and (f)(1) through (f)(4), (2) determining N sets of vectors that are uncorrelated, (3) determining whether any of said N subsets have less than or equal to K vectors, and (4) in response to such a determination, employing one of said N subsets having less than or equal to K vectors to determine a result-effective subset of attributes that is sufficient to divide said records into said at least two categories.
- 4. The method of claim 3 wherein N is 10.
- 5. The method of claim 3, further comprising,
(i) upon a determination that none of said N subsets has less than or equal to K members, reducing said threshold value and repeating steps (e) through (f)(4).
- 6. The method of claim 1, wherein said records represent cells and said attributes are properties of said cells.
- 7. The method of claim 1, wherein said records represent mammals and said attributes are characteristics of said mammals.
- 8. The method of claim 1, wherein said records represent a sample from a mammal and said attributes are biological markers.
- 9. The method of claim 8, wherein said biological marker is a gene product.
- 10. The method of claim 8, wherein said biological marker is at least one of a protein and an mRNA.
- 11. The method of claim 1, wherein at least one of said at least two categories represents a predisposition to contract a disease.
- 12. The method of claim 11, wherein said disease is leukemia.
- 13. The method of claim 1, wherein at least one of said at least two categories represents a predisposition to a medical treatment efficacy.
- 14. The method of claim 1, wherein a first category represents a mammal having a first phenotype and a second category represents a mammal having a second, different phenotype.
- 15. The method of claim 14, wherein the first phenotype is a disease affected phenotype.
- 16. The method of claim 14, wherein the second phenotype is a non-disease affected phenotype.
- 17. The method of claim 15, wherein the disease is a cancer.
- 18. A system of data analysis, comprising,
a processor adapted for,
(a) employing data comprising, a plurality of records, each of said records having an associated plurality of attributes, said plurality of records being divisible into at least two categories, (b) assigning as a class of one-dimensional vectors one of, said records and said attributes, (c) selecting an integer value K, where K is a maximum number of uncorrelated vectors to be identified from said class of one-dimensional vectors, (d) selecting a threshold value, (e) choosing a first vector from said class of one-dimensional vectors as a member of an uncorrelated set of vectors, and (f) performing iteratively until substantially all vectors in said class of one-dimensional vectors have been analyzed,
(1) selecting an additional vector from said class of vectors, (2) computing a correlation parameter using said first vector and said additional vector, (3) comparing said correlation parameter to said threshold value, and (4) adding said additional vector to said uncorrelated set of vectors if said correlation parameter is not greater than said threshold value.
- 19. The system of claim 18, wherein said processor is further adapted for determining, for said uncorrelated set of vectors, a result-effective subset of attributes that is sufficient to divide said records into said at least two categories.
- 20. The system of claim 19, wherein said processor is further adapted for,
(g) determining whether there are more than K vectors in said set of uncorrelated vectors, and (h) in response to there being more than K vectors in said set,
(1) repeating an integer N number of times steps (d), (e) and (f)(1) through (f)(4), (2) determining N sets of vectors that are uncorrelated, (3) determining whether there are K or fewer vectors in any of said N sets, so as to determine an uncorrelated set of vectors having no more than K members, and (4) in response to such a determination, employing one of said N subsets having less than or equal to K vectors to determine a result-effective subset of attributes that is sufficient to divide said records into said at least two categories.
- 21. The system of claim 20 wherein N is 10.
- 22. The system of claim 20, wherein said processor is further adapted for,
(i) reducing said threshold value and repeating steps (e) through (f)(4), upon a determination that none of said N subsets has less than or equal to K members.
- 23. The system of claim 18, wherein said records represent cells and said attributes are properties of said cells.
- 24. The system of claim 18, wherein said records represent mammals and said attributes are characteristics of said mammals.
- 25. The system of claim 18, wherein said records represent a sample from a mammal and said attributes are biological markers.
- 26. The system of claim 25, wherein said biological marker is a gene product.
- 27. The system of claim 25, wherein said biological marker is at least one of a protein and an mRNA.
- 28. The system of claim 18, wherein at least one of said at least two categories represents a predisposition to contract a disease.
- 29. The system of claim 28, wherein said disease is leukemia.
- 30. The system of claim 18, wherein at least one of said at least two categories represents a predisposition to a medical treatment efficacy.
- 31. The system of claim 18, wherein a first category represents a mammal having a first phenotype and a second category represents a mammal having a second, different phenotype.
- 32. The system of claim 31, wherein the first phenotype is a disease affected phenotype.
- 33. The system of claim 31, wherein the second phenotype is a non-disease affected phenotype.
- 34. The system of claim 32, wherein the disease is a cancer.
- 35. A computer program recorded on a computer-readable medium for graphical data analysis, said computer program when operating performing the steps of,
(a) employing data comprising, a plurality of records, each of said records having an associated plurality of attributes, said plurality of records being divisible into at least two categories, (b) assigning as a class of one-dimensional vectors a selected one of said records and said attributes, (c) selecting an integer value K, where K is a maximum number of uncorrelated vectors to be identified from said class of one-dimensional vectors, (d) selecting a threshold value, (e) choosing a first vector from said class of one-dimensional vectors as a member of an uncorrelated set of vectors, and (f) performing iteratively until substantially all vectors in said class of one-dimensional vectors have been analyzed,
(1) selecting an additional vector from said class of vectors, (2) computing a correlation parameter using said first vector and said additional vector, (3) comparing said correlation parameter to said threshold value, and (4) adding said additional vector to said uncorrelated set of vectors if said correlation parameter is not greater than said threshold value.
- 36. The computer program of claim 35, when operating, further comprising, determining, from said uncorrelated set of vectors, a result-effective subset of attributes that is sufficient to divide said records into said at least two categories.
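By way of illustration only, the selection procedure recited in claims 1, 3 and 5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the claimed method: the function names (`pearson`, `uncorrelated_set`, `select_at_most_k`), the use of the absolute Pearson coefficient as the correlation parameter, the random choice of the first vector on each repetition, and the fixed threshold decrement `step` are all hypothetical choices that the claims do not specify. The sketch also simplifies claim 3 by keeping the same threshold within each round of N repetitions and by counting the initial pass as one of those repetitions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return cov / (norm_x * norm_y)


def uncorrelated_set(vectors, threshold, first_index=0):
    """Steps (e) and (f) of claim 1, returning indices into `vectors`.

    As recited in the claim, each additional vector is compared only with
    the first vector. Taking the absolute value of r is an assumption.
    """
    first = vectors[first_index]                # step (e)
    selected = [first_index]
    for i, candidate in enumerate(vectors):     # step (f)(1)
        if i == first_index:
            continue
        r = abs(pearson(first, candidate))      # step (f)(2)
        if r <= threshold:                      # steps (f)(3) and (f)(4)
            selected.append(i)
    return selected


def select_at_most_k(vectors, k, threshold, n_repeats=10, step=0.05, seed=0):
    """Loose sketch of claims 3 and 5: retry N times, then lower the threshold.

    Returns (indices, threshold_used), or (None, threshold) if no set with
    at most k members is found before the threshold drops below zero.
    """
    rng = random.Random(seed)
    while threshold >= 0:
        for _ in range(n_repeats):              # claim 3, step (h)(1)
            first_index = rng.randrange(len(vectors))
            chosen = uncorrelated_set(vectors, threshold, first_index)
            if len(chosen) <= k:                # claim 3, step (h)(3)
                return chosen, threshold
        threshold -= step                       # claim 5, step (i)
    return None, threshold


# Toy data (hypothetical): four attribute vectors measured over four records.
vectors = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],    # perfectly correlated with the first vector
    [1, -1, -1, 1],  # zero correlation with the first vector
    [4, 3, 2, 1],    # perfectly anti-correlated with the first vector
]
print(uncorrelated_set(vectors, threshold=0.3))
print(select_at_most_k(vectors, k=2, threshold=0.3))
```

In this sketch the default `n_repeats=10` mirrors the value of N recited in claims 4 and 21. The returned indices identify the retained vectors, from which a result-effective subset of attributes could then be determined by whatever classification step is employed.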
REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. provisional patent application Serial No. 60/285,385, filed Apr. 20, 2001, U.S. provisional patent application Serial No. 60/285,945, filed Apr. 23, 2001, U.S. provisional patent application Serial No. 60/322,771, filed Sep. 17, 2001, and U.S. provisional application identified by Attorney Docket Code ANV-003PR, entitled Multi-Dimensional Interactive Data Visualization Applied To Small Molecule Research, filed Jan. 15, 2002, all of which applications are incorporated herein in their entirety by reference.
[0002] This application is related to U.S. patent application identified by Attorney Docket Code ANV-001, entitled “Method And System For Data Analysis” and to U.S. patent application identified by Attorney Docket Code ANV-002, and entitled “Method And System For Data Analysis”, both of which are filed on even date herewith and incorporated herein in their entirety by reference.
Provisional Applications (4)

| Number | Date | Country |
|---|---|---|
| 60348854 | Jan 2002 | US |
| 60285945 | Apr 2001 | US |
| 60285385 | Apr 2001 | US |
| 60322771 | Sep 2001 | US |