Claims
- 1. A method for building classifiers comprising:
merging a plurality of datasets representing data associated with a selected biological system; processing the datasets to identify an invariant characteristic of the selected biological system, representative of an identifying characteristic of the biological system; and employing the invariant characteristic to generate a model for classifying datasets or for discovering classes.
- 2. A method according to claim 1, further comprising
normalizing the plurality of datasets.
- 3. A method according to claim 1, further comprising
providing a plurality of datasets each being associated with a respective target phenotype.
- 4. A method according to claim 1, further comprising
scaling the datasets.
- 5. A method according to claim 1, wherein merging includes
extracting a relative feature of the dataset.
- 6. A method according to claim 1, wherein merging includes
replacing a dataset value with a column-wise rank value.
- 7. A method according to claim 1, wherein merging includes
column-wise standardizing dataset values.
- 8. A method according to claim 1, wherein merging includes
replacing a dataset value with a relative feature representative of a comparison between two or more values in a dataset.
- 9. A method according to claim 1, further comprising
applying association discovery to identify patterns.
- 10. A method according to claim 1, further comprising
applying association discovery to identify itemsets.
- 11. A method according to claim 1, further comprising
creating a database of patterns.
- 12. A method according to claim 1, wherein
employing the invariant characteristic includes processing a sample data value to determine a probability of association with a target class.
- 13. A method according to claim 12, wherein determining a probability includes applying a Large Bayes classifier and inference process.
- 14. A method for building models for diagnosing a disease, comprising:
accessing data from a plurality of remote databases, each having datasets representing data associated with a selected biological system; processing the datasets to identify an invariant characteristic of the selected biological system, representative of an identifying characteristic of the biological system; employing the invariant characteristic to generate a model for classifying sample datasets as belonging to a first or second class; and applying sample data to the generated model to determine whether the sample data is associated with at least one of the first and second classes.
- 15. A method according to claim 14, wherein
at least one of the first and second classes is representative of a disease state.
- 16. A system for building classifiers comprising:
a plurality of datasets representing data associated with a selected biological system; a processor for processing the datasets to identify an invariant characteristic of the selected biological system, representative of an identifying characteristic of the biological system; and a model generator capable of employing the invariant characteristic to generate a model for associating a sample dataset with a classification.
- 17. A system according to claim 16, further comprising
a process for applying association discovery to identify patterns within the datasets.
- 18. A system according to claim 16, further comprising
a process for applying association discovery to identify itemsets within the datasets.
- 19. A system according to claim 16, further comprising
a database having storage for a set of identified patterns.
- 20. A system according to claim 16, further comprising
a prediction processor capable of employing the invariant characteristic to determine a probability of association between sample data and a target class.
- 22. A computer readable medium having stored thereon instructions for directing a computer to
merge a plurality of datasets representing data associated with a selected biological system; process the datasets to identify an invariant characteristic of the selected biological system, representative of an identifying characteristic of the biological system; and employ the invariant characteristic to generate a model for classifying datasets or for discovering classes.
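The merging step of claims 5 through 8 can be illustrated with a short sketch. It assumes a hypothetical layout in which each dataset is a pair (feature_names, matrix), with one row per feature (for example, a gene) and one column per sample; all function names are illustrative and are not taken from the application. Each dataset is restricted to the features it shares with the others and transformed on its own (column-wise ranks or column-wise z-scores) before the sample columns are joined, so that values measured on different platforms end up on a common scale. The pairwise comparison shown is only one possible kind of relative feature.

```python
from statistics import mean, pstdev

# Hypothetical layout: each dataset is (feature_names, matrix), where
# matrix[i][j] is the value of feature i in sample j (one column per sample).

def rank_columns(matrix):
    """Replace each value with its rank inside its column (1 = smallest; ties share the mean rank)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        start = 0
        while start < n_rows:
            end = start
            # Extend over a run of tied values in this column.
            while end + 1 < n_rows and matrix[order[end + 1]][j] == matrix[order[start]][j]:
                end += 1
            rank = (start + end) / 2 + 1
            for k in range(start, end + 1):
                out[order[k]][j] = rank
            start = end + 1
    return out

def standardize_columns(matrix):
    """Column-wise z-scores: subtract the column mean, divide by the column (population) std dev."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [matrix[i][j] for i in range(n_rows)]
        mu, sd = mean(col), pstdev(col)
        for i in range(n_rows):
            out[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return out

def pairwise_relative(names, matrix):
    """Relative features: for each feature pair (a, b), 1 if a > b within the same sample, else 0."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    new_names, out = [], []
    for a in range(n_rows):
        for b in range(a + 1, n_rows):
            new_names.append(f"{names[a]}>{names[b]}")
            out.append([1 if matrix[a][j] > matrix[b][j] else 0 for j in range(n_cols)])
    return new_names, out

def merge_datasets(datasets, transform=rank_columns):
    """Keep shared features, apply a matrix -> matrix transform per dataset, then join the samples."""
    shared = set(datasets[0][0])
    for names, _ in datasets[1:]:
        shared &= set(names)
    shared = sorted(shared)
    merged = [[] for _ in shared]
    for names, matrix in datasets:
        index = {name: i for i, name in enumerate(names)}
        sub = [matrix[index[name]] for name in shared]
        # Transform each dataset separately so its own scale is removed before merging.
        for row_out, row_in in zip(merged, transform(sub)):
            row_out.extend(row_in)
    return shared, merged
```

For example, one dataset on a raw intensity scale (hundreds) and another on a 0 to 1 scale merge into identical rank rows for the shared features g2 and g3, because ranking keeps only the within-sample ordering.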
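Claims 9 and 10 recite applying association discovery to identify itemsets. The claims do not name an algorithm, so the sketch below assumes Apriori-style frequent itemset mining, one common choice. It also assumes each sample has already been reduced to a set of discrete items such as "g1:high" by the hypothetical helper to_items; the threshold-based discretization is an illustrative assumption. Mining the samples of each class separately yields candidate patterns of the kind that could be stored in a pattern database (claim 11).

```python
from itertools import combinations

def to_items(names, values, threshold=0.0):
    """Hypothetical discretization: tag each feature 'high' or 'low' relative to a threshold."""
    return {f"{n}:{'high' if v > threshold else 'low'}" for n, v in zip(names, values)}

def frequent_itemsets(transactions, min_support):
    """Minimal Apriori: return {frozenset(itemset): support} for itemsets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in any transaction.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return result
```

With three samples {g1:high, g2:low}, {g1:high, g2:low} and {g1:high, g2:high} and a minimum support of 0.6, the frequent itemsets are {g1:high}, {g2:low} and {g1:high, g2:low}.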
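Claim 13 names a Large Bayes classifier, which extends naive Bayes by drawing on frequent itemsets. The sketch below deliberately swaps in a plain Bernoulli naive Bayes model, a simpler relative, only to show the shape of claim 12: turning a sample's discrete items into a probability of belonging to each target class. It is not the Large Bayes procedure; the function names and the Laplace smoothing parameter alpha are assumptions for illustration.

```python
import math
from collections import Counter

def train_naive_bayes(samples, labels, alpha=1.0):
    """Count class priors and per-class item occurrences; returns a plain tuple as the model."""
    vocab = set().union(*samples)
    classes = sorted(set(labels))
    prior = Counter(labels)
    item_counts = {c: Counter() for c in classes}
    for items, c in zip(samples, labels):
        item_counts[c].update(items)
    return vocab, classes, prior, item_counts, alpha, len(labels)

def class_probabilities(model, items):
    """Posterior P(class | items) under a Bernoulli naive Bayes model; unseen items are ignored."""
    vocab, classes, prior, item_counts, alpha, n = model
    log_scores = {}
    for c in classes:
        score = math.log(prior[c] / n)
        for item in vocab:
            # Laplace-smoothed probability that this item is present in a sample of class c.
            p = (item_counts[c][item] + alpha) / (prior[c] + 2 * alpha)
            score += math.log(p if item in items else 1 - p)
        log_scores[c] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}
```

Trained on two "tumor" samples containing g1:high and two "normal" samples containing g1:low, a query sample containing g1:high is assigned a tumor probability of 0.9.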
CLAIM OF PRIORITY
[0001] This application claims priority to U.S. Provisional Application Ser. No. 60/401,591, filed 6 Aug. 2002, entitled Across Platform and Multiple Dataset Molecular Classification, the contents of which are hereby incorporated by reference in their entirety.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60401591 | Aug 2002 | US |