Claims
- 1. A method for genetic data analysis, comprising the steps of
collecting genetic data from a plurality of subjects; setting two adjustable parameters for controlling a pattern discovery process; applying a pattern discovery process as a function of the adjustable parameters to find shared features in the genetic data and to identify a set of seed patterns; and applying a statistical test which assigns a significance value to respective ones of the seed patterns representative of whether the respective seed pattern qualifies for further analysis.
- 2. A method according to claim 1, further comprising
determining the statistical significance of a respective seed pattern.
- 3. A method according to claim 2, wherein determining the statistical significance includes comparing against a null hypothesis.
- 4. A method according to claim 2, wherein the null hypothesis comprises
determining a measure representative of the strength of representation of a pattern in a control population.
- 5. A method according to claim 1, wherein
collecting genetic data includes collecting sequence data.
- 6. A method according to claim 1, wherein
collecting genetic data includes collecting data from the group consisting of genotypic data, haplotype data, allelic data, and phenotypic data.
- 7. A method according to claim 1, wherein
collecting genetic data includes collecting data from a case population consisting of individuals having an indication of interest.
- 8. A method according to claim 1, wherein
collecting genetic data includes collecting data from a control population consisting of individuals selected from a general population.
- 9. A method according to claim 1, wherein applying a statistical test includes comparing a distribution bias associated with a seed pattern found from data associated with a case population to a distribution bias associated with a seed pattern found from data associated with a control population.
- 10. A method according to claim 1, wherein applying a statistical test includes comparing a distribution bias associated with a seed pattern found from data associated with a case population to a null hypothesis developed according to a statistical analysis of a control population.
- 11. A method according to claim 1, wherein setting two adjustable parameters includes setting a parameter representative of a minimum number of markers in a pattern and a parameter representative of a minimum number of samples having the pattern.
- 12. A method according to claim 1, wherein applying a pattern discovery process includes sorting genetic data representative of markers found for members of a population to identify a pattern of one or more markers that is associated with a predetermined minimum number of population members.
- 13. A method according to claim 1, further comprising
merging patterns found from multiple datasets to generate extended patterns.
- 14. A method according to claim 1, further comprising
sorting through identified seed patterns to find maximal patterns representative of patterns constrained by a marker criterion and a support population criterion.
- 15. A method according to claim 1, wherein
discovered patterns are employed in disease association analysis.
- 16. A method according to claim 1, wherein
discovered patterns are employed in linkage analysis.
- 17. A method according to claim 1, wherein
discovered patterns are employed in family-based genetic analysis.
- 18. A method according to claim 1, wherein
discovered patterns are employed in population-based genetic analysis.
- 19. A method according to claim 1, wherein
discovered patterns are employed in sib-pair study analysis or family-trio study analysis.
- 20. A method according to claim 1, wherein applying a statistical test, or applying a non-statistical test, includes calculating the odds ratio.
- 21. A method according to claim 1, wherein
discovered patterns are employed in genome-wide association analysis.
- 22. A method according to claim 1, wherein
discovered patterns are employed in regional association analysis.
- 23. A method according to claim 1, wherein
the data set includes one or more markers.
- 24. A method according to claim 1, wherein
applying a statistical test includes applying a test selected from the group consisting of a chi-square test, Fisher's exact test, a transmission disequilibrium test (TDT), a haplotype-based haplotype relative risk (HHRR) test, and a t-test.
- 25. A method according to claim 1, employed to detect locus/gene interactions between two or more loci.
- 26. A method according to claim 1, employed to detect genetic heterogeneity or population substructure, or to detect multi-locus association in which each locus has only a small to moderate effect.
- 27. Apparatus for genetic data analysis, comprising
a database having genetic data from a plurality of subjects, a process for setting two adjustable parameters for controlling a pattern discovery process, and applying a pattern discovery process as a function of the adjustable parameters to find shared features in the genetic data and to identify a set of seed patterns, and a statistical test process for assigning a significance value to respective ones of the seed patterns representative of whether the respective seed pattern qualifies for further analysis.
- 28. An apparatus according to claim 27, further comprising
a process for determining the statistical significance of a respective seed pattern.
- 29. A method for identifying genetic associations through maximal pattern discovery, comprising
collecting genetic information from a plurality of patients; converting the genetic information into a predetermined data format; building a seed pattern set with the converted data; and identifying a full pattern from the seed pattern set by applying at least two constraints to the full patterns.
- 30. A method for identifying genetic associations through maximal pattern discovery across multiple data sets, comprising
collecting genetic information from a plurality of patients; converting the genetic information into an appropriate data format; computing the number of individuals with a particular marker value for each marker value; merging patterns from a seed pattern set; merging pairs of patterns in an emerging pattern set; and repeating the second merging step until a combined pattern set is empty or the number of patterns in the combined pattern set is larger than a predetermined number.
- 31. A method according to claim 1 further comprising
providing a data set that contains multiple independent patterns; setting a minimum support threshold for full data set size; and performing pattern discovery to identify patterns supported by individuals in a set.
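By way of illustration only, the steps recited in claims 1, 11, 14, 20 and 24 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`discover_patterns`, `odds_ratio`, `fisher_exact_greater`), the dictionary-based data layout, and the brute-force intersection search are all hypothetical choices made for the example. The two adjustable parameters are taken to be a minimum number of markers per pattern and a minimum number of supporting samples.

```python
from math import comb


def discover_patterns(samples, min_markers, min_support):
    """Return {pattern: supporting sample ids} for shared marker patterns.

    `samples` maps a sample id to a dict of marker -> observed value
    (an assumed layout). A pattern is a frozenset of (marker, value) pairs.
    """
    item_sets = {sid: frozenset(g.items()) for sid, g in samples.items()}
    # Candidate patterns are intersections of one or more samples, so each
    # candidate is the largest marker set shared by the samples producing it.
    candidates = set(item_sets.values())
    frontier = set(candidates)
    while frontier:
        new = set()
        for pattern in frontier:
            for other in item_sets.values():
                shared = pattern & other
                # Intersections only shrink, so short ones can be pruned.
                if len(shared) >= min_markers and shared not in candidates:
                    new.add(shared)
        candidates |= new
        frontier = new
    result = {}
    for pattern in candidates:
        if len(pattern) < min_markers:
            continue
        support = frozenset(
            sid for sid, items in item_sets.items() if pattern <= items
        )
        if len(support) >= min_support:
            result[pattern] = support
    return result


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = cases with/without the pattern;
    c, d = controls with/without the pattern."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: probability of seeing at least `a`
    pattern carriers among the cases, with the table margins held fixed."""
    n = a + b + c + d
    cases, carriers = a + b, a + c
    upper = min(cases, carriers)
    tail = sum(
        comb(cases, k) * comb(n - cases, carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, carriers)


if __name__ == "__main__":
    # Toy data invented for the example; not taken from the application.
    cases = {
        "c1": {"m1": "A", "m2": "G", "m3": "T"},
        "c2": {"m1": "A", "m2": "G", "m3": "C"},
        "c3": {"m1": "A", "m2": "G", "m3": "T"},
    }
    for pattern, support in discover_patterns(cases, 2, 3).items():
        print(sorted(pattern), sorted(support))
```

In this sketch a discovered pattern would then be counted in the case and control populations to fill the 2x2 table passed to `odds_ratio` or `fisher_exact_greater`. The exhaustive intersection search is chosen for clarity and would not scale to large marker sets.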
REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application Ser. No. 60/423,849, having the same title and naming the same inventors, the contents of which are incorporated by reference.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60423849 | Nov 2002 | US |