Claims
- 1. A method for selecting SNP haplotype patterns, comprising:
isolating a substantially identical nucleic acid strand from a plurality of different origins for analysis; determining more than one SNP location in each nucleic acid strand; identifying SNP locations in said nucleic acid strands that are linked, wherein said linked SNP locations form a SNP haplotype block; identifying isolate SNP haplotype blocks; identifying SNP haplotype patterns that occur in each SNP haplotype block and isolate SNP haplotype block; and selecting each identified SNP haplotype pattern that occurs in at least two of said substantially identical nucleic acid strands from different origins.
- 2. The method of claim 1, wherein said first identifying step is determined by a greedy algorithm or a shortest-paths algorithm.
- 3. The method of claim 1, wherein said SNP haplotype blocks are non-overlapping.
- 4. The method of claim 1, wherein said substantially identical nucleic acid strands are from at least between about 10 to about 100 different origins.
- 5. The method of claim 4, wherein said substantially identical nucleic acid strands are from at least about 16 different origins.
- 6. The method of claim 5, wherein said substantially identical nucleic acid strands are from at least about 25 different origins.
- 7. The method of claim 6, wherein said substantially identical nucleic acid strands are from at least about 50 different origins.
- 8. The method of claim 1, wherein said substantially identical nucleic acid strands are genomic DNA strands.
- 9. The method of claim 1, wherein at least ten percent of genomic DNA from an organism is isolated and analyzed.
- 10. The method of claim 1, wherein at least 1×10^8 bases from said substantially identical nucleic acid strands are isolated and analyzed.
- 11. The method of claim 1, wherein selected repeat regions from said substantially identical nucleic acid strands are not analyzed.
- 12. The method of claim 1, further comprising:
after said determining step, identifying which SNP locations occur only once in said plurality of identical nucleic acid strands; and excluding said once-occurring SNP locations from analysis.
- 13. The method of claim 1, further comprising:
selecting a SNP haplotype pattern that occurs most frequently in said substantially identical nucleic acid strands; and selecting a SNP haplotype pattern that occurs next most frequently in said substantially identical nucleic acid strands; and repeating said second selecting step until said selected SNP haplotype patterns identify a portion of said substantially identical nucleic acid strands.
- 14. The method of claim 13, wherein said portion is between about 70% and 99% of said substantially identical nucleic acid strands.
- 15. The method of claim 13, wherein said portion is at least about 80% of said substantially identical nucleic acid strands.
- 16. The method of claim 13, wherein no more than about three SNP haplotype patterns are selected.
- 17. A method for selecting a data set of SNP haplotype blocks for data analysis, comprising:
comparing SNP haplotype blocks for informativeness; selecting a first SNP haplotype block with a high informativeness; adding said first SNP haplotype block to said data set; selecting a second SNP haplotype block with a high informativeness; adding said second selected SNP haplotype block to said data set; and repeating said selecting and adding steps until a region of interest of a nucleic acid strand is covered.
- 18. The method of claim 17, wherein said selected SNP haplotype blocks are nonoverlapping.
- 19. The method of claim 17, wherein a greedy algorithm is used to perform said selecting steps.
- 20. A method for determining an informative SNP in a SNP haplotype pattern, comprising:
determining SNP haplotype patterns for a SNP haplotype block; comparing each SNP haplotype pattern of interest in said SNP haplotype block to other SNP haplotype patterns of interest in said SNP haplotype block; selecting at least one SNP in a first SNP haplotype pattern of interest that distinguishes such first SNP haplotype pattern of interest from other SNP haplotype patterns of interest in said SNP haplotype block, wherein said selected at least one SNP is an informative SNP for said first SNP haplotype pattern in said SNP haplotype block.
- 21. The method of claim 20, further comprising repeating said selecting step until a sufficient number of informative SNPs are selected to distinguish a portion of SNP haplotype patterns in a SNP haplotype block.
- 22. The method of claim 21, wherein said selected portion of SNP haplotype patterns is about 70% to about 99% of SNP haplotype patterns in said SNP haplotype block.
- 23. The method of claim 21, wherein said selected portion of SNP haplotype patterns allows identification of a disease of interest.
- 24. A method of determining informativeness of a SNP haplotype block, comprising:
determining a number of SNP locations in said SNP haplotype block; determining a number of informative SNPs required to distinguish SNP haplotype patterns of interest in said SNP haplotype block; and dividing said number of SNP locations by said number of informative SNPs to produce a quotient, wherein said quotient is said informativeness of said SNP haplotype block.
- 25. A method of determining informativeness of a SNP haplotype block, comprising:
determining a number of SNP locations in said SNP haplotype block; determining a number of informative SNPs required to distinguish SNP haplotype patterns of interest in said SNP haplotype block from each other, wherein said number of informative SNPs required to distinguish SNP haplotype patterns of interest is said informativeness of said SNP haplotype block.
- 26. A method for determining disease-related genetic loci without a priori knowledge of a sequence or location of said disease-related genetic loci, comprising:
determining SNP haplotype patterns from at least 16 individuals in a control population; determining SNP haplotype patterns from individuals in a diseased population; and comparing frequencies of said SNP haplotype patterns of said control population with frequencies of said SNP haplotype patterns of said diseased population, wherein differences in said frequencies indicate locations of disease-related genetic loci.
- 27. The method of claim 26, wherein said SNP haplotype patterns are determined in at least 50 individuals in a control population.
- 28. The method of claim 26, wherein said SNP haplotype patterns from said populations are determined using informative SNPs.
- 29. A method of constructing a SNP haplotype block map using multiple whole genomes comprising:
arranging SNPs found in at least about ten percent of said whole genomes into SNP haplotype blocks.
- 30. A method of making associations between SNP haplotype patterns and a phenotypic trait of interest comprising:
building a baseline of SNP haplotype patterns by the methods of the present invention; pooling whole genomic DNA from a population having a common phenotypic trait of interest; and identifying said SNP haplotype patterns that are associated with said phenotypic trait of interest.
- 31. The method of claim 30, wherein informative SNPs are used for said building and said identifying steps.
- 32. A method of identifying diagnostic markers comprising:
identifying informative SNPs according to claim 20, wherein said informative SNPs are diagnostic markers based on associations.
- 33. A method for identifying drug discovery targets comprising:
associating SNP haplotype patterns with a disease; identifying a chromosomal location of said associated SNP haplotype patterns; determining a nature of said association of said chromosomal location and said disease; and selecting a chromosomal location or a product of expression of that chromosomal location that is associated with said disease; wherein said selected chromosomal location or a product of expression of that chromosomal location that is associated with said disease is a drug discovery target.
- 34. The method of claim 33, wherein said associated chromosomal locations are prioritized for drug discovery targets based on a set of criteria that includes location in a highly conserved region and location in an intergenic region.
- 35. The method of claim 33, wherein informative SNPs are used in said associating step.
- 36. A method of determining a SNP haplotype pattern of an individual comprising:
assaying for at least one informative SNP.
- 37. A method for defining SNP haplotype patterns of a species or subset of species comprising:
identifying SNPs present in genomes of multiple organisms of said species; arranging said SNPs into SNP haplotype blocks by iteratively selecting for SNP haplotype patterns having few ambiguous positions.
- 38. A database comprising SNP haplotype blocks derived from genomes of multiple organisms, wherein said database identifies at least one informative SNP and wherein said database is on computer-readable medium.
- 39. The database of claim 38, further comprising information on one or more factors selected from a group consisting of environmental factors, other genetic factors, related factors, including but not limited to biochemical markers, behaviors, and/or other polymorphisms, including but not limited to low frequency SNPs, repeats, insertions and deletions.
- 40. A database on a computer-readable medium comprising SNP haplotype patterns identified as associated with one or more specific phenotypic traits.
- 41. The database of claim 40, further comprising information on one or more factors selected from a group consisting of environmental factors, other genetic factors, related factors, including but not limited to biochemical markers, behaviors, and/or other polymorphisms, including but not limited to low frequency SNPs, repeats, insertions and deletions.
- 42. A database on a computer-readable medium comprising informative SNPs identified as associated with one or more specific phenotypic traits.
- 43. The database of claim 42, further comprising information on one or more factors selected from a group consisting of environmental factors, other genetic factors, related factors, including but not limited to biochemical markers, behaviors, and/or other polymorphisms, including but not limited to low frequency SNPs, repeats, insertions and deletions.
- 44. A kit for diagnosis of a disease, disease susceptibility, or therapy response comprising means for detecting a presence or absence of SNP haplotype patterns or informative SNPs in a sample of genomic DNA from a patient and a data set of associations of said SNP haplotype patterns or informative SNPs with one or more specific phenotypic traits on a computer-readable medium.
- 45. An isolated nucleic acid comprising at least one informative SNP, wherein said informative SNP indicates a SNP haplotype pattern as determined in accordance with the methods of the invention, wherein said informative SNP is associated with a phenotypic trait.
- 46. A method comprising:
identifying genetic variations in a plurality of individuals; identifying at least some of said genetic variations in individuals that occur with at least some other of said genetic variations; and using some, but not all, of said variations that occur with at least some others of said genetic variations in correlation with a phenotypic state.
- 47. A method comprising:
determining a sequence of an organism; scanning additional individuals of said organism for variants from said sequence; identifying some of said variants that occur with others of said variants in a first group; identifying some of said variants that occur with others of said variants in a second group; and using some, but not all, of said variants in said first and second groups to correlate said groups with a phenotypic state.
- 48. A method for selecting a SNP haplotype block useful in genomic analysis, comprising:
isolating a substantially identical DNA strand from at least about five different origins for analysis; analyzing at least about 1×10^6 bases from each of said substantially identical DNA strands from at least about five different origins; determining more than one SNP location in each DNA strand; identifying SNP locations in said DNA strands that are linked, wherein said linked SNP locations form a SNP haplotype block; identifying SNP haplotype patterns that occur in each SNP haplotype block; and selecting each identified SNP haplotype pattern that occurs in any of said substantially identical DNA strands from different origins.
- 49. A method for determining pharmacogenomic-related genetic loci without a priori knowledge of a sequence or location of said pharmacogenomic-related genetic loci, comprising:
determining SNP haplotype patterns from at least 16 individuals in a control population; determining SNP haplotype patterns from individuals that react in an altered manner to administration of a substance; and comparing frequencies of said SNP haplotype patterns of said control population with frequencies of said SNP haplotype patterns of said individuals that react in an altered manner to administration of a substance, wherein differences in said frequencies indicate locations of pharmacogenomic-related genetic loci.
- 50. The method of claim 49, wherein said SNP haplotype patterns are determined in at least 50 individuals in a control population.
- 51. The method of claim 49, wherein said SNP haplotype patterns from said populations are determined using informative SNPs.
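
The selection steps recited in claims 13, 20 to 21, and 24 can be illustrated with a minimal Python sketch. It is not the implementation described in the specification: the function names, the string encoding of haplotype patterns (one character per SNP location), the 0.8 default coverage, and the brute-force search for a smallest distinguishing SNP set are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def select_common_patterns(haplotypes, coverage=0.8):
    """Claim 13 sketch: take patterns most frequent first until the
    selected patterns account for `coverage` of the strands."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    selected, covered = [], 0
    for pattern, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(pattern)
        covered += n
    return selected


def informative_snps(patterns):
    """Claims 20-21 sketch: smallest set of SNP positions whose alleles
    distinguish every pattern of interest from the others.
    Brute force (an assumption); only practical for small blocks."""
    n_sites = len(patterns[0])
    distinct = len(set(patterns))
    for k in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), k):
            projected = {tuple(p[i] for i in sites) for p in patterns}
            if len(projected) == distinct:
                return list(sites)
    return list(range(n_sites))


def informativeness(patterns):
    """Claim 24: number of SNP locations in the block divided by the
    number of informative SNPs needed to tell the patterns apart."""
    return len(patterns[0]) / len(informative_snps(patterns))


# Hypothetical block of four SNP locations observed on six strands.
strands = ["ACGT", "ACGT", "ACGT", "TCGA", "TCGA", "ACCT"]
common = select_common_patterns(strands)          # ['ACGT', 'TCGA']
print(common, informative_snps(common), informativeness(common))
```

In this toy block the two most frequent patterns cover five of six strands, and a single SNP (position 0) separates them, so the quotient of claim 24 is 4/1. Under the alternative definition of claim 25, the informativeness would simply be the count of informative SNPs.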
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. provisional patent application serial No. 60/280,530, filed Mar. 30, 2001, to U.S. provisional patent application serial No. 60/313,264, filed Aug. 17, 2001, to U.S. provisional patent application serial No. 60/327,006, filed Oct. 5, 2001, all entitled “Identifying Human SNP Haplotypes, Informative SNPs and Uses Thereof”, and to provisional patent application serial number [unassigned], filed Nov. 26, 2001, attorney docket number 1005P-4, entitled “Methods for Genomic Analysis”, the disclosures of all of which are specifically incorporated herein by reference.
Provisional Applications (4)
| Number | Date | Country |
|---|---|---|
| 60280530 | Mar 2001 | US |
| 60313264 | Aug 2001 | US |
| 60327006 | Oct 2001 | US |
| 60332550 | Nov 2001 | US |