Claims
- 1. A method for constructing a set of SNP haplotype patterns for data analysis, comprising:
a) providing a data set of SNP haplotype sequences; b) providing a pattern set of SNP haplotype patterns; and c) comparing the SNP haplotype sequences from said data set to the SNP haplotype patterns from said pattern set, wherein if said SNP haplotype sequence being compared is not consistent with any of said SNP haplotype patterns, then said SNP haplotype sequence is added to said pattern set.
- 2. The method of claim 1, wherein initially there are no SNP haplotype patterns in said pattern set.
- 3. The method of claim 1, wherein if said SNP haplotype sequence from said data set being compared is consistent with one SNP haplotype pattern from said pattern set, then further steps are performed comprising:
a) determining a number of ambiguities in said SNP haplotype sequence; b) determining a number of ambiguities in said consistent SNP haplotype pattern; and c) determining which of said SNP haplotype sequence or SNP haplotype pattern has fewer ambiguities, wherein if said SNP haplotype sequence has fewer ambiguities it is added to said pattern set and said consistent SNP haplotype pattern is removed from said pattern set, and wherein if said SNP haplotype pattern has fewer ambiguities it is retained in said pattern set and said SNP haplotype sequence is not added to said pattern set.
- 4. The method of claim 3, further comprising resolving said ambiguities of either or both of said SNP haplotype sequence and said SNP haplotype pattern, by using unambiguous information from either said SNP haplotype pattern or said SNP haplotype sequence, respectively, for each SNP position to create a resolved SNP haplotype sequence; and replacing said SNP haplotype pattern with said resolved SNP haplotype sequence in said pattern set.
- 5. A method for building a pattern set of SNP haplotype patterns comprising:
a) providing a data set of SNP haplotype sequences; b) providing a pattern set of SNP haplotype patterns; c) comparing said SNP haplotype sequences from said data set to said SNP haplotype patterns from said pattern set, wherein if said SNP haplotype sequence being compared is not consistent with any of said SNP haplotype patterns, then said SNP haplotype sequence is added to said pattern set, wherein if said SNP haplotype sequence being compared is consistent with more than one SNP haplotype pattern, then said SNP haplotype sequence is not added to said pattern set, and wherein if said SNP haplotype sequence being compared is consistent with one SNP haplotype pattern, then d) determining a number of ambiguities in said SNP haplotype sequence that matches one SNP haplotype pattern; e) determining a number of ambiguities in said consistent SNP haplotype pattern; f) resolving said ambiguities of either said SNP haplotype sequence or said SNP haplotype pattern by using information from either said SNP haplotype pattern or said SNP haplotype sequence at each SNP location for which one of either said SNP haplotype pattern or said SNP haplotype sequence contains unambiguous information to create a resolved SNP haplotype sequence; and g) adding said resolved SNP haplotype sequence into said pattern set.
- 6. The method of claim 5, further comprising reanalyzing said SNP haplotype sequences that were consistent with more than one SNP haplotype pattern after an ambiguity in at least one of said more than one SNP haplotype pattern has been resolved, comprising comparing said SNP haplotype sequence to each of said more than one SNP haplotype pattern wherein if said SNP haplotype sequence being compared is not consistent with any of said more than one SNP haplotype patterns, then said SNP haplotype sequence is added to said pattern set.
- 7. A computer readable medium capable of comparing SNP haplotype sequences from a data set to SNP haplotype patterns from a pattern set, wherein if said SNP haplotype sequence being compared is not consistent with any of said SNP haplotype patterns, then said SNP haplotype sequence is added to said pattern set, and wherein if said SNP haplotype sequence being compared is consistent with at least one SNP haplotype pattern, then said SNP haplotype sequence is not added to said pattern set.
- 8. A method for building a data set of SNP haplotype sequences and a pattern set of SNP haplotype patterns comprising:
a) providing a data set comprising a number N of SNP haplotype sequences; b) providing a pattern set comprising SNP haplotype patterns; c) comparing sequentially each SNP haplotype sequence from said data set to said SNP haplotype patterns from said pattern set, wherein if said SNP haplotype sequence being compared does not match any of said SNP haplotype patterns, then said SNP haplotype sequence is added to said pattern set and retained in said data set, wherein if said SNP haplotype sequence being compared matches more than one SNP haplotype pattern, then said SNP haplotype sequence is not added to said pattern set and is retained in said data set, and wherein if said SNP haplotype sequence being compared matches one SNP haplotype pattern, then d) determining a number of ambiguities in said SNP haplotype sequence that matches one SNP haplotype pattern; e) determining a number of ambiguities in said matching SNP haplotype pattern; f) resolving said ambiguities of either said SNP haplotype sequence or said SNP haplotype pattern by using information from either said SNP haplotype pattern or said SNP haplotype sequence at each SNP location for which one of either said SNP haplotype pattern or said SNP haplotype sequence contains unambiguous information to create a resolved SNP haplotype sequence; g) adding said resolved SNP haplotype sequence into said pattern set; and h) performing said steps sequentially on each of said number N of SNP haplotype sequences.
- 9. The method of claim 8, wherein initially there are no SNP haplotype patterns in said pattern set.
- 10. The method of claim 8, wherein said performing sequentially step is performed twice for each of said number N of SNP haplotype sequences.
- 11. The method of claim 10, wherein if, during said comparing step of said second performing step, said SNP haplotype sequence being compared is consistent with more than one SNP haplotype pattern, then said SNP haplotype sequence is not added to said pattern set.
- 12. A computer readable medium capable of
a) comparing sequentially each SNP haplotype sequence from said data set to said SNP haplotype patterns from said pattern set, wherein if said SNP haplotype sequence being compared is not consistent with any of said SNP haplotype patterns, then said SNP haplotype sequence is added to said pattern set and retained in said data set, wherein if said SNP haplotype sequence being compared is consistent with more than one SNP haplotype pattern, then said SNP haplotype sequence is not added to said pattern set and is retained in said data set, and wherein if said SNP haplotype sequence being compared is consistent with one SNP haplotype pattern, then b) identifying ambiguities in said consistent SNP haplotype pattern; c) identifying nonambiguous SNP locations in said first SNP haplotype sequence that is consistent with one SNP haplotype pattern; d) resolving said ambiguities of said SNP haplotype pattern in said pattern set by using nonambiguous information from said first SNP haplotype sequence; and e) performing said steps sequentially on each of said number N of SNP haplotype sequences.
- 13. A method for building a final SNP haplotype block set of nonoverlapping SNP haplotype blocks that include all SNP positions in a set of overlapping SNP haplotype blocks, comprising:
a) analyze the informativeness of a first haplotype block; b) if the informativeness of said first haplotype block is above a determined threshold level, add said first haplotype block to a candidate SNP haplotype block set; c) repeat steps a) and b) for all potential haplotype blocks for a given SNP haplotype sequence to generate a set of candidate SNP haplotype blocks; d) compare the informativeness of each of the candidate SNP haplotype blocks; e) choose the candidate SNP haplotype block with the highest informativeness for inclusion in a final SNP haplotype block set; f) remove said candidate SNP haplotype block with the highest informativeness from said candidate SNP haplotype block set; g) remove all candidate SNP haplotype blocks that overlap with said candidate SNP haplotype block with the highest informativeness from said candidate SNP haplotype block set; h) repeat steps e), f) and g) until there are no candidate SNP haplotype blocks left in the candidate SNP haplotype block set and a final SNP haplotype block set of nonoverlapping SNP haplotype blocks that include all SNP positions in a set of overlapping SNP haplotype blocks has been generated.
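The pattern-set procedure of claims 1 to 11 can be illustrated with a short sketch. It is a hypothetical, minimal Python rendering under stated assumptions, not the claimed implementation: each haplotype is assumed to be an equal-length string with one character per SNP and `?` marking an ambiguous allele, and two haplotypes are treated as consistent when they agree at every position where neither is ambiguous. The names `consistent`, `ambiguities`, `resolve`, `build_pattern_set` and the `passes`/`merge` parameters are illustrative only. When exactly one pattern matches, the sketch replaces that pattern with the merged haplotype (the replacement of claim 4) instead of appending a second copy; sequences consistent with several patterns are skipped and revisited on the next pass, a simplification of the reanalysis in claim 6.

```python
from typing import List

AMBIGUOUS = "?"  # assumed symbol for an unresolved (ambiguous) allele


def consistent(a: str, b: str) -> bool:
    """True if a and b agree at every SNP position where neither is ambiguous."""
    return len(a) == len(b) and all(
        x == y or AMBIGUOUS in (x, y) for x, y in zip(a, b)
    )


def ambiguities(h: str) -> int:
    """Number of ambiguous SNP positions in a haplotype."""
    return h.count(AMBIGUOUS)


def resolve(pattern: str, seq: str) -> str:
    """Fill each ambiguous position of `pattern` with the allele from `seq`."""
    return "".join(s if p == AMBIGUOUS else p for p, s in zip(pattern, seq))


def build_pattern_set(data: List[str], passes: int = 2,
                      merge: bool = True) -> List[str]:
    """Compare each sequence in `data` against a growing pattern set.

    merge=True  -> resolve ambiguities by merging (as in claims 4, 5 and 8)
    merge=False -> keep whichever has fewer ambiguities (as in claim 3)
    passes=2    -> process the data set twice (as in claim 10)
    """
    patterns: List[str] = []  # the pattern set starts empty (claims 2 and 9)
    for _ in range(passes):
        for seq in data:
            hits = [i for i, p in enumerate(patterns) if consistent(seq, p)]
            if not hits:
                patterns.append(seq)  # consistent with no pattern: add it
            elif len(hits) == 1:
                i = hits[0]
                if merge:
                    patterns[i] = resolve(patterns[i], seq)
                elif ambiguities(seq) < ambiguities(patterns[i]):
                    patterns[i] = seq  # sequence is less ambiguous: swap in
            # two or more hits: not added now; reconsidered on a later pass
    return patterns
```

For example, `build_pattern_set(["0?1", "1?0", "011", "??0"])` returns `["011", "1?0"]`: the third sequence resolves the ambiguity in the first pattern, and the fourth is consistent only with the second pattern, which it leaves unchanged.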
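Claim 13 describes a greedy selection over candidate blocks. The following is a minimal Python sketch under explicit assumptions: a block is taken to be a half-open range of consecutive SNP indices, candidate blocks are limited to a hypothetical maximum length `max_len`, and informativeness is supplied by the caller as a `score` function, because the claim text here does not fix a particular measure. The names `candidate_blocks`, `overlaps` and `select_blocks` are illustrative, not taken from the source.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int]  # half-open SNP index range [start, end)


def candidate_blocks(n_snps: int, score: Callable[[Block], float],
                     threshold: float, max_len: int) -> List[Block]:
    """Steps a) to c): keep every contiguous block scoring above the threshold.

    Restricting blocks to at most `max_len` SNPs is an assumption of this sketch.
    """
    return [
        (s, e)
        for s in range(n_snps)
        for e in range(s + 1, min(s + max_len, n_snps) + 1)
        if score((s, e)) > threshold
    ]


def overlaps(a: Block, b: Block) -> bool:
    """True if two half-open blocks share at least one SNP position."""
    return a[0] < b[1] and b[0] < a[1]


def select_blocks(candidates: List[Block],
                  score: Callable[[Block], float]) -> List[Block]:
    """Steps d) to h): take the highest-scoring block, drop it and its overlaps."""
    remaining = list(candidates)
    final: List[Block] = []
    while remaining:
        best = max(remaining, key=score)
        final.append(best)
        # `best` overlaps itself, so this removes it (step f) and its overlaps (step g)
        remaining = [b for b in remaining if not overlaps(b, best)]
    return sorted(final)
```

With a toy score table `{(0, 2): 0.9, (1, 3): 0.7, (2, 4): 0.8}`, a threshold of 0.5, four SNPs and `max_len=2`, the candidates are `(0, 2)`, `(1, 3)` and `(2, 4)`, and `select_blocks` returns `[(0, 2), (2, 4)]`. Each iteration removes at least the chosen block, so the loop terminates and the result is pairwise nonoverlapping; whether it covers every SNP position depends on which candidates passed the threshold.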
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. provisional patent application serial No. 60/414,547, entitled “Methods for Genomic Analysis”, which was originally filed on Apr. 29, 2002 as U.S. utility application Ser. No. 10/134,510 and was subsequently converted to the cited provisional application. The present application also claims priority to U.S. provisional patent application serial No. 60/280,530, filed Mar. 30, 2001, to U.S. provisional patent application serial No. 60/313,264, filed Aug. 17, 2001, and to U.S. provisional patent application serial No. 60/327,006, filed Oct. 5, 2001, all entitled “Identifying Human SNP Haplotypes, Informative SNPs and Uses Thereof”, and to U.S. provisional patent application serial No. 60/332,550, filed Nov. 26, 2001, and U.S. utility patent application Ser. No. 10/106,097, filed Mar. 26, 2002, both entitled “Methods for Genomic Analysis”, the disclosures of all of which are specifically incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60414547 | Apr 2002 | US |