Claims
- 1. A method for profiling regulatory factor binding sites, comprising:
locating a complete and most 5′ full-length gene for mapping gene regulatory regions; retrieving genomic sequences of gene regulatory regions; screening DNA sequence information for each retrieved gene regulatory region to identify putative regulatory factor binding sites; and profiling the putative regulatory factor binding sites.
- 2. The method of claim 1, wherein mapping includes retrieving full-length genes to provide sequence information for the retrieved genes.
- 3. The method of claim 2, wherein mapping includes mapping the retrieved genes to a recently updated human genome.
- 4. The method of claim 3, wherein the retrieved genes are mapped to the recently updated human genome using a tool provided by at least one of publicly available UCSC Genome Browser databases and self-developed scripts.
- 5. The method of claim 3, wherein the transcription start site (TSS) is mapped.
- 6. The method of claim 5, wherein the TSS is mapped by taking the most 5′ TSS of each gene after comparing all available TSSs for the gene.
- 7. The method of claim 1, wherein a genomic sequence of a regulatory region for each retrieved gene with the most 5′ TSS is retrieved from the most updated human genome.
- 8. The method of claim 7, wherein the 5′ regulatory region comprises the sequences located upstream of the TSS and downstream of the TSS.
- 9. The method of claim 1, wherein a retrieved sequence of a gene regulatory region is the core promoter region.
- 10. The method of claim 9, wherein the core promoter region includes about 200-300 bases upstream of the TSS and about 50-100 bases downstream of the TSS.
- 11. The method of claim 5, wherein a genomic sequence of a gene is the upstream enhancer region.
- 12. The method of claim 3, wherein a genomic sequence of a gene regulatory region is a downstream regulatory region.
- 13. The method of claim 7, further comprising:
cutting and storing the corresponding sequences relative to TSS.
- 14. The method of claim 13, wherein the corresponding sequences relative to the TSS are cut and stored with the use of self-developed scripts from at least one of the UCSC Genome Browser and the NCBI genome database.
- 15. The method of claim 1, wherein the DNA sequence information is screened using a MATCH program or a similar position weight matrix program for motif searching.
- 16. The method of claim 1, wherein the DNA sequence information screening includes selecting a transcription factor (TF) matrix, a matrix similarity score, and a core similarity score.
- 17. The method of claim 1, wherein a cut-off is applied to reduce false positive and false negative matches during screening.
- 18. The method of claim 1, further comprising:
determining at least one of a genomic or tissue-specific frequency of each binding site.
- 19. The method of claim 1, wherein the frequency is the existence of specific TF binding sites in regulatory regions of all the genes.
- 20. The method of claim 1, wherein the frequency is the existence of specific TF binding sites in regulatory regions of tissue specific genes.
- 21. The method of claim 16, further comprising:
creating a conservation score for each binding site.
- 22. The method of claim 17, wherein the conservation score is selected to cover regions where the TF binding sites are identified.
- 23. The method of claim 17, further comprising:
determining a position of each binding site.
- 24. The method of claim 23, wherein the position is based on a human genome working draft.
- 25. The method of claim 24, wherein the position is a converted position in a human genome working draft.
- 26. The method of claim 23, wherein the genome positions of a start and an end of each binding site are determined.
- 27. The method of claim 23, further comprising:
determining a distance of each binding site to the TSS.
- 28. The method of claim 27, wherein the distance is relative to a number of bases between a binding site and the TSS.
- 29. The method of claim 27, further comprising:
determining a length of each binding site.
- 30. The method of claim 29, further comprising:
determining sequence information about regions adjacent to the binding site.
- 31. The method of claim 30, further comprising:
determining co-existence information of other binding sites.
- 32. The method of claim 31, further comprising:
determining clusters of the binding sites and their positions.
- 33. The method of claim 1, further comprising:
collecting the binding profiles in a database.
- 34. The method of claim 33, wherein the database includes TF binding profiles for the regulatory region of each gene.
- 35. The method of claim 33, wherein the database is searchable by gene identifiers.
- 36. The method of claim 35, wherein the gene identifiers are selected from the NCBI database.
- 37. The method of claim 36, wherein the NCBI database includes at least one of UniGene Cluster ID, LocusLink ID and internationally approved gene symbols.
- 38. The method of claim 35, wherein the database includes genomic frequency information for TFs.
- 39. The method of claim 38, wherein the database is sortable by at least one of TF name and TF frequencies.
- 40. The method of claim 39, wherein the TF frequencies include genome frequencies and tissue specific frequencies.
- 41. The method of claim 33, further comprising:
retrieving information from the database for biomedical research.
- 42. The method of claim 33, further comprising:
retrieving information from the database for pre-clinical development.
- 43. The method of claim 33, further comprising:
retrieving information from the database for drug screening applications.
- 44. The method of claim 33, further comprising:
retrieving information from the database for target discovery and target validation.
- 45. The method of claim 33, further comprising:
retrieving information from the database for profiling of a regulatory region.
- 46. The method of claim 33, further comprising:
retrieving information from the database for building genome-wide or tissue-wide connections between regulatory profiles of different genes.
- 47. The method of claim 33, further comprising:
retrieving information from the database for understanding the genome or tissue background of various known transcription profiling.
- 48. A method for profiling identified binding sites, comprising:
providing a database that includes profiled identified binding sites for known genes; and applying probability mapping to the profiled binding sites.
- 49. The method of claim 48, wherein the database includes TF binding profiles for the regulatory region of each gene.
- 50. The method of claim 48, wherein the database is searchable by gene identifiers.
- 51. The method of claim 50, wherein the gene identifiers are selected from the NCBI database.
- 52. The method of claim 51, wherein the NCBI database includes at least one of UniGene Cluster ID, LocusLink ID and internationally approved gene symbols.
- 53. The method of claim 51, wherein the database includes genomic frequency information for vertebrate transcription regulatory factors.
- 54. The method of claim 53, wherein the database is sortable by at least one of TF name and TF frequencies.
- 55. The method of claim 54, wherein the TF frequencies include genome frequencies and tissue specific frequencies.
- 56. The method of claim 48, further comprising:
retrieving information from the database for biomedical research.
- 57. The method of claim 48, further comprising:
retrieving information from the database for pre-clinical development.
- 58. The method of claim 48, further comprising:
retrieving information from the database for drug screening applications.
- 59. The method of claim 48, further comprising:
retrieving information from the database for target discovery and target validation.
- 60. The method of claim 48, further comprising:
retrieving information from the database for profiling of a regulatory region.
- 61. The method of claim 48, further comprising:
retrieving information from the database for building genome-wide or tissue-wide connections between regulatory profiles of different genes.
- 62. The method of claim 48, further comprising:
retrieving information from the database for understanding the genome or tissue background of various known transcription profiling.
- 63. A data structure tangibly stored on a computer readable medium, comprising:
a database that includes profiled identified binding sites, the profiled identified binding sites being created by screening DNA sequence information for gene regulatory regions, and wherein the database is searchable by gene identifiers.
- 64. The data structure of claim 63, wherein the gene identifiers are selected from the NCBI GenBank identifiers.
- 65. The data structure of claim 64, wherein the NCBI database includes at least one of UniGene Cluster ID, LocusLink ID and internationally approved gene symbols.
- 66. The data structure of claim 63, wherein the database includes TF binding profiles for the regulatory region of each gene.
- 67. The data structure of claim 63, wherein the database includes genomic frequency information for vertebrate transcription regulatory factors.
- 68. The data structure of claim 63, wherein the database is sortable by at least one of TF name and TF frequencies.
- 69. The data structure of claim 68, wherein the TF frequencies include genome frequencies and tissue specific frequencies.
- 70. The data structure of claim 63, wherein the database includes information for biomedical research.
- 71. The data structure of claim 63, wherein the database includes information for pre-clinical development.
- 72. The data structure of claim 63, wherein the database includes information for drug screening applications.
- 73. The data structure of claim 63, wherein the database includes information for target discovery and target validation.
- 74. The data structure of claim 63, wherein the database includes information for profiling of a regulatory region.
- 75. The data structure of claim 63, wherein the database includes information for building genome-wide or tissue-wide connections between regulatory profiles of different genes.
- 76. The data structure of claim 63, wherein the database includes information for understanding the genome or tissue background of various known transcription profiling.
- 77. A computer implemented system for profiling regulatory factor binding sites, comprising:
a database that includes profiled identified binding sites, the profiled identified binding sites being created by screening DNA sequence information for gene regulatory regions, and wherein the database is searchable by gene identifiers; a user interface that includes one or more selectable user inputs; an input device operable by a user; and a display for displaying at least one output in response to the profiled identified binding sites.
- 78. The system of claim 77, wherein the gene identifiers are selected from the NCBI GenBank identifiers.
- 79. The system of claim 78, wherein the NCBI database includes at least one of UniGene Cluster ID, LocusLink ID and internationally approved gene symbols.
- 80. The system of claim 77, wherein the database includes TF binding profiles for the regulatory region of each gene.
- 81. The system of claim 77, wherein the database includes genomic frequency information for vertebrate transcription regulatory factors.
- 82. The system of claim 77, wherein the database is sortable by at least one of TF name and TF frequencies.
- 83. The system of claim 82, wherein the TF frequencies include genome frequencies and tissue specific frequencies.
- 84. The system of claim 77, wherein the database includes information for biomedical research.
- 85. The system of claim 77, wherein the database includes information for pre-clinical development.
- 86. The system of claim 77, wherein the database includes information for drug screening applications.
- 87. The system of claim 77, wherein the database includes information for target discovery and target validation.
- 88. The system of claim 77, wherein the database includes information for profiling of a regulatory region.
- 89. The system of claim 77, wherein the database includes information for building genome-wide or tissue-wide connections between regulatory profiles of different genes.
- 90. The system of claim 77, wherein the database includes information for understanding the genome or tissue background of various known transcription profiling.
- 91. The system of claim 77, wherein the at least one output includes at least one of a gene name, an identifier, an identified TF binding site, TF names, genomic positions, length, distance, conservation score, binding scores, frequency information, and binding site sequences.
- 92. The system of claim 77, further comprising:
a memory; and a microprocessor.
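By way of illustration only, the steps recited in claims 6, 10, 15 to 17, 19 and 27 to 28 could be sketched in Python as shown here. This is a minimal sketch under stated assumptions and not the claimed implementation: all function names, the `Site` record, the default window sizes and cut-off, and the toy matrix are hypothetical, and the normalized score is a simplified stand-in for a position weight matrix similarity, not the scoring used by the MATCH program.

```python
from dataclasses import dataclass

@dataclass
class Site:
    tf: str               # hypothetical TF name
    start: int            # 0-based offset within the scanned region
    length: int           # width of the matrix, in bases
    score: float          # normalized similarity in [0, 1]
    distance_to_tss: int  # bases from TSS; negative means upstream

def most_5prime_tss(tss_positions, strand):
    """Pick the most 5' TSS among all annotated TSSs of one gene."""
    return min(tss_positions) if strand == "+" else max(tss_positions)

def core_promoter_window(tss, strand, upstream=300, downstream=100):
    """Half-open genomic interval [start, end) around a 0-based TSS."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    # On the minus strand, upstream lies at higher genomic coordinates.
    return max(0, tss - downstream + 1), tss + upstream + 1

def scan(seq, tf, matrix, tss_offset, cutoff=0.85):
    """Slide a weight matrix along seq (given 5'->3') and keep hits
    whose normalized score reaches the cut-off.

    matrix is a list of dicts mapping A/C/G/T to a weight, one dict
    per motif position. The score is (raw - min) / (max - min).
    """
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    if hi == lo:  # flat matrix carries no information
        return []
    width = len(matrix)
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other symbols
        raw = sum(col[base] for col, base in zip(matrix, window))
        score = (raw - lo) / (hi - lo)
        if score >= cutoff:
            sites.append(Site(tf, i, width, score, i - tss_offset))
    return sites

def tf_frequency(sites_by_gene, tf):
    """Fraction of genes whose regulatory region has a site for tf."""
    if not sites_by_gene:
        return 0.0
    hits = sum(any(s.tf == tf for s in sites)
               for sites in sites_by_gene.values())
    return hits / len(sites_by_gene)

# Toy 4-base matrix favoring the motif TATA (illustrative values only).
TOY_MATRIX = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
```

In this sketch, raising the `cutoff` argument trades false positives for false negatives, and passing only the regions of tissue specific genes to `tf_frequency` would give a tissue specific rather than a genomic frequency.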
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is related to U.S. Ser. No. ______, filed ______, entitled “Statistical Analysis of Regulatory Factor Binding of Differentially Expressed Genes”, and identified as Attorney Docket No. 39753-0002, which application is fully incorporated herein.