Claims
- 1. A computer-implemented method for gene cluster analysis comprising:
performing a cluster analysis to obtain clusters for a plurality of annotation terms; and
assigning interested genes to the clusters according to their annotation terms.
- 2. The method of claim 1 wherein the annotation terms are Gene Ontology (GO) terms.
- 3. The method of claim 2 wherein the cluster analysis is based upon pairwise similarity measures between the GO terms.
- 4. The method of claim 3 wherein at least one interested gene is assigned to a plurality of clusters.
- 5. The method of claim 3 wherein the cluster analysis is performed with a clique finding algorithm.
- 6. The method of claim 3 wherein the pairwise similarity measures are determined according to the GO digraph paths.
- 7. The method of claim 6 wherein each of the pairwise similarity measures is calculated based upon the length of the partial path shared by two annotation terms.
- 8. The method of claim 7 wherein a weighting factor is assigned to each edge as a function of the level in a path.
- 9. The method of claim 8 wherein the stringency of similarity scaling may be adjusted by adjusting the weighting factor.
- 10. The method of claim 7 wherein a greedy method is used to select the longest common partial path when an annotation term is in multiple paths.
- 11. A computer-implemented method for identifying a potential drug target gene comprising:
inputting a gene that is related to a disease;
finding an associated gene cluster, wherein the associated gene cluster is a gene cluster that includes the gene that is related to the disease; and
identifying a gene in the associated gene cluster as the potential drug target.
- 12. The method of claim 11 wherein the gene cluster is obtained using the method of claim 1.
- 13. The method of claim 12 wherein the gene annotation similarity matrix contains pairwise similarity measures between the GO terms.
- 14. The method of claim 13 wherein at least one interested gene is assigned to a plurality of clusters.
- 15. The method of claim 14 wherein the cluster analysis is performed with a clique finding algorithm.
- 16. The method of claim 15 wherein the pairwise similarity measures are determined according to the GO digraph paths.
- 17. The method of claim 16 wherein each of the pairwise similarity measures is calculated based upon the length of the partial path shared by two annotation terms.
- 18. The method of claim 17 wherein a weighting factor is assigned to each edge as a function of the level in a path.
- 19. The method of claim 18 wherein the stringency of similarity scaling may be adjusted by adjusting the weighting factor.
- 20. The method of claim 19 wherein a greedy method is used to select the longest common partial path when an annotation term is in multiple paths.
- 21. The method of claim 20 wherein the Euclidean distances are converted to similarity scores by subtraction from 5.
- 22. The method of claim 21 wherein the combining comprises summing the Euclidean distances with the GO similarity matrix at a ratio to generate the gene similarity matrix.
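The similarity and clustering steps recited in claims 3 through 10 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The term names, the toy root-to-term paths, the threshold of 2.0, the gene names, and all helper names are hypothetical. Taking the maximum score over all path pairs is an assumption standing in for the greedy selection of claim 10, and Bron-Kerbosch is used only as one well-known clique-finding algorithm, since the claims do not name a specific one.

```python
from itertools import combinations

# Hypothetical root-to-term paths through an annotation digraph.
# A term reachable by several paths would list more than one path.
PATHS = {
    "kinase": [["root", "function", "catalytic", "kinase"]],
    "phosphatase": [["root", "function", "catalytic", "phosphatase"]],
    "transporter": [["root", "function", "transport", "transporter"]],
    "channel": [["root", "function", "transport", "channel"]],
}

# Hypothetical genes of interest and their annotation terms.
GENE_TERMS = {
    "geneA": ["kinase"],
    "geneB": ["phosphatase", "channel"],
    "geneC": ["transporter"],
}

THRESHOLD = 2.0  # assumed cutoff for linking two terms


def shared_prefix_len(path_a, path_b):
    """Number of leading nodes that two paths have in common."""
    n = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        n += 1
    return n


def path_similarity(paths_a, paths_b, weight=lambda level: 1.0):
    """Weighted length of the longest shared partial path.

    Each shared edge at depth `level` (1 = edge leaving the root)
    contributes weight(level). The best score over all path pairs is kept.
    """
    best = 0.0
    for pa in paths_a:
        for pb in paths_b:
            shared_edges = max(shared_prefix_len(pa, pb) - 1, 0)
            score = sum(weight(lv) for lv in range(1, shared_edges + 1))
            best = max(best, score)
    return best


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def assign_genes(gene_terms, clusters):
    """Map each gene to every cluster containing one of its terms."""
    return {
        gene: [i for i, c in enumerate(clusters) if c & set(terms)]
        for gene, terms in gene_terms.items()
    }


# Link terms whose similarity meets the threshold, then find cliques.
terms = sorted(PATHS)
adj = {t: set() for t in terms}
for a, b in combinations(terms, 2):
    if path_similarity(PATHS[a], PATHS[b]) >= THRESHOLD:
        adj[a].add(b)
        adj[b].add(a)

clusters = sorted(maximal_cliques(adj), key=sorted)
assignments = assign_genes(GENE_TERMS, clusters)
```

In this toy data, `geneB` carries terms from two different cliques and is therefore assigned to both clusters, which mirrors the multiple-assignment case of claim 4. Passing a different `weight` function, for example one that grows with `level`, changes how strongly deeper shared edges count and so adjusts the stringency described in claim 9.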
RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application Serial No. 60/297,210.
[0002] This application is related to U.S. patent application Ser. Nos. 10/026,110, 10/256,938, and __/____, attorney docket 3359, titled “Statistical Analysis for Gene Ontology”, filed on Dec. 3, 2002, and to the U.S. patent application, attorney docket 3545, filed concurrently herewith. The cited applications are incorporated herein by reference.