Claims
- 1. A computerized storage and retrieval system of biological information comprising:
a means for data entry; a means for displaying the data; a programmable central processing unit for performing automated analysis; and a data storage means containing protein pathways and annotated information on the pathways stored in a relational database, wherein the pathways are annotated and organized in a curated clustering arrangement and wherein the annotated information is accessed through the relational database.
- 2. The computer system of claim 1, wherein the information pertaining to the pathways is stored in a plurality of tables further comprising proteins, their sequences and attributes; protein interactions; protein-protein associations; protein pathways; mRNA, microarray, and protein expression data; genes, their sequences and attributes; and descriptions of cells, tissues, organs, pathology reports, patient histories, and treatments.
- 3. The computer system of claim 1, wherein the central processing unit is programmed to retrieve, input, edit, annotate, search, calculate similarities, align, and predict homologous or orthologous protein pathways.
- 4. The computer system of claim 1, wherein the central processing unit is programmed to perform protein sequence analysis, protein interactions analysis, protein-protein association analysis, protein pathway analysis, gene expression analysis, pathway annotation analysis, pathway edit analysis, pathway expression analysis, tissue expression analysis, subtractive hybridization analysis, electronic northern analysis, or commonality analysis.
- 5. The computer system of claim 1, wherein the data is entered using the standard for pathway representation.
- 6. The computer system of claim 1, wherein a means for displaying the data is used to show two related pathways as a diagram containing nodes which represent proteins or non-protein molecules; modes that represent protein interactions or protein-protein associations; scores calculated from sequence, motif or structural homologies that interrelate nodes; and coefficients of similarity that interrelate modes of the pathway.
- 7. The computer system of claim 1, wherein the central processing unit is programmed to compare two protein pathways by a node-only, a mode-only, or a node-and-mode comparison and wherein the node-only comparison is selected from protein only, non-protein only, and protein and non-protein nodes.
- 8. The computer system of claim 1, wherein the central processing unit is programmed to run an algorithm for dynamic programming comprising:
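a) initializing an array, in which a two-dimensional array $M = M_{ij}$ with $J$ rows and a variable length for each row, the length of the $i$-th row being $n_i$, is set up and $M_{Ji} = 0$, where $1 \le i \le n_J$; b) backfilling the array via backward recursion with the formula

$$M_{ik} = \max_{\substack{j > i \\ 1 \le l \le n_j}} \left\{ w(a_{ik}, a_{jl}) + M_{jl}\,\theta\big(w(a_{ik}, a_{jl})\big) \right\} \quad \text{for } 1 \le k \le n_i,\ 1 \le i \le J,$$

where $\theta(\cdot)$ is the step function defined as

$$\theta(v) = \begin{cases} 0, & \text{if } v \le 0 \\ 1, & \text{if } v > 0 \end{cases}$$

and $w(\cdot,\cdot)$ is the scoring function between the two nodes, defined as

$$w(a_{ik}, a_{jl}) = \begin{cases} 0, & \text{if } i = j,\ a_{ik} = a_{jl},\ a_{ik} = -D,\ \text{or } a_{jl} = -D \\ \theta(c_{ik,jl} - t_c) \cdot \left\{ \alpha \left(1 - |s_{ik} - s_{jl}|\right) + (1 - \alpha)\, c_{ik,jl} \right\}, & \text{otherwise;} \end{cases}$$

and c) using traceback to identify putative pathways $PPW_j$, $1 \le j \le \max n_i$, with the top $n$ best scores.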
- 9. A method for performing pathway editing comprising programming the central processing unit of claim 1 to identify interactions among proteins; weigh the interactions; and calculate coefficients of similarity for the interactions, thereby producing an OS-score and editing the protein pathway.
- 10. A method of using genes which encode known proteins to annotate modes of a protein pathway comprising:
a) using the computer system of claim 1 to select genes which encode known proteins, b) employing the genes to produce a protein-protein association matrix containing coefficients of similarity, and c) annotating the modes of the pathway using the coefficients of similarity from the matrix.
- 11. A method for protein pathways analysis using a node-and-mode comparison comprising:
a) submitting a query pathway and protein sequences; and b) allowing the computer system of claim 1 to
i) compare nodes using the dynamic programming algorithm wherein a sequence identity score or p-value summarizes similarity and wherein a weighting factor between 0 and 1 is assigned to corresponding nodes, ii) compare modes by generating a SCIM matrix, thereby assigning a coefficient of similarity to corresponding modes, iii) align pathways globally or locally, wherein insertion or deletion of nodes or modes incurs a penalty, iv) sum all similarity scores, and v) display at least one high-scoring segment of the aligned pathways.
- 12. A method for performing protein pathways analysis comprising:
a) submitting a query pathway and protein sequences; and b) allowing the computer system of claim 1 to
i) organize and analyze the query pathway and protein sequences, ii) compare protein sequence identity of the query with all protein sequences in the protein pathways database using standard methods of protein comparison, iii) use a SCIM matrix to derive and compare coefficients of similarity for each interaction of the query and all interactions for proteins in the protein pathways database, iv) calculate an OS-score based on sequence identity and coefficients of similarity, v) remove all pathways not meeting a user-specified threshold for OS-score, and vi) retrieve aligned pathways meeting the threshold.
- 13. A method for searching a protein pathways database for protein interactions comprising:
a) submitting a query pathway; b) allowing the central processing unit of claim 1 to perform protein interactions analysis between the query pathway and all protein pathways in the protein pathways database, wherein a coefficient of similarity is produced to interrelate each mode of the query pathway and a mode of the most closely related protein pathway; and c) retrieving at least one protein pathway alignment.
- 14. A method of using a query pathway to search a protein pathways database to predict homologous pathways comprising:
a) submitting a query pathway and protein sequences; b) allowing the central processing unit of claim 1 to compare the query pathway and protein sequences with all protein pathways and proteins in the protein pathways database, and c) retrieving a plurality of pathway alignments wherein the homologous pathways are aligned by OS-score.
- 15. A method of using a known protein pathway and a protein database to predict orthologous pathways comprising:
a) submitting a query pathway and known protein sequences, b) allowing the central processing unit of claim 1 to compare known sequences to all protein sequences stored in the database, c) retrieving orthologous proteins with the highest identity to the known proteins, d) inheriting protein interactions from the query pathway, and e) aligning the query pathway and the orthologous proteins, thereby predicting orthologous pathways.
- 16. A method of using a known protein pathway to predict the nodes and modes of a novel pathway comprising:
a) submitting a query pathway and known protein sequences; b) applying standard methods of comparison to determine similarity between the known protein sequences and protein sequences in the protein databases, thereby predicting candidate nodes; c) utilizing coefficients of similarity from protein interactions or protein-protein association data, thereby predicting candidate modes; and d) retrieving novel pathways with an OP-score obtained using an optimization algorithm.
- 17. The method of claim 16, wherein coefficients of similarity are based on mRNA/cDNA counting, microarray expression, protein expression, known protein-protein associations, a promoter similarity matrix, or more than one of these methods.
- 18. The method of claim 16, further comprising using a constrained clustering method wherein the clustering method is average linkage, single linkage, complete linkage, K-means, or self-organizing maps; the constraint is that no more than one protein in each cluster is derived from a single column of aligned proteins; and the accuracy of the prediction is determined by an OP-score.
- 19. A method for predicting novel pathways comprising:
a) generating candidate proteins from one species for each node based on a protein search; b) employing a means for optimization to find likely linear linkages between candidate proteins aligned to the query pathway with possible gaps in the alignment, and c) reporting all pathways with optimal and sub-optimal predictions that satisfy user-specified alignment and interaction parameters wherein the accuracy of the prediction is provided by OP-score.
- 20. The method of claim 19, wherein the means for optimization is based on linear next-neighbor criteria, global minimization criteria, dynamic programming, or iterative searches using at least two of the means.
- 21. A method for determining the function of a protein or a gene that encodes the protein comprising:
a) placing the protein encoded by the gene in a candidate pathway involving at least two proteins, and b) using the data storage means of claim 1 wherein the interactions with proteins and non-protein molecules, cellular location, and expression are used to determine the function of the protein or gene.
- 22. A method for predicting novel pathways comprising:
a) submitting a query pathway and protein sequences; b) using the computer system of claim 1 to process the query pathway and protein sequences using orthologous pathway prediction wherein the data is derived from protein similarities and interactions, or homologous pathway prediction wherein the data is derived from protein similarities and interactions, from protein-protein associations, and c) applying a dynamic programming algorithm or a constrained clustering algorithm, thereby predicting the novel pathways.
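The dynamic programming recited in claim 8 can be illustrated with a minimal Python sketch. It is not taken from the specification. It assumes that each row of the array holds the candidate proteins for one pathway position, that `s` maps each candidate to a similarity score in [0, 1], that `c` maps candidate pairs to a coefficient of similarity in [0, 1], that `alpha` and `t_c` are the weighting factor and threshold of the scoring function, and that the string `"-D"` stands for a deleted node. The function names and the reading of "top n best scores" as the n highest-scoring starting cells are likewise assumptions.

```python
from typing import Dict, List, Optional, Tuple

GAP = "-D"  # assumed marker for a deleted node (the claim's -D)

def step(v: float) -> int:
    """Step function theta: 0 if v <= 0, 1 if v > 0."""
    return 1 if v > 0 else 0

def score(i: int, a: str, j: int, b: str, s: Dict[str, float],
          c: Dict[Tuple[str, str], float], alpha: float, t_c: float) -> float:
    """Scoring function w between candidate a in row i and candidate b in row j."""
    if i == j or a == b or a == GAP or b == GAP:
        return 0.0
    coeff = c.get((a, b), c.get((b, a), 0.0))  # symmetric lookup, 0 if unknown
    return step(coeff - t_c) * (alpha * (1 - abs(s[a] - s[b])) + (1 - alpha) * coeff)

def backfill(rows: List[List[str]], s, c, alpha: float = 0.5, t_c: float = 0.0):
    """Fill M by backward recursion; the last row stays at its initial value 0."""
    J = len(rows)
    M = [[0.0] * len(r) for r in rows]
    nxt: List[List[Optional[Tuple[int, int]]]] = [[None] * len(r) for r in rows]
    for i in range(J - 2, -1, -1):
        for k, a in enumerate(rows[i]):
            best, arg = 0.0, None
            for j in range(i + 1, J):
                for l, b in enumerate(rows[j]):
                    w = score(i, a, j, b, s, c, alpha, t_c)
                    cand = w + M[j][l] * step(w)
                    if cand > best:
                        best, arg = cand, (j, l)
            M[i][k], nxt[i][k] = best, arg
    return M, nxt

def traceback(rows, M, nxt, top_n: int = 1):
    """Follow the stored pointers from the top_n highest-scoring cells."""
    cells = [(M[i][k], i, k) for i in range(len(rows)) for k in range(len(rows[i]))]
    cells.sort(key=lambda t: -t[0])
    paths = []
    for total, i, k in cells[:top_n]:
        path, cell = [], (i, k)
        while cell is not None:
            path.append(rows[cell[0]][cell[1]])
            cell = nxt[cell[0]][cell[1]]
        paths.append((total, path))
    return paths

# Hypothetical toy data: three pathway positions with one or two candidates each.
rows = [["A1", "A2"], ["B1"], ["C1", "C2"]]
s = {"A1": 0.9, "A2": 0.5, "B1": 0.8, "C1": 0.7, "C2": 0.9}
c = {("A1", "B1"): 0.8, ("B1", "C1"): 0.6, ("A2", "C2"): 0.9, ("A1", "C2"): 0.2}
M, nxt = backfill(rows, s, c, alpha=0.5, t_c=0.3)
print(traceback(rows, M, nxt, top_n=1)[0][1])  # ['A1', 'B1', 'C1']
```

Because the recursion maximizes over all later rows rather than only the next one, a putative pathway may skip a position, as the A2 to C2 link does in the toy data. Every non-gap candidate must have an entry in `s`.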
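The steps of claim 15 (compare known sequences with a database, keep the highest-identity match for each, and inherit the query pathway's interactions) can be sketched in the same way. The sketch below is illustrative only. The function names, the `min_identity` parameter, and the toy sequences are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used as a crude stand-in for a real sequence comparison tool, which the claims do not name.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def identity(a: str, b: str) -> float:
    """Rough similarity ratio in [0, 1]; a stand-in for true sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def predict_orthologous_pathway(
    query_seqs: Dict[str, str],
    query_edges: List[Tuple[str, str]],
    target_db: Dict[str, str],
    min_identity: float = 0.0,
):
    """Map each query protein to its best database match and inherit interactions.

    target_db must be non-empty. Ties are broken by protein identifier.
    """
    mapping: Dict[str, Tuple[str, float]] = {}
    for node, seq in query_seqs.items():
        best_score, best_id = max(
            (identity(seq, tseq), tid) for tid, tseq in target_db.items()
        )
        if best_score >= min_identity:
            mapping[node] = (best_id, best_score)
    # An interaction is inherited only when both of its endpoints were mapped.
    edges = [
        (mapping[u][0], mapping[v][0])
        for u, v in query_edges
        if u in mapping and v in mapping
    ]
    return mapping, edges

# Hypothetical toy data: two query proteins and a three-protein database.
query_seqs = {"Q1": "MKTAYIAK", "Q2": "GSHMLEDP"}
query_edges = [("Q1", "Q2")]
target_db = {"T1": "MKTAYIAR", "T2": "GSHMLEDP", "T3": "AAAAAAAA"}
mapping, edges = predict_orthologous_pathway(query_seqs, query_edges, target_db)
print(edges)  # [('T1', 'T2')]
```

The returned mapping plays the role of the alignment between the query pathway and the predicted orthologous pathway. A non-zero `min_identity` leaves poorly matched nodes unmapped, so their interactions are not inherited.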
Parent Case Info
[0001] This patent application claims the benefit of provisional application 60/347,019 filed Jan. 7, 2002 and provisional application 60/269,711 filed Feb. 20, 2001.
Provisional Applications (2)
| Number | Date | Country |
| --- | --- | --- |
| 60347019 | Jan 2002 | US |
| 60269711 | Feb 2001 | US |