Causal analysis in complex biological systems

Information

  • Patent Application
  • Publication Number
    20070225956
  • Date Filed
    March 27, 2006
  • Date Published
    September 27, 2007
Abstract
Disclosed are software-assisted systems and methods for analyzing biological data sets to generate hypotheses potentially explanatory of the data. Active causative relationships in the biology of complex living systems are discovered by providing a data base of biological assertions comprising a multiplicity of nodes representative of a network of biological entities, actions, functional activities, and concepts, and relationship links between the nodes. Simulating perturbation of individual root nodes in the network initiates a cascade of virtual activity through the relationship links, revealing plural branching paths within the data base. Operational data, e.g., experimental data, representative of real or hypothetical perturbations of one or more nodes are mapped onto the data base. The branching paths are then prioritized as hypotheses on the basis of how well they predict the operational data. Logic-based criteria are applied to the resulting graphs to reject those not likely to represent real biology. The result is a set of remaining graphs comprising branching paths potentially explanatory of the molecular biology implied by the data.
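The simulate-then-score procedure summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the network is reduced to hypothetical signed, directed edges (`+1` for "increases", `-1` for "decreases"), a perturbation is propagated downstream breadth-first with each node taking the direction of the first path that reaches it, and each root-node hypothesis is ranked by correct minus incorrect direction calls against the observed data. The names `simulate_cascade`, `score`, and `rank_hypotheses` are illustrative only.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical signed causal edge: (cause, effect, sign), where sign is
# +1 for "increases" and -1 for "decreases".
Edge = Tuple[str, str, int]


def simulate_cascade(edges: List[Edge], root: str, direction: int) -> Dict[str, int]:
    """Propagate a perturbation of `root` downstream, from cause to effect.

    Nodes are visited breadth-first; each reached node keeps the direction
    implied by the first (shortest) path that reaches it.
    """
    children: Dict[str, List[Tuple[str, int]]] = {}
    for cause, effect, sign in edges:
        children.setdefault(cause, []).append((effect, sign))
    predicted = {root: direction}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child, sign in children.get(node, []):
            if child not in predicted:  # first path wins; also stops cycles
                predicted[child] = predicted[node] * sign
                queue.append(child)
    return predicted


def score(predicted: Dict[str, int], observed: Dict[str, int]) -> Tuple[int, int]:
    """Count (correct, incorrect) direction calls over the measured nodes."""
    correct = incorrect = 0
    for node, obs in observed.items():
        if node in predicted:
            if predicted[node] == obs:
                correct += 1
            else:
                incorrect += 1
    return correct, incorrect


def rank_hypotheses(edges: List[Edge], observed: Dict[str, int]):
    """Perturb every node with outgoing edges up and down; rank the
    resulting hypotheses by (correct - incorrect), best first."""
    roots = sorted({cause for cause, _, _ in edges})
    results = []
    for root in roots:
        for direction in (+1, -1):
            correct, incorrect = score(simulate_cascade(edges, root, direction), observed)
            results.append((correct - incorrect, root, direction, correct, incorrect))
    # Best score first; ties broken by root name, then "up" before "down".
    results.sort(key=lambda r: (-r[0], r[1], -r[2]))
    return results
```

For example, with edges A→B (+1), B→C (+1), B→D (-1), and E→B (-1), and observations of C increased and D decreased, the top-ranked hypotheses are "A up", "B up", and "E down", each explaining both measurements with no incorrect calls. A fuller system would also need rules for conflicting paths and feedback loops, which this sketch sidesteps by keeping only the first path found.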
Description

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart illustrating the structure of a data base in accordance with one embodiment of the invention;

FIG. 2 is a block diagram illustrating a sequence of steps in accordance with one embodiment of the invention;

FIG. 3 is a graphical representation of a biochemical network embodied within a data base comprising an assembly directed toward a selected biological system (here generalized human biology) in accordance with one embodiment of the invention;

FIG. 4 is a graphical representation of a “hypothesis” (branching path or graph), useful in explaining the nature of the hypotheses that are pruned, in accordance with one embodiment of the invention, to deduce a causal relationship explanatory of real biology;

FIG. 5 is a key indicating the meaning of the various symbols used in the schematic graphical representations of branching paths illustrated in FIGS. 6 through 14;

FIGS. 6-14 are illustrations of graphs useful in explaining the various computationally based methods of pruning candidate hypotheses in accordance with various embodiments of the invention;

FIG. 15 is a block diagram of an apparatus for performing the methods described herein.

Claims
  • 1. A software assisted method of discovering active causative relationships in the biology of complex living systems, the method comprising the steps of: providing a data base of biological assertions concerning a selected biological system, the data base comprising a multiplicity of nodes representative of a network of biological entities, actions, functional activities, and concepts, and relationship links between nodes indicative of there being a relationship therebetween, at least some of which include indicia of causal directionality; simulating in the network one or more perturbations of plural individual root nodes to initiate a cascade of virtual activity through said relationship links along connected nodes to discern plural branching paths within the data base; mapping onto the data base operational data representative of a perturbation of one or more nodes and optionally of experimentally observed or hypothesized changes in other nodes resulting from the one or more perturbations; prioritizing said branching paths on the basis of how well they predict said operational data, thereby to define a set of graphs comprising said branching paths potentially explanatory of the molecular biology implied by the data; and applying logic based criteria to said set of graphs to reject graphs as not likely representative of real biology, thereby to eliminate hypotheses and to identify from remaining graphs one or more active causative relationships.
  • 2. The method of claim 1 wherein said simulation is conducted downstream along said relationship links from cause to effect.
  • 3. The method of claim 1 wherein a said logic based criterion is based on a measure of consistency between the predictions resulting from simulation along multiple nodes of a graph and known biology of said selected biological system.
  • 4. The method of claim 1 wherein a said logic based criterion is based on a measure of consistency between the operational data and the predictions resulting from simulation within a graph upstream from a root node to a node corresponding to an operational data point.
  • 5. The method of claim 1 wherein a said logic based criterion is based on a measure of consistency between the operational data and the predictions resulting from simulation within a graph downstream from a root node to a node corresponding to an operational data point.
  • 6. The method of claim 1 wherein a said logic based criterion comprises a group of branching paths generated by mapping against random or control data used as a filter to eliminate a graph from said set of graphs.
  • 7. The method of claim 1 wherein a said logic based criterion is based on an assessment of non causal links or descriptor nodes associated with a said graph for consistency with known aspects of the biology of said selected biological system.
  • 8. The method of claim 7 wherein said assessment is for mutual anatomic accessibility in vivo in said selected biological system of the nodes representing entities in a said graph.
  • 9. The method of claim 7 wherein said assessment is for non causal descriptors of function of the nodes representing entities in a said graph.
  • 10. The method of claim 1 wherein a said logic based criterion is based on multiple causal connections to a concept node.
  • 11. The method of claim 1 wherein a said logic based criterion is based on a measure of consistency between the predictions resulting from simulation along said branching path and the operational data.
  • 12. The method of claim 11 wherein the measure of consistency is a determination of whether the perturbation of the root node corresponds to said operational data.
  • 13. The method of claim 12 wherein the measure of consistency is based on the number of nodes perturbed in a path of a said graph which correspond to said operational data.
  • 14. The method of claim 12 wherein the measure of consistency is a determination of a plurality of graphs which together best correlate with the operational data.
  • 15. The method of claim 14 wherein the plurality of graphs which together best correlate with the operational data is determined by applying an algorithm for exploring combinatorial space to multiple graphs with the number of correct node simulations as a fitness function.
  • 16. The method of claim 1 wherein a said logic based criterion is based on prioritization of retention of graphs comprising paths wherein plural nodes are perturbed in the same direction as said operational data.
  • 17. The method of claim 1 comprising the additional step of harmonizing a plurality of said remaining graphs to produce a larger graph comprising a model of a portion of the operation of a said biological system.
  • 18. The method of claim 17 further comprising the step of simulating operation of said model to make predictions about said selected biological system.
  • 19. The method of claim 18 comprising simulating operation of said model to select biomarkers of said selected biological system.
  • 20. The method of claim 18 comprising simulating operation of said model to select biological entities for drug modulation of said selected biological system.
  • 21. The method of claim 18 comprising simulating operation of said model to stratify patients for a clinical trial.
  • 22. The method of claim 18 comprising simulating operation of said model to develop a diagnostic assay for a disease.
  • 23. The method of claim 18 comprising simulating operation of said model to select an animal model for drug testing.
  • 24. The method of claim 1 comprising applying a plurality of logic based criteria to said set of graphs.
  • 25. The method of claim 1 comprising producing a scoring system indicative of how close a said graph approaches explanation of the operational data.
  • 26. The method of claim 1 comprising applying a plurality of logic based criteria to said set of graphs, without regard to the operational data, to prioritize said graphs so as to discern one or more which model known aspects of the biology of said selected biological system.
  • 27. The method of claim 1 comprising providing said data base by: providing a data base of biological assertions comprising a multiplicity of nodes representative of biological elements and descriptors characterizing the elements or relationships among nodes; extracting a subset of assertions from the data base that satisfy a set of biological criteria specified by a user to define a said selected biological system; and compiling the extracted assertions to produce an assembly comprising a biological knowledge base of assertions potentially relevant to said selected biological system.
  • 28. The method of claim 27 comprising the additional step of transforming said assembly to generate new biological knowledge about said selected biological system.
  • 29. The method of claim 28 wherein transforming is done by applying reasoning to said extracted assertions to remove logical inconsistencies or to augment the assertions therein by adding to said assembly additional assertions from said data base.
  • 30. The method of claim 1 wherein the operational data comprises an effective increase or decrease in concentration or number of a biological element, stimulation or inhibition of activity of an element, alterations in the structure of an element, or the appearance or disappearance of an element.
  • 31. The method of claim 1 wherein the operational data is experimentally determined data.
  • 32. A software assisted method for discovering active causative relationship mechanisms in the biology of a selected biological system, the method comprising the steps of: providing a data base comprising a multiplicity of nodes representative of a network of biological entities, biological actions, functional biological activities, and biological concepts, and links between nodes indicative of there being a relationship therebetween; applying an algorithm to the database to identify plural graphs among linked nodes in the network potentially relevant to the functional operation of at least a portion of a selected biological system; mapping onto the data base operational data representative of perturbations of one or more nodes thereby to select a set of plural graphs for further investigation; and applying to said set of graphs filtering criteria based on assessments of how well a graph predicts said operational data to remove graphs from consideration as viable hypotheses thereby to identify one or more remaining graphs comprising a theoretical basis of a hypothesis potentially explanatory of the biological mechanism implied by the data.
  • 33. The method of claim 32 wherein the mapping step is conducted before applying an algorithm to the database.
  • 34. The method of claim 32 wherein at least a portion of said links further comprise indicia of causal directionality between nodes.
  • 35. The method of claim 34 wherein the step of applying an algorithm to the data base comprises simulating a cascade of biological activity through the network from perturbation of plural individual root nodes through said links along connected nodes to discern plural graphs including nodes corresponding to an operational data point.
  • 36. The method of claim 32 comprising the additional step of selecting for further examination individual said discerned graphs comprising a node linked directly to plural other nodes, wherein more than one of said plural other nodes is a node corresponding to a data point in said operational data.
  • 37. The method of claim 36 wherein said more than one of said plural other nodes corresponding to a data point in said operational data comprises a fraction of said plural other nodes greater than the data base average fraction of plural other nodes linked directly to a node which correspond to a data point in said operational data.
  • 38. The method of claim 32 comprising the additional step of selecting for further examination individual said discerned graphs comprising a node linked directly to plural other nodes, wherein more than one of said plural other nodes corresponds in direction of change to an operational data point.
  • 39. The method of claim 38 wherein said more than one of said plural other nodes corresponding in direction of change to an operational data point comprises a fraction of said plural other nodes greater than the average fraction of plural other nodes linked directly to a node which correspond in direction of change to an operational data point found in the data base.
  • 40. A software assisted method for discovering active causative relationship mechanisms in the biology of a selected biological system, the method comprising the steps of: providing a data base comprising a multiplicity of nodes representative of a network of biological entities, biological actions, functional biological activities, and biological concepts, and links between nodes indicative of there being a relationship therebetween; mapping onto the data base operational data representative of perturbations of plural nodes; simulating a cascade of biological activity through the network from perturbation of plural individual root nodes through said links along connected nodes to discern plural graphs to plural nodes within the data base representative of plural data points of the operational data; selecting for further examination individual said discerned graphs comprising a node linked directly to plural other nodes, wherein more than one of said plural other nodes is a node represented by a data point in said operational data; and applying to individual said discerned graphs additional filtering criteria based on assessments of how well a graph predicts said operational data to remove graphs from consideration as viable hypotheses thereby to identify one or more remaining graphs comprising a theoretical basis of a new hypothesis potentially explanatory of the biological mechanism implied by the data.
  • 41. The method of claim 40 comprising the additional step of selecting for further examination individual said discerned graphs comprising a node linked directly to plural other nodes, wherein more than one of said plural other nodes corresponds to an operational data point.
  • 42. A method permitting discovery by an investigator of causative relationship mechanisms in the biology of a selected biological system, the method comprising the steps of causing a second party entity or entities to: provide a data base comprising a multiplicity of nodes representative of a network of biological entities, biological actions, functional biological activities, and biological concepts, and links between nodes indicative of there being a relationship therebetween; apply an algorithm to the database to identify plural graphs among linked nodes in the network potentially relevant to the functional operation of at least a portion of a selected biological system; map onto the data base operational data representative of perturbations of one or more nodes thereby to select a set of plural graphs for further investigation; apply to said set of graphs filtering criteria based on assessments of how well a graph predicts said operational data to remove graphs from consideration as viable hypotheses; and deliver a report to the investigator based on one or more remaining graphs comprising a theoretical basis of a hypothesis potentially explanatory of the biological mechanism implied by the data.
  • 43. The method of claim 42 wherein said investigator supplies said operational data to a said second party entity.
  • 44. The method of claim 42 wherein at least a portion of said links further comprise indicia of causal directionality between nodes.
  • 45. The method of claim 42 wherein the step of causing a second party entity or entities to apply an algorithm to the data base comprises causing said entity to simulate a cascade of biological activity through the network from perturbation of plural individual root nodes through said links along connected nodes to discern plural graphs including nodes corresponding to an operational data point.
  • 46. The method of claim 42 wherein said investigator is a pharmaceutical company and a said second party entity is a discovery unit associated with the pharmaceutical company or an outside contractor.
  • 47. The method of claim 42 wherein the investigator is situated in the country where this patent is in force and a second party entity is outside said country.
  • 48. An apparatus for discovering causative relationship mechanisms in the biology of a selected biological system, the apparatus comprising: means for applying to a data base comprising a multiplicity of nodes representative of a network of biological entities, biological actions, functional biological activities, and biological concepts, and links between nodes indicative of there being a relationship therebetween, an algorithm to identify plural graphs among linked nodes in the network potentially relevant to the functional operation of at least a portion of a selected biological system; means for receiving operational data representative of perturbations of one or more nodes; means for mapping onto the data base said operational data for selecting a set of plural graphs for further investigation; and means for applying to said set of graphs filtering criteria based on assessments of how well a graph predicts said operational data to remove graphs from consideration as viable hypotheses, thereby to permit identification of one or more remaining graphs comprising a theoretical basis of a hypothesis potentially explanatory of the biological mechanism implied by the data.
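Claims 36 and 37 describe selecting nodes whose directly linked neighbors include an unusually high share of nodes that carry operational data. One possible reading is sketched below in Python. It assumes that links are treated as undirected, that only nodes with more than one link are candidates, and that "the data base average fraction" is the mean over those candidates; the function names `neighbors` and `enriched_hubs` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def neighbors(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Nodes linked directly to each node (links treated as undirected)."""
    adj: Dict[str, Set[str]] = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def enriched_hubs(links: Iterable[Tuple[str, str]], data_nodes: Set[str]) -> List[str]:
    """Return nodes linked directly to more than one data-bearing node whose
    data-bearing fraction of neighbors exceeds the average over all nodes
    with plural links (one reading of claims 36-37)."""
    adj = neighbors(links)
    # Only nodes linked directly to plural other nodes are candidates.
    hubs = {n: nbrs for n, nbrs in adj.items() if len(nbrs) > 1}
    if not hubs:
        return []
    fraction = {n: len(nbrs & data_nodes) / len(nbrs) for n, nbrs in hubs.items()}
    average = sum(fraction.values()) / len(fraction)
    return sorted(
        n for n in hubs
        if len(hubs[n] & data_nodes) > 1 and fraction[n] > average
    )
```

For instance, if hub H links to measured nodes a and b plus unmeasured c, and hub K links to c and to d, e, and f, of which only d is measured, only H is returned: two of its three neighbors (about 0.67) carry data, above the average of about 0.31 across the candidates H, K, and c. Claims 38 and 39 apply the same test to agreement in direction of change, which would require signed predictions such as those in a cascade simulation.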
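Claim 6 uses results obtained against random or control data as a filter for eliminating graphs. The sketch below substitutes a standard permutation test for that idea, as an assumption rather than the procedure the claims specify: a graph's up/down predictions are scored against the real data and against many randomized data sets that reassign the same up/down labels to randomly drawn nodes, and a graph whose real score is not clearly better than chance becomes a candidate for removal. The `universe` parameter (all nodes eligible to carry data), the permutation count, and the function names are hypothetical.

```python
import random
from typing import Dict, Iterable


def concordance(predicted: Dict[str, int], observed: Dict[str, int]) -> int:
    """Correct minus incorrect up/down calls: a matching call contributes +1,
    an opposite call -1, and unpredicted nodes contribute nothing."""
    return sum(predicted[n] * d for n, d in observed.items() if n in predicted)


def random_control_pvalue(
    predicted: Dict[str, int],
    observed: Dict[str, int],
    universe: Iterable[str],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often random data fit the graph at least as well as the
    real data. Each random data set keeps the observed up/down labels but
    assigns them to nodes drawn at random from `universe`."""
    rng = random.Random(seed)
    actual = concordance(predicted, observed)
    nodes = sorted(universe)  # sorted so a fixed seed is reproducible
    labels = list(observed.values())
    hits = 0
    for _ in range(n_perm):
        fake = dict(zip(rng.sample(nodes, len(labels)), labels))
        if concordance(predicted, fake) >= actual:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A graph might then be retained only when `random_control_pvalue` falls below a chosen cutoff such as 0.05. The add-one terms are a common convention for permutation estimates, so a finite number of permutations never reports a probability of exactly zero.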