Scalable tools for the analysis of chemical compounds using graph-based querying

Information

  • Research Project
  • ApplicationId
    7539247
  • Core Project Number
    R44MH086121
  • Full Project Number
    9R44MH086121-02
  • Serial Number
    86121
  • FOA Number
    PAR-07-160
  • Sub Project Id
  • Project Start Date
    9/1/2007
  • Project End Date
    8/31/2011
  • Program Officer Name
    STIRRATT, MICHAEL J
  • Budget Start Date
    9/10/2008
  • Budget End Date
    8/31/2009
  • Fiscal Year
    2008
  • Support Year
    2
  • Suffix
  • Award Notice Date
    9/10/2008
Organizations

DESCRIPTION (provided by applicant): Our current capacity to generate chemical and structural biological data far exceeds our capability to meaningfully assimilate it. The data describe molecules, biological macromolecules, and their associated properties. A principle common to the structure of all chemical and biological macromolecular entities is the composition of objects related by energetic interaction. A natural representation of all such entities is a graph composed of nodes related by edges. We have developed powerful, scalable techniques that operate on graph databases for efficient similarity searching (Closure-tree), identification of statistically significant subgraphs (GraphRank), and query specification (GraphQL). These techniques apply naturally to chemical and structural biological data, which are readily represented as graphs. We have demonstrated the validity of the approach in prior work, and the feasibility in our Phase 1 research. The overall goal of this project is to deliver powerful, innovative problem-solving tools to medicinal chemists, structural biologists, and drug discovery researchers synthesizing ever-increasing amounts of chemical, biochemical, structural biological, cell biological, and clinical data. Phase 1 of this project is ongoing and highly successful. We have successfully demonstrated that the Closure-tree and GraphRank algorithms are effective on chemical compound databases of realistic, industrial size. We have developed methods to exploit our knowledge of the nature of chemical databases. Using these methods we have reduced similarity query time by over an order of magnitude. We have identified several specific aims to pursue in Phase 2 of our research.
We have rapidly established a professional software development and research infrastructure and developed the tools necessary to support progress toward the goal of solving important problems hindering medicinal chemists and structural biologists conducting modern drug discovery research for the development of new therapeutics. We will pursue four specific aims in our Phase 2 research. (1) We will develop specific additional functionality for Closure-tree and GraphRank, and integrate GraphQL into our chemical and structural bioinformatics tool set. The results of this aim will be used to (2) develop methods and functionality to represent chemical, structural biology, systems biology, and glycobiology data as graphs. Building on these results, we will (3) apply our tool set to specific relevant research problems such as HIV-1 Protease inhibition, Avian Flu neuraminidase inhibition, and p53-protein interactions. Finally, we will (4) assemble a state-of-the-art chemical and structural biological informatics tool set with detailed documentation and relevant case studies. The outcome of this research will be powerful, innovative new tools in the hands of medicinal chemists, structural biologists, and modern drug discovery researchers in academia and the pharmaceutical industry. The tools address significant obstacles in the drug development process and will enable new discoveries and greatly advance the practice of cheminformatic and structural biological data analysis. Through a carefully developed market analysis described in our commercialization plan, we show a growing market for our tools and competitive advantages. Application of our techniques will have significant impact on the interpretation of structural biological data, on pharmaceutical research and modern drug discovery chemistry, and on human health care through the design of new drugs. 
PUBLIC HEALTH RELEVANCE: Graph-based representation of chemical compounds results in a more accurate realization of the chemical space. The use of recent techniques in graph querying and mining will enable data analysis that can scale to millions of compounds. The developed system will integrate information on chemical compounds with biological activity and protein interaction networks, thus enabling cheaper and faster drug discovery.

IC Name
NATIONAL INSTITUTE OF MENTAL HEALTH
  • Activity
    R44
  • Administering IC
    MH
  • Application Type
    9
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    518950
  • Sub Project Total Cost
  • ARRA Funded
  • CFDA Code
    242
  • Ed Inst. Type
  • Funding ICs
    NIMH:518950
  • Funding Mechanism
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    ACELOT, INC.
  • Organization Department
  • Organization DUNS
    784692001
  • Organization City
    SANTA BARBARA
  • Organization State
    CA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    931111471
  • Organization District
    UNITED STATES