Scalable tools for the analysis of chemical compounds using graph-based querying

Information

Research Project
7539247

ApplicationId
7539247
Core Project Number
R44MH086121
Full Project Number
9R44MH086121-02
Serial Number
86121
FOA Number
PAR-07-160
Sub Project Id

Project Start Date
9/1/2007
Project End Date
8/31/2011
Program Officer Name
STIRRATT, MICHAEL J
Budget Start Date
9/10/2008
Budget End Date
8/31/2009
Fiscal Year
2008
Support Year
2
Suffix
Award Notice Date
9/10/2008

Organizations

Acelot, Inc.

DESCRIPTION (provided by applicant): Our current capacity to generate chemical and structural biological data far exceeds our capability to meaningfully assimilate it. The data describe molecules and biological macromolecules and their associated properties. A principle common to the structure of all chemical and biological macromolecular entities is the composition of objects related by energetic interaction. A natural representation of all such entities is a graph composed of nodes related by edges. We have developed powerful, scalable techniques that operate on graph databases for efficient similarity searching (Closure-tree), identification of statistically significant subgraphs (GraphRank), and query specification (GraphQL). These techniques apply directly to chemical and structural biological data, which are naturally represented as graphs. We have demonstrated the validity of the approach in prior work and its feasibility in our Phase 1 research. The overall goal of this project is to deliver powerful, innovative problem-solving tools to medicinal chemists, structural biologists, and drug discovery researchers synthesizing ever-increasing amounts of chemical, biochemical, structural biological, cell biological, and clinical data. Phase 1 of this project is ongoing and highly successful. We have demonstrated that the Closure-tree and GraphRank algorithms are effective on chemical compound databases of realistic, industrial size. We have developed methods that exploit our knowledge of the nature of chemical databases. Using these methods, we have reduced similarity query time by more than an order of magnitude. We have identified several specific aims to pursue in Phase 2 of our research.
We have rapidly established a professional software development and research infrastructure and developed the tools necessary to support progress toward the goal of solving important problems hindering medicinal chemists and structural biologists conducting modern drug discovery research for the development of new therapeutics. We will pursue four specific aims in our Phase 2 research. (1) We will develop specific additional functionality for Closure-tree and GraphRank, and integrate GraphQL into our chemical and structural bioinformatics tool set. The results of this aim will be used to (2) develop methods and functionality to represent chemical, structural biology, systems biology, and glycobiology data as graphs. Building on these results, we will (3) apply our tool set to specific relevant research problems such as HIV-1 protease inhibition, avian flu neuraminidase inhibition, and p53-protein interactions. Finally, we will (4) assemble a state-of-the-art chemical and structural biological informatics tool set with detailed documentation and relevant case studies. The outcome of this research will be powerful, innovative new tools in the hands of medicinal chemists, structural biologists, and modern drug discovery researchers in academia and the pharmaceutical industry. The tools address significant obstacles in the drug development process and will enable new discoveries and greatly advance the practice of cheminformatic and structural biological data analysis. Through a carefully developed market analysis described in our commercialization plan, we show that the market for our tools is growing and that the tools hold competitive advantages. Application of our techniques will have significant impact on the interpretation of structural biological data, on pharmaceutical research and modern drug discovery chemistry, and on human health care through the design of new drugs.
PUBLIC HEALTH RELEVANCE: Graph-based representation of chemical compounds results in a more accurate realization of the chemical space. The use of recent techniques in graph querying and mining will enable data analysis that can scale to millions of compounds. The developed system will integrate information on chemical compounds with biological activity and protein interaction networks, thus enabling cheaper and faster drug discovery.

IC Name

NATIONAL INSTITUTE OF MENTAL HEALTH

Activity
R44
Administering IC
MH
Application Type
9

Direct Cost Amount
Indirect Cost Amount
Total Cost
518950
Sub Project Total Cost

ARRA Funded
CFDA Code
242
Ed Inst. Type
Funding ICs
NIMH:518950
Funding Mechanism
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
ACELOT, INC.
Organization Department
Organization DUNS
784692001
Organization City
SANTA BARBARA
Organization State
CA
Organization Country
UNITED STATES
Organization Zip Code
931111471
Organization District
UNITED STATES
