Centralized assay datasets for modelling support of small drug discovery organizations

Information

Research Project
9254390

ApplicationId
9254390
Core Project Number
R43GM122196
Full Project Number
1R43GM122196-01
Serial Number
122196
FOA Number
PA-15-269
Sub Project Id

Project Start Date
1/1/2017
Project End Date
6/30/2017
Program Officer Name
RAVICHANDRAN, VEERASAMY
Budget Start Date
1/1/2017
Budget End Date
6/30/2017
Fiscal Year
2017
Support Year
01
Suffix
Award Notice Date
12/15/2016

Organizations

COLLABORATIONS PHARMACEUTICALS, INC.

Summary: The objective of "Assay Central" is to compile a comprehensive collection of structure-activity datasets for a broad variety of disease targets and absorption, distribution, metabolism, excretion and toxicology (ADMET) properties, in a form that is immediately ready for model building and other forms of analysis using cheminformatics methods. This is aided by the existence of many sources of curated open data, and one in particular, ChEMBL 1, 2, will be used as the nucleus in Phase I. This bioassay data collection is incredibly valuable, but it is not currently provided in a form that is ready for use by small research and development (R&D) organisations that do not have their own in-house cheminformatics teams. Preprocessing, filtering, merging, validating and normalizing the structure and activity data requires a great deal of software expertise and medicinal chemistry domain knowledge, skillsets that are rare and expensive to combine within the same team.
We will create a script to analyze databases such as ChEMBL, selected parts of PubChem and others 1, 2, and partition their contents into groups of compatible activity measurements against the same target. We will seed the dataset collection with a set of 1840 target-assay groups that have been recently extracted from the ChEMBL v20 database, as well as EPA Tox21 measurements 3, using methodology that we have already developed (similar to that described in 4).
We will build error checking and correction software. We will apply best-of-breed methodology for checking and correcting structure-activity data 5, which errs on the side of caution for problems with non-obvious solutions, so that we can manually identify problems and apply either patches or datasource-specific automated corrections.
We will build and validate Bayesian models with the datasets collected and cleaned.
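The partitioning step described in this summary (grouping compatible activity measurements against the same target) can be sketched roughly as follows. This is a minimal illustration only, not the project's actual script: the `Activity` record, its field names, the target labels and the rule that "compatible" means the same activity type and the same units are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Activity(NamedTuple):
    """Hypothetical activity record; field names are assumptions."""
    target_id: str      # illustrative target label, not a real database ID
    activity_type: str  # e.g. "IC50" or "Ki"
    units: str          # e.g. "nM"
    smiles: str         # structure as a SMILES string
    value: float        # measured activity (made-up numbers below)


def partition_activities(records):
    """Group records by (target, activity type, units).

    Assumption: two measurements are 'compatible' when they share the
    same target, the same activity type (case-insensitive) and the
    same units. A real pipeline would likely need finer rules.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec.target_id, rec.activity_type.strip().upper(), rec.units.strip())
        groups[key].append(rec)
    return dict(groups)


# Illustrative data only; the values are invented for this sketch.
records = [
    Activity("TARGET_A", "IC50", "nM", "CCO", 1200.0),
    Activity("TARGET_A", "ic50", "nM", "CCN", 85.0),
    Activity("TARGET_A", "Ki", "nM", "CCC", 40.0),
    Activity("TARGET_B", "IC50", "nM", "CCO", 9500.0),
]
groups = partition_activities(records)
for key, members in sorted(groups.items()):
    print(key, len(members))
```

Each resulting group holds measurements that could plausibly feed a single model; groups that are too small or too heterogeneous would presumably be filtered out before model building.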
For each of the target-activity groups, we will create a Bayesian model using ECFP6 or FCFP6 fingerprints, and this will be one of the primary outputs from the project. Models will be evaluated using internal and external testing with the receiver operator characteristic (ROC > 0.75), the integral of the true-negative-rate versus true-positive-rate curve, as well as the enrichment 6, Kappa value and positive predictive value 7.
We will develop new data visualization tools as a proof of concept in Phase I. We have already begun to explore preliminary visualization methods using multiple models, but these have so far focused primarily on a handful of machine learning models selected from a very large list. New visualization techniques are required to summarize large matrices of data, e.g. a list of proposed structures vs. thousands of target models. In Phase II we will expand by upgrading to newer ChEMBL releases and selectively incorporating screening runs from other databases (such as PubChem 8). These tools will consist of software created explicitly for this project (particularly web-based interfaces), as well as enhanced functionality added to 3rd party tools that we influence (e.g. mobile apps) and open source projects that we have already contributed to (e.g. CDK for fingerprints and Bayesian modelling). We will widely publicise Assay Central at conferences and in papers.
Being able to use transparent computational models simultaneously for visualizing activity trends for multiple targets (both diseases and ADMET) removes the burden of curating data or purchasing and maintaining expensive software, and drastically simplifies the addition of new data. It also represents a new frontier of drug discovery, in which small, agile, distributed R&D organizations have access to valuable public datasets that can inform their research.
Such computational models will assist in drug repurposing efforts internally and with our collaborators while likely identifying new compounds for a wide array of drug discovery projects.
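The model building and evaluation described in this summary can be illustrated with a small sketch. The summary only says "Bayesian model"; the Laplacian-corrected naive Bayes estimator used here is an assumption, chosen because it is a common choice for fingerprint-based models. Fingerprints are represented as plain sets of integer feature hashes rather than real ECFP6 or FCFP6 output, the score threshold of 0 is arbitrary, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Per-feature log contributions (assumed Laplacian-corrected estimator).

    fingerprints: list of sets of int feature hashes (stand-ins for ECFP6 bits)
    labels: list of bool, True for active
    """
    total, active = Counter(), Counter()
    for fp, is_active in zip(fingerprints, labels):
        for bit in fp:
            total[bit] += 1
            if is_active:
                active[bit] += 1
    base_rate = sum(labels) / len(labels)  # overall fraction of actives
    return {bit: math.log((active[bit] + 1.0) / (total[bit] * base_rate + 1.0))
            for bit in total}


def predict_score(model, fp):
    """Sum the contributions of known features; unseen features add 0."""
    return sum(model.get(bit, 0.0) for bit in fp)


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (active, inactive) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kappa_and_ppv(predicted, labels):
    """Cohen's kappa and positive predictive value for boolean predictions."""
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum(not p and y for p, y in zip(predicted, labels))
    tn = sum(not p and not y for p, y in zip(predicted, labels))
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return kappa, ppv


# Toy data, invented for illustration; real use needs held-out test sets.
fps = [{1, 2, 3}, {1, 2, 4}, {5, 6}, {5, 7}]
labels = [True, True, False, False]
model = train_laplacian_bayes(fps, labels)
scores = [predict_score(model, fp) for fp in fps]
predicted = [s > 0 for s in scores]  # threshold of 0 is an arbitrary choice
auc = roc_auc(scores, labels)
kappa, ppv = kappa_and_ppv(predicted, labels)
print(auc, kappa, ppv)
```

The toy example scores its own training set only to show how the pieces fit together; the internal and external testing mentioned in the summary would evaluate each model on data it was not trained on before comparing the ROC value against the 0.75 threshold.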

IC Name

NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES

Activity
R43
Administering IC
GM
Application Type
1

Direct Cost Amount
Indirect Cost Amount
Total Cost
149999
Sub Project Total Cost

ARRA Funded
False
CFDA Code
859
Ed Inst. Type
Funding ICs
NIGMS:149999
Funding Mechanism
SBIR-STTR RPGs
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
COLLABORATIONS PHARMACEUTICALS, INC.
Organization Department
Organization DUNS
079704473
Organization City
FUQUAY VARINA
Organization State
NC
Organization Country
UNITED STATES
Organization Zip Code
275269278
Organization District
UNITED STATES
