MegaPredict for predicting natural product uses and their drug interactions

Information

Research Project
9846761

ApplicationId
9846761
Core Project Number
R43AT010585
Full Project Number
1R43AT010585-01
Serial Number
010585
FOA Number
PA-18-607
Sub Project Id

Project Start Date
8/15/2019 - 6 years ago
Project End Date
8/14/2020 - 5 years ago
Program Officer Name
HOPP, CRAIG
Budget Start Date
8/15/2019 - 6 years ago
Budget End Date
8/14/2020 - 5 years ago
Fiscal Year
2019
Support Year
01
Suffix
Award Notice Date
8/12/2019 - 6 years ago

Organizations

COLLABORATIONS PHARMACEUTICALS, INC.

Information

MegaPredict for predicting natural product uses and their drug interactions

Project Summary The objective of ?MegaPredict? is to enable scientists to generate predictions for a natural product (or any molecule) and identify targets for efficacy assessment as well as identify any potential liabilities. We are building on our previous work which has compiled a comprehensive collection of datasets for structure-activity data for a broad variety of disease targets and other properties, in a form ready for model building. All of these models utilize the many sources of curated open data, including ChEMBL, ToxCast etc. We have developed a prototype of MegaPredict that utilizes Bayesian algorithm and ECFP6 fingerprints to output a list of prioritized ?targets?. We realize that neither the algorithm or the descriptors may be optimal therefore we propose to address this as we validate MegaPredict and develop a product over this proposal. Our team is suitably qualified to develop the software needed and we will leverage our large collaborator network to assist us in validating the activity of compounds. We will initially create a script to take a natural product and score it against many thousands of machine learning models then rank the outputs to propose efficacy targets. We will use over 12,000 ChEMBL derived target-assay / bioactivity groups extracted from the ChEMBL v24 database, as well as EPA Tox21 measurements and other public datasets, using methodology that we have already partially developed. We can repeat this process for over 200 published compounds and access the outputs versus what is known. We intend to compare how the approach performs with synthetic drugs or drug-like compounds as well as natural products. We will assess whether other machine learning algorithms and molecular descriptors can improve predictions. As we generate machine learning models such as Linear Logistic Regression, AdaBoost Decision Tree, Random Forest, Support Vector Machine and deep neural networks (DNN) of varying depth we will assess the predictions for natural products and compare with the Bayesian approach. We will compare ECFP6 with other 2D, 3D descriptors and physicochemical properties in order to identify the optimal combination for generating predictions for natural products and compare how this differs for synthetic compounds. We will validate our predictions for natural product efficacy assessment. We will work closely with multiple academic groups to generate predictions for at least 20 natural products of interest against over 20 different targets or diseases. Our goal will be to identify potential targets that were previously unknown and then generate in vitro data inhouse or with academic collaborators. Develop a prototype user interface for input of a structure, processing an input molecule and output of prioritized targets and liabilities. We have developed multiple software prototypes (e. Assay Central, MegaTox, etc.) previously and will ensure a user-friendly interface and develop new visualization methods and algorithms for prioritizing potential predicted targets based on the outputs of thousands of machine learning models. In Phase I, we will use the software internally with collaborators to rapidly prototype it. In Phase II we will develop a commercial product, and greatly expand our validation by building a larger network of academic and industry partners that would help to prioritize features of most relevance. Using the machine learning models which we have for natural products is limited because ECFP6 fingerprints cannot distinguish between these very different classes of molecules. But this provides us with an opportunity by going for a "pharmacophore" style approach (ideally without using 3D conformations directly). We will therefore focus on developing a ?3D shape-based fingerprint? or developing a novel ?2D fingerprint? that captures the ?3D shape? for natural- and druglike molecules. Currently, the public datasets in ChEMBL and PubChem etc. are made up of mostly druglike molecules, but if we have fingerprints that can compare drug-like and natural product-like molecules then we can likely reliably use our MegaPredict models for natural products as well. We can also attempt to rank natural products with our ChEMBL models or we can look through catalogs of druglike compounds using models derived from natural products. That would be an important innovation. Additionally, in Phase II it would be important to see if we could find uses for natural products with any of the 7000 rare diseases. Developing software that predicts potential natural product drug interactions with various targets could be useful to regulatory organizations as well as the pharmaceutical industry and may broaden utility of being able to more effectively mix natural product and druglike compounds in models will have a profound effect on the value of cheminformatics in this arena.

IC Name

National Center for Complementary & Integrative Health

Activity
R43
Administering IC
AT
Application Type
1

Direct Cost Amount
Indirect Cost Amount
Total Cost
68116
Sub Project Total Cost

ARRA Funded
False
CFDA Code
213
Ed Inst. Type
Funding ICs
NCCIH:68116\
Funding Mechanism
SBIR-STTR RPGs
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
COLLABORATIONS PHARMACEUTICALS, INC.
Organization Department
Organization DUNS
079704473
Organization City
FUQUAY VARINA
Organization State
NC
Organization Country
UNITED STATES
Organization Zip Code
275269278
Organization District
UNITED STATES

MegaPredict for predicting natural product uses and their drug interactions

Information

ApplicationId

Core Project Number

Full Project Number

Serial Number

FOA Number

Sub Project Id

Project Start Date

Project End Date

Program Officer Name

Budget Start Date

Budget End Date

Fiscal Year

Support Year

Suffix

Award Notice Date

Organizations

MegaPredict for predicting natural product uses and their drug interactions

IC Name

Activity

Administering IC

Application Type

Direct Cost Amount

Indirect Cost Amount

Total Cost

Sub Project Total Cost

ARRA Funded

CFDA Code

Ed Inst. Type

Funding ICs

Funding Mechanism

Study Section

Study Section Name

Organization Name

Organization Department

Organization DUNS

Organization City

Organization State

Organization Country

Organization Zip Code

Organization District