Centralized assay datasets for modelling support of small drug discovery organizations

Information

  • Research Project
  • ApplicationId
    10321747
  • Core Project Number
    R44GM122196
  • Full Project Number
    2R44GM122196-04A1
  • Serial Number
    122196
  • FOA Number
    PA-20-260
  • Sub Project Id
  • Project Start Date
    1/1/2017
  • Project End Date
    7/31/2023
  • Program Officer Name
    RAVICHANDRAN, VEERASAMY
  • Budget Start Date
    9/1/2021
  • Budget End Date
    7/31/2022
  • Fiscal Year
    2021
  • Support Year
    04
  • Suffix
    A1
  • Award Notice Date
    8/24/2021

Project Summary

Collaborations Pharmaceuticals, Inc. was formed after identifying a need for software to assist academics and smaller companies in curating their data and in discovering new hits or optimising leads. In the past two years the continued importance of artificial intelligence (AI) has been apparent from the explosive growth in the number of AI companies and the increasing number of multi-million dollar deals with pharma using machine learning (ML) to assist in drug discovery. These companies focus heavily on the modeling aspect of drug discovery, but there is a continued unmet need and bottleneck in the curation of quality in vitro and in vivo ADME/Tox data for ML, as well as in prospective testing to validate the technologies. In Phase I, we developed a prototype of the Assay Central® software and used it with a wide variety of structure-activity data from sources both public and private, formatted and unformatted, with ~14 collaborators working on neglected, rare or common disease targets, as well as for our internal drug discovery projects. In Phase I we also created error checking and correction software, built and validated Bayesian models with the datasets that were collected and cleaned, and developed new data visualization tools. The software can be used to create selections of these models for sharing with collaborators as needed, for scoring new molecules, and for visualizing the multiple outputs in various formats. In Phase II, we developed Assay Central® into a production tool that is easy to deploy, is built on industry standard technologies, and provides graphical display of models and information on model applicability. Importantly, we identified that customers wanted us to provide them with the results! We developed our fee-for-service consulting model using Assay Central® to solve their problems, and this has expanded our revenues annually.
In Phase II we evaluated additional ML algorithms and molecular descriptors with manually curated datasets and compared algorithms across more than 5000 auto-curated datasets from ChEMBL. This illustrated the utility of access to multiple algorithms and showed that the Bayesian algorithm was generally comparable to these other ML algorithms. This also motivated us to develop new software to integrate these algorithms. We have also explored finding rare disease datasets and applying our data curation and ML approach to them. With these and additional collaborations, as well as internal projects on Alzheimer's disease (through an NIH NIGMS supplement), we have been able to repurpose already approved drugs for several targets for this and other diseases. For multiple projects we have performed several rounds of model building and fed data back into the models to enable improved predictions. Finally, we have developed prototype tools to enable us to develop automated molecule designs, assess their synthesizability and perform retrosynthetic analysis. These combined efforts dramatically increased the number of projects we were able to work on (and ultimately publish to raise our visibility), created new spin-off products as collections of models (MegaTrans®, MegaTox® and MegaPredict®) and molecule-related IP, and generated employment. In Phase IIB we now propose a focus on steps to aid commercialization and further development of these technologies. We have identified that developing auto-curation software for dealing with complex biological data in unstructured databases will be a competitive advantage. We have also recognized that for many diseases we can have a complete or near complete collection of targets, which may enable us to understand how a molecule may interfere with biological pathways from structure alone; this can be applied to complex diseases and "adverse outcome pathways" in toxicology.
We also propose integrating state-of-the-art multi-objective generative models for molecule design into our Assay Central computational software to complement the analog generation and retrosynthesis tools created in Phase II and to aid in molecule optimization. We will validate this capability using some of the hit molecules identified in Phase II for different targets, including human acetylcholinesterase. Assay Central would then have a full suite of integrated capabilities, from data curation through to molecule design and retrosynthetic analysis, which will enable us to attract larger deals with companies.

IC Name
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R44
  • Administering IC
    GM
  • Application Type
    2
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    855127
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
  • Funding ICs
    NIGMS:855127
  • Funding Mechanism
    SBIR-STTR RPGs
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    COLLABORATIONS PHARMACEUTICALS, INC.
  • Organization Department
  • Organization DUNS
    079704473
  • Organization City
    FUQUAY VARINA
  • Organization State
    NC
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    275269278
  • Organization District
    UNITED STATES