Using machine learning to predict odor characteristics from molecular structure

Information

  • Research Project
  • 10142097
  • ApplicationId
    10142097
  • Core Project Number
    F32DC019030
  • Full Project Number
    1F32DC019030-01A1
  • Serial Number
    019030
  • FOA Number
    PA-19-188
  • Sub Project Id
  • Project Start Date
    9/4/2020
  • Project End Date
    -
  • Program Officer Name
    RIVERA-RENTAS, ALBERTO L
  • Budget Start Date
    9/4/2020
  • Budget End Date
    -
  • Fiscal Year
    2020
  • Support Year
    01
  • Suffix
    A1
  • Award Notice Date
    9/10/2020

PROJECT SUMMARY/ABSTRACT We cannot yet look at a chemical structure and predict if the molecule will have an odor, much less what character it will have. The goal of the proposed research is to apply machine learning to predict perceptual characteristics from chemical features of molecules. The specific aims of the proposal will determine (1) which molecules are odorous, and (2) what data are needed to model odor character. Building a highly predictive model requires two key ingredients: high-quality data and a sound modeling approach. High-quality data must be accurate (ratings are consistent and describe true odor properties) and detailed (ratings describe even small differences in odor properties). We have collected human psychophysical data on a diverse set of molecules and have trained a model to predict if a molecule has an odor, but pilot data identified odorous contaminants that limit model training and measurement of model accuracy. In Aim 1, I will apply my background in analytical chemistry to evaluate the accuracy of the data, using gas chromatography to identify and correct errors caused by chemical contaminants. In Aim 2, I will apply my experience in human sensory evaluation to measure and compare the consistency and the degree of detail in ratings that can be achieved with different sensory methods and subject training procedures. By executing my training plan, I will develop the skills in statistical programming and machine learning needed to employ a sound modeling approach to these problems. The model constructed in Aim 1 will enable prediction of odor classification (odor/odorless) for any molecule and thus define which molecules are perceptually relevant. Predicting odor character is a far more complex challenge: while a molecule can have only one of two odor classifications (odor or odorless), it may elicit any number of diverse odor character attributes (fruity, floral, musky, sweet, etc.).
Descriptive Analysis (DA) is the gold standard method for generating accurate and detailed sensory profiles, but this method is time-consuming. We estimate that an odor character dataset will be large enough ("model-ready") to predict odor character with approximately 10,000 molecules and that it would require more than 30,000 hours of human subject evaluation, or approximately 6 years for the typical trained panel, to produce this dataset using DA. Before we invest the time and resources, it is responsible to evaluate the relative data quality of more rapid sensory methods. The results of Aim 2 are expected to determine the best approach for generating a model-ready dataset by quantifying trade-offs in degree of detail (data resolution), rating consistency, and method speed of five candidate sensory methods. Together, these aims represent a significant step forward in linking chemical recipe to human odor perception, an advancement that supports the NIDCD goal of understanding normal olfactory function (how stimulus relates to percept) and has many potential applications in foods (what composition of molecules should be present to produce a target aroma percept).
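
The Descriptive Analysis estimate in the abstract can be broken down with simple arithmetic. The sketch below is illustrative only: the three input figures come from the abstract, where they are stated as approximate ("approximately 10,000", "more than 30,000", "approximately 6"). The function name `da_cost_summary` and the derived per-molecule and per-year rates are assumptions introduced here, not numbers reported by the project.

```python
# Approximate figures quoted in the abstract (the source hedges all three).
N_MOLECULES = 10_000   # molecules needed for a "model-ready" dataset
TOTAL_HOURS = 30_000   # human subject evaluation hours (stated as a lower bound)
PANEL_YEARS = 6        # calendar time for a typical trained panel


def da_cost_summary(n_molecules, total_hours, panel_years):
    """Hypothetical helper: derive the rates implied by the quoted totals."""
    return {
        "hours_per_molecule": total_hours / n_molecules,
        "hours_per_year": total_hours / panel_years,
        "molecules_per_year": n_molecules / panel_years,
    }


summary = da_cost_summary(N_MOLECULES, TOTAL_HOURS, PANEL_YEARS)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

Taken at face value, the quoted totals imply roughly 3 subject-hours per molecule and about 5,000 subject-hours of panel time per year. These implied rates are a rough reading of the totals, not a measured throughput.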

IC Name
NATIONAL INSTITUTE ON DEAFNESS AND OTHER COMMUNICATION DISORDERS
  • Activity
    F32
  • Administering IC
    DC
  • Application Type
    1
  • Direct Cost Amount
    67446
  • Indirect Cost Amount
  • Total Cost
    67446
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    173
  • Ed Inst. Type
  • Funding ICs
    NIDCD:67446
  • Funding Mechanism
    TRAINING, INDIVIDUAL
  • Study Section
    ZDC1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    MONELL CHEMICAL SENSES CENTER
  • Organization Department
  • Organization DUNS
    088812565
  • Organization City
    PHILADELPHIA
  • Organization State
    PA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    191043308
  • Organization District
    UNITED STATES