Improving AI/ML-readiness of FaceBase Research Datasets

Information

  • Research Project
  • ApplicationId
    10412668
  • Core Project Number
    U01DE028729
  • Full Project Number
    3U01DE028729-03S2
  • Serial Number
    028729
  • FOA Number
    PA-20-272
  • Sub Project Id
  • Project Start Date
    8/1/2019
  • Project End Date
    7/31/2022
  • Program Officer Name
    KHATIPOV, EMIR A
  • Budget Start Date
    8/1/2021
  • Budget End Date
    7/31/2022
  • Fiscal Year
    2021
  • Support Year
    03
  • Suffix
    S2
  • Award Notice Date
    9/10/2021

PROJECT SUMMARY The FaceBase III Hub was created by the National Institute of Dental and Craniofacial Research (NIDCR) as a data repository that serves the entire community of dental and craniofacial researchers by sharing diverse data related to craniofacial development and dysmorphia, and that also serves other research communities able to leverage the diverse data in the FaceBase repository. One particularly important and distinctive element of FaceBase III is that it holds over 22,000 facial images from over 11,000 human subjects, many of which are labeled with syndromes based on clinical and genomic diagnoses. Facial images are a critical resource for studying the correlation between genotype and phenotype, and they have received intense interest within the Artificial Intelligence (AI) and Machine Learning (ML) research field, with notable advances in automated phenotyping. While FaceBase embraces the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, AI/ML research raises specific concerns of its own, including the presence of noise, uncertainty of labels, and bias within datasets. It is imperative that we remedy any limitations in the utility of FaceBase's facial imaging data for AI/ML research. In this project, we propose to unlock the tremendous potential of FaceBase facial scans by identifying gaps in how data are characterized, formatted, and preprocessed from the perspective of their use in AI/ML research and algorithm development. To accomplish this, we propose to initiate a pilot application that applies existing deep learning algorithms, developed by investigators in this proposal, to existing FaceBase data (Aim 1). The goal of the pilot is to identify how the curation, organization, and preparation of FaceBase data might be improved so as to streamline their use in ML/AI-based investigations. Based on what we learn from the pilot, we will modify the current FaceBase self-curation processes, specifically those around facial scans (Aim 2).
This will require us to streamline our process for curating human subject data, so that we have the necessary rich descriptive elements while maintaining the required restrictions on data handling. Ultimately, the goal is to position the FaceBase Hub so that the existing facial scan resources become more broadly useful to AI/ML researchers. More significantly, we expect to see increased availability of facial scan data and of other associated data types, such as genotyping and neurofunctional data. By making the proposed improvements to our data ingest procedures, we anticipate that FaceBase will be able to scale to significantly larger dataset sizes, consequently cementing and expanding its position as a unique resource for the broader NIH community of ML and AI researchers.

IC Name
NATIONAL INSTITUTE OF DENTAL & CRANIOFACIAL RESEARCH
  • Activity
    U01
  • Administering IC
    DE
  • Application Type
    3
  • Direct Cost Amount
    266807
  • Indirect Cost Amount
    70750
  • Total Cost
    337557
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    121
  • Ed Inst. Type
    BIOMED ENGR/COL ENGR/ENGR STA
  • Funding ICs
    OD:337557
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    ZDE1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    UNIVERSITY OF SOUTHERN CALIFORNIA
  • Organization Department
    BIOSTATISTICS & OTHER MATH SCI
  • Organization DUNS
    072933393
  • Organization City
    Los Angeles
  • Organization State
    CA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    900890701
  • Organization District
    UNITED STATES