Improving AI/ML-readiness of FaceBase Research Datasets

Information

Research Project
10412668

ApplicationId
10412668
Core Project Number
U01DE028729
Full Project Number
3U01DE028729-03S2
Serial Number
028729
FOA Number
PA-20-272
Sub Project Id

Project Start Date
8/1/2019
Project End Date
7/31/2022
Program Officer Name
KHATIPOV, EMIR A
Budget Start Date
8/1/2021
Budget End Date
7/31/2022
Fiscal Year
2021
Support Year
03
Suffix
S2
Award Notice Date
9/10/2021

Organizations

University of Southern California

Information

PROJECT SUMMARY The FaceBase III Hub was created by the National Institute of Dental and Craniofacial Research (NIDCR) as a data repository to serve the entire community of dental and craniofacial researchers by sharing diverse data related to craniofacial development and dysmorphia, and to serve other research communities that can leverage the diverse data in the FaceBase repository. One particularly distinctive and important element of FaceBase III is that it holds over 22,000 facial images from over 11,000 human subjects, many of which are labeled with syndromes based on clinical and genomic diagnoses. Facial images are a critical resource for studying the correlation between genotype and phenotype and have received intense interest within the Artificial Intelligence (AI) and Machine Learning (ML) research field, with notable advances in automated phenotyping. While FaceBase embraces the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, there are concerns specific to AI/ML research, including the presence of noise, uncertainty of labels, and bias within datasets. It is imperative that we remedy any limitations in the utility of FaceBase's facial imaging data for AI/ML research. In this project, we propose to unlock the tremendous potential of FaceBase facial scans by identifying gaps in how data are characterized, formatted, and preprocessed from the perspective of their use in AI/ML research and algorithm development. To accomplish this, we propose to initiate a pilot application that applies existing deep learning algorithms developed by investigators in this proposal to existing FaceBase data (Aim 1). The goal of the pilot is to identify how curation, organization, and preparation of FaceBase data might be improved so as to streamline their use in ML/AI-based investigations. Based on what we learn from the pilot, we will modify the current FaceBase self-curation processes, specifically around facial scans (Aim 2).
This will require us to streamline our process for curating human subject data, so that we capture the necessary rich descriptive elements while maintaining required restrictions on data handling. Ultimately, the goal is to position the FaceBase Hub so that the existing facial scan resources become more broadly useful to AI/ML researchers. More significantly, we expect to see increased availability of facial scan data and other associated data types, such as genotyping and neurofunctional data. By making the proposed improvements to our data ingest procedures, we anticipate that FaceBase will be able to scale to significantly larger data set sizes, consequently cementing and expanding its position as a unique resource for the broader NIH community of ML and AI researchers.

IC Name

NATIONAL INSTITUTE OF DENTAL & CRANIOFACIAL RESEARCH

Activity
U01
Administering IC
DE
Application Type
3

Direct Cost Amount
266807
Indirect Cost Amount
70750
Total Cost
337557
Sub Project Total Cost

ARRA Funded
False
CFDA Code
121
Ed Inst. Type
BIOMED ENGR/COL ENGR/ENGR STA
Funding ICs
OD:337557
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
ZDE1
Study Section Name
Special Emphasis Panel

Organization Name
UNIVERSITY OF SOUTHERN CALIFORNIA
Organization Department
BIOSTATISTICS & OTHER MATH SCI
Organization DUNS
072933393
Organization City
Los Angeles
Organization State
CA
Organization Country
UNITED STATES
Organization Zip Code
900890701
Organization District
UNITED STATES
