PROJECT SUMMARY The FaceBase III Hub was created by the National Institute of Dental and Craniofacial Research (NIDCR) as a data repository that serves the entire community of dental and craniofacial researchers by sharing diverse data related to craniofacial development and dysmorphia, and that serves other research communities able to leverage the diverse data in the FaceBase repository. One distinctive and important element of FaceBase III is that it holds over 22,000 facial images from over 11,000 human subjects, many of which are labeled with syndromes based on clinical and genomic diagnoses. Facial images are a critical resource for studying the correlation between genotype and phenotype, and they have received intense interest within the Artificial Intelligence (AI) and Machine Learning (ML) research field, with notable advances in automated phenotyping. While FaceBase embraces the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, AI/ML research raises its own concerns, including the presence of noise, uncertainty of labels, and bias within datasets. It is imperative that we remedy any limitations in the utility of FaceBase's facial imaging data for AI/ML research. In this project, we propose to unlock the tremendous potential of FaceBase facial scans by identifying gaps in how the data are characterized, formatted, and preprocessed from the perspective of their use in AI/ML research and algorithm development. To accomplish this, we propose to initiate a pilot application that applies existing deep learning algorithms, developed by investigators in this proposal, to existing FaceBase data (Aim 1). The goal of the pilot is to identify how the curation, organization, and preparation of FaceBase data might be improved so as to streamline their use in AI/ML-based investigations. Based on what we learn from the pilot, we will modify the current FaceBase self-curation processes, specifically those for facial scans (Aim 2). 
This will require us to streamline our process for curating human subject data, so that we capture the necessary rich descriptive elements while maintaining the required restrictions on data handling. Ultimately, the goal is to position the FaceBase Hub so that the existing facial scan resources become more broadly useful to AI/ML researchers. More significantly, we expect to see increased availability of facial scan data and other associated data types, such as genotyping and neurofunctional data. By making the proposed improvements to our data ingest procedures, we anticipate that this project will allow FaceBase to scale to significantly larger data set sizes, thereby cementing and expanding its position as a unique resource for the broader NIH community of AI/ML researchers.