Principal Investigator/Program Director: John Kwagyan, PhD
Project Summary
Advances in technologies and computational tools have made it possible to generate and store large, complex, and diverse datasets. The behavioral, biomedical, and health research enterprise is becoming increasingly data-intensive and data-driven. As a result, recognizing, understanding, and using big data for behavioral, biomedical, and health-related research has become necessary to arrive at the best evidence and enhance translation. Applied data science starts with identifying relevant data sources, developing algorithms and/or using existing software tools to access these data, and employing advanced data analytical skills for knowledge discovery and dissemination of information. In biomedical and healthcare areas, big data methodology is used in several fields, including but not limited to medical imaging studies, drug discovery, genomics, predictive diagnosis, and cost-effectiveness studies. Data science also allows researchers to assess patient/population heterogeneity through the integration of large data from published literature and meta-analysis, reaching conclusions that can be used to inform clinical practice and guide public health policy. Despite the importance of the emerging field of data science and the recent rapid rise in its use, many minority institutions do not offer specialized training in data science. Moreover, though the approach is highly developed, few possess the necessary skill set in this highly quantitative and technical field, training opportunities are uncommon, and disparities exist in data science expertise. Skill sets in data science, from ethical practices in data collection and the use of complex computational and analytic techniques, including machine learning, to data visualization and reporting, are particularly critical for advancing the science of minority health and health disparities.
To bridge this gap and address disparities in acquiring data science skills and their applications, we created the Howard University (HU) RCMI Virtual Applied Data Science Training Institute (VADSTI). VADSTI drew faculty with complementary expertise in the conduct and application of data science from across different institutions and, in partnership with the NIH Office of Data Science Strategy, delivered a well-attended and successful 8-week comprehensive data science training. The ability of behavioral, biomedical, and clinical researchers to recognize and use big data remains limited at minority-serving institutions for various reasons, including lack of exposure to relevant databases, limited knowledge of programming techniques, and limited access to relevant software, tools, and expertise in data analytics. The creation of an ecosystem of data science resources and useful tools at minority-serving institutions is therefore warranted. The goals of the current application are (i) to attract, train, and engage the next generation of investigators in using advanced analytics and data science methodology with application to large existing minority health and health disparities datasets, and (ii) to continue to enhance capacity building by promoting and stimulating the creation of a data science ecosystem at HU.
PHS 398 (Rev. 01/18 Approved Through 03/31/2020) Page 1 OMB No. 0925-0001