The NSF Convergence Accelerator supports use-inspired, team-based, multidisciplinary efforts that address challenges of national importance and will produce deliverables of value to society in the near future. <br/><br/>This project, NSF Convergence Accelerator-Track D: Application of sequential inductive transfer learning for experimental metadata normalization to enable rapid integrative analysis, develops tools to support integrative analyses and meta-analyses across multiple, distinct research databases. With the explosion in data-driven research, researchers even in a single domain of study are confronted with many different databases that use different terminologies and measurement schemes. As a result, data become siloed: they are collected via different processes and described by different metadata schemes, with no central index of databases, metadata, or variables. This makes it difficult for a researcher to identify data of the appropriate type for use in integrative analyses or meta-analyses. In Phase I of this effort, a multidisciplinary team of researchers and experts in statistics, epidemiology, data harmonization, machine learning, ethics, databases, imaging, and software engineering will develop tools to link metadata across four biomedical databases as a proof of concept. The linked information will be available via the MetaMatchMaker (3M) portal to be developed by the project.<br/><br/>While traditional neural network approaches could be used to link experimental metadata, they can be time-consuming, requiring the construction of large training datasets. This project employs an alternative approach based on Pretrained Learning Models (PLMs), combining methods used in Natural Language Processing (NLP) and transfer learning, so that data-driven models built in one domain can be applied to another without the time and expense of developing large training datasets. 
In Phase I of the effort, a PLM will be developed from a large existing manually trained dataset of PhenX–dbGaP metadata linkage, which will then be used to link metadata from four diverse biomedical databases. Phase I is expected to yield two benefits. First, its results would enable faster and broader identification of experimental data, with less time and fewer resources devoted to data normalization. Second, the PLM approach is expected to provide significant savings in linking experimental metadata across databases by eliminating, or greatly reducing, the need to develop training data. Phase II of this effort will expand the number and variety of linked databases and make 3M compliant with emerging federated data access procedures for biomedical data, such as the Global Alliance for Genomics and Health (GA4GH) Authentication and Authorization Infrastructure. The metrics for success of this approach include increased speed and reduced cost of conducting integrative analyses, and increased reuse of linked data. While the proof of concept in Phase I is based on the linkage of biomedical data, this approach, if successful, would be applicable to databases from many other domains, including, for example, national security, weather, environmental research, geosciences, astronomy, forensic analysis, and law enforcement.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.