Data is a major driver of scientific and technological progress. However, large stocks of critical data reside solely in physical archives owned and maintained by a diverse range of stakeholders. These stakeholders have access to vastly different resources, which constrains their ability to make these data accessible to themselves and others, to infer their meaning, and to compare and combine them with external information. The built physically / studied digitally (BP/SD) consortium facilitates such efforts by further developing interoperable computational tools that enable organizations to extract meaning from large sets of physical documents and to disseminate the results in a findable, accessible, interoperable, and reusable (FAIR) manner. The BP/SD Consortium addresses the challenges of making this goal a reality through four thrusts: software development and deployment, data acquisition and dissemination, outreach and community building, and education and training. In its initial focus areas, the BP/SD computational platform enables scholars to answer critical questions, and to answer them faster, in fields of research as varied as seismology, legal studies, and the combined area of science policy and the history of science. Significant systemic inequities remain in all three of these focus areas. By reducing the costs of digitization and annotation, the BP/SD platform helps bring visibility to archives centering on ‘invisible’ individuals and groups and enables underfunded initiatives to improve access to historical and current materials. The newly digitized data brought online also provides a fertile resource for entrepreneurs, increasing competition and driving innovation in areas that are poised for the application of AI/ML approaches but currently lack robust, accessible data. <br/><br/>The project's mission is organized around two components: the Platform and the Consortium. The Platform component will address data acquisition and software development. 
The Consortium component will address (i) software deployment, (ii) data dissemination, (iii) education and training, and (iv) outreach and community building. The BP/SD platform aims to: (1) Implement a persistent identifier (PID) system: engage with DataCite to assign and organize PIDs for the artifacts processed and the metadata generated. For seismological records, DOIs are assigned to networks of seismic sensors rather than to the records themselves; to unambiguously establish data provenance and increase reproducibility, the project will assign PIDs to individual digitized records. (2) Develop and deploy an artifact classification pipeline guided by the principles of the Dublin Core Metadata Initiative. (3) Develop and deploy a pipeline for extracting data and descriptive metadata. (4) Gather expert input and help build consensus on the privacy principles to be implemented when enriching the metadata associated with a given corpus. (5) Develop and deploy a named entity extraction and disambiguation pipeline: the organization of named entities will be guided by ontologies such as Friend of a Friend (FOAF), and particular attention will be paid to the privacy implications of named entity recognition and disambiguation. (6) Develop and deploy a knowledge graph creation pipeline. (7) Implement an authentication system for managing user access to the platform. To expand the community of users engaging with the BP/SD platform, the consortium will: (1) Develop an undergraduate course and summer internships that train students in the use of the BP/SD platform, in collaboration with the Metropolitan Chicago Data Science Corps (MCDC) consortium. 
(2) Connect groups of undergraduate and graduate students, with a special focus on under-represented demographic groups, to institutions that want to digitize their archives, via MCDC-run practica, EarthScope activities, or Computational Infrastructure for Geodynamics (CIG) training programs. (3) Develop modular, adaptable teaching materials disseminated online, and hold workshops in collaboration with MCDC to train faculty in delivering them. (4) Develop materials and hold workshops that make archive holders aware of the BP/SD consortium and platform, with the goal of including archive holders beyond the initial group of partners in the consortium. <br/><br/>This award by the Office of Advanced Cyberinfrastructure is jointly supported by the National Science Foundation's Public Access Initiative.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.