arXiv is an open-access repository that has played a leading role in disciplines such as computer science, mathematics, and physics for over 30 years. It hosts more than 2 million scientific papers and serves a large user community, with approximately 5 million active users and 100 million web accesses each month. Despite this size and usage, arXiv has very limited search and recommendation functionality. To better serve the arXiv community, this project is building a new generation of search and recommendation tools while creating a research sandbox that reduces reliance on third-party commercial services. To make arXiv's trove of scientific content accessible to visually impaired readers, the project is also adding support for well-structured HTML in addition to PDF. Improved discovery of research results benefits many areas of science: researchers waste less time browsing large numbers of irrelevant papers, "unknown unknowns" are revealed, and research accelerates across subject areas through unexpected synergies. Improved recommendation tools that provide unbiased and diverse sources of relevant results and techniques are urgently needed to break down disciplinary silos. arXiv will provide better mechanisms for scientists to learn about important advances, both in their own fields of expertise and in adjacent fields.<br/><br/>This project includes four major focus areas: Open A/B Testing, Neural Representations of Scientific Text, arXiv Dynamics, and Security & Privacy. (1) Open A/B Testing enables arXiv to serve as a platform for A/B testing of search and recommendation algorithms. In addition to online A/B testing, offline A/B testing is supported using historical data together with counterfactual estimators of policy rewards. 
(2) Neural Representations of Scientific Text develops vector representations of scientific text (documents, paragraphs, and sentences) suitable for multiple tasks, including citation, author, title, and keyword prediction. Differentiable search indices are being investigated because they may further improve search performance without requiring incremental retraining. Finally, this work supports the construction of a scientific question-answering system that can also serve as a context-sensitive "chat-bot," allowing researchers to converse with it and obtain lists of recent publications relevant to their interests. (3) The arXiv Dynamics project investigates how scientific fields grow, shrink, and transform over time. A pattern-recognition system for "trending and emerging arXiv topics" is being created to predict how interesting current and historical articles are to researchers. The research investigates methods to remove the "rich-get-richer" effect from this model, to correct the model for the influence of users' past interactions with the system, and to track performance and solicit user feedback as the models evolve over time. (4) Under Security & Privacy, arXiv's privacy policy is being updated so that users know how their (meta-)data may be used and what protections will be deployed to safeguard their privacy. A "Layer 1" API allows researchers to make coarse-grained queries on anonymized arXiv weblogs, and a "Layer 2" API allows them to experiment securely on arXiv metadata and weblogs. Privacy is preserved by a combination of query restrictions and researcher usage agreements. 
A machine-learning API layer that supports differential privacy is also being developed; it will allow researchers to investigate the utility of these tools for novel ML-based applications such as free-form question answering about scientific texts and neural recommender systems.<br/><br/>This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Information and Intelligent Systems in the Directorate for Computer and Information Science and Engineering and the Division of Physics within the Directorate for Mathematical and Physical Sciences.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.