Data Science Core (DSC)
Leads: Krishna Shenoy PhD and Chris Roat PhD (with Surya Ganguli PhD)

Project Summary

Given the large volumes of optical, electrical, genetic and behavioral data that will be generated, stored and computationally analyzed, it is essential to establish a comprehensive yet streamlined DSC. There are four major data challenges that the DSC will address.

(1) Data size. Each experimental lab will generate very large, and rapidly growing, datasets. We must contend with storing, pre-processing (e.g., spike sorting) and processing (e.g., single-trial analyses) these large and growing datasets.

(2) Metadata. Collaborations between groups are often hampered by failing to capture, in a searchable database linked to the bulk data, all animal and experimental conditions, i.e., the metadata. We will build in capabilities and requirements to capture full metadata electronically.

(3) Data format. Collaborations are also often hampered by the effort required to understand each lab's dataset format, which often depends on whether a given measurement system was custom built or commercial. We will capture this information as part of the metadata for historical data relevant to this U19, and moving forward we will adopt the increasingly popular Neurodata Without Borders (NWB) data format (sketched below).

(4) Across animals and labs. Performing large-scale analyses across many animals and labs is often onerous, because the three challenges listed above combine and discourage all but essential analyses (e.g., pooling results across just a few mice in one specific condition). We will build our own data pipelines that automatically query our metadata database and retrieve the indicated experimental data, and we will adopt the increasingly popular DataJoint pipeline framework (sketched below).

Our DSC will be led by Prof. Shenoy, Dr. Roat (who brings considerable industrial-scale data-handling experience and is now at Stanford) and Prof. Ganguli (RP3 lead). Two full-time software engineers (TBD) will implement the DSC architecture, including the bulk data server, relational metadata database, data standards and data pipeline. The software engineers will work closely with the rest of the team to help ensure good communication, and to help migrate analysis code and documentation to professional software standards for dissemination. This will enable storage, retrieval and analysis of data in an efficient and modular way, allowing rapid replacement of any piece of the data-analysis pipeline, which is essential for a creative environment that also promotes rapid feedback of emerging ideas into subsequent experiments. We believe in Open Science, including open-source code (e.g., GitHub) and open data formats. We will share data with the broader community, including with other U19 consortia. Thus our DSC is critical to the success of our proposed research and serves as the central hub of our U19 research.
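To make the NWB adoption in challenge (3) concrete, the following is a minimal sketch of writing one behavioral signal to a standardized NWB file using the pynwb library. The session identifiers, file name and signal shown are hypothetical placeholders, not finalized DSC conventions.

```python
from datetime import datetime
from dateutil.tz import tzlocal

import numpy as np
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

# Assemble a session container with basic metadata (all values are placeholders).
nwbfile = NWBFile(
    session_description="example reach-task session",
    identifier="mouse01_2021-06-01_session01",
    session_start_time=datetime.now(tzlocal()),
    lab="Example Lab",
    institution="Stanford University",
)

# Attach one acquired signal, e.g., running speed sampled at 30 Hz.
speed = TimeSeries(
    name="running_speed",
    data=np.zeros(3000),  # placeholder samples
    unit="cm/s",
    rate=30.0,
)
nwbfile.add_acquisition(speed)

# Write the standardized file, which any collaborating lab can read with the same tools.
with NWBHDF5IO("mouse01_session01.nwb", mode="w") as io:
    io.write(nwbfile)
```

Because every lab writes the same container format, downstream analysis code does not need to know whether the underlying acquisition system was custom built or commercial.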
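Similarly, as a rough sketch of the DataJoint-based metadata pipeline in challenges (2) and (4), the fragment below defines one possible session table in the relational metadata database and queries it to locate the bulk NWB files for a restricted set of experiments. The schema name, fields and restriction values are illustrative assumptions rather than the final DSC schema.

```python
import datajoint as dj

# Hypothetical metadata schema; database host and credentials are assumed
# to be configured via dj.config before this module is imported.
schema = dj.schema("u19_metadata")

@schema
class Session(dj.Manual):
    definition = """
    # One experimental session per animal (illustrative fields only)
    animal_id     : varchar(16)    # lab animal identifier
    session_date  : date           # date of the recording
    ---
    lab           : varchar(32)    # originating lab
    rig           : varchar(64)    # custom-built or commercial measurement system
    nwb_path      : varchar(255)   # location of the bulk NWB file on the data server
    """

# Example query: fetch the NWB file paths for all 2021 sessions from one lab,
# so an analysis pipeline retrieves only the data it needs.
paths = (Session & 'lab = "example_lab"' & 'session_date >= "2021-01-01"').fetch("nwb_path")
```

Keeping the searchable metadata in relational tables that point to the bulk data files is what allows analyses to be pooled across many animals and labs without manual bookkeeping.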