Data Science Core (DSC)
Leads: Krishna Shenoy PhD and Chris Roat PhD (with Surya Ganguli PhD)

Project Summary

Given the large volumes of optical, electrical, genetic and behavioral data that will be generated, stored and computationally analyzed, it is essential to establish a comprehensive yet streamlined DSC. There are four major data challenges that the DSC will address.

(1) Data size. Each experimental lab will generate very large, and rapidly growing, datasets. We must contend with storing, pre-processing (e.g., spike sorting) and processing (e.g., single-trial analyses) these large and growing datasets.

(2) Metadata. Collaborations between groups are often hampered by failing to capture, in a searchable database linked to the bulk data, all animal and experimental conditions, i.e., the metadata. We will build in capabilities and requirements to capture full metadata electronically.

(3) Data format. Collaborations are also often hampered by the effort required to understand each lab's dataset format, which often depends on whether a given measurement system was custom built or commercial. We will capture this information as part of the metadata for historical data relevant to this U19, and moving forward we will adopt the increasingly popular Neurodata Without Borders (NWB) data format (sketched below).

(4) Across animals and labs. Performing large-scale analyses across many animals and labs is often onerous, because the three challenges listed above combine and discourage all but essential analyses (e.g., pooling results across just a few mice in one specific condition). We will build our own data pipelines that automatically query our metadata database and retrieve the indicated experimental data, and we will adopt the increasingly popular DataJoint pipeline framework (sketched below).

Our DSC will be led by Prof. Shenoy, Dr. Roat (who brings considerable industrial-scale data-handling experience and is now at Stanford) and Prof. Ganguli (RP3 lead). Two full-time software engineers (TBD) will implement the DSC architecture, including the bulk data server, relational metadata database, data standards and data pipeline. The software engineers will work closely with the rest of the team to help ensure good communication, and to help migrate analysis code and documentation to professional software standards for dissemination. This will enable storage, retrieval and analysis of data in an efficient and modular way, allowing rapid replacement of any piece of the data-analysis pipeline, which is essential for a creative environment that also promotes rapid feedback of emerging ideas into subsequent experiments. We believe in Open Science, including open-source code (e.g., GitHub) and open data formats. We will share data with the broader community, including with other U19 consortia. Thus our DSC is critical to the success of our proposed research and serves as the central hub of our U19 research.
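To make the NWB adoption in challenge (3) concrete, the following is a minimal sketch of writing one behavioral signal to a standardized NWB file using the pynwb library. The session identifiers, file name and signal shown are hypothetical placeholders, not finalized DSC conventions.

```python
from datetime import datetime
from dateutil.tz import tzlocal

import numpy as np
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

# Assemble a session container with basic metadata (all values are placeholders).
nwbfile = NWBFile(
    session_description="example reach-task session",
    identifier="mouse01_2021-06-01_session01",
    session_start_time=datetime.now(tzlocal()),
    lab="Example Lab",
    institution="Stanford University",
)

# Attach one acquired signal, e.g., running speed sampled at 30 Hz.
speed = TimeSeries(
    name="running_speed",
    data=np.zeros(3000),  # placeholder samples
    unit="cm/s",
    rate=30.0,
)
nwbfile.add_acquisition(speed)

# Write the standardized file, which any collaborating lab can read with the same tools.
with NWBHDF5IO("mouse01_session01.nwb", mode="w") as io:
    io.write(nwbfile)
```

Because every lab writes the same container format, downstream analysis code does not need to know whether the underlying acquisition system was custom built or commercial.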
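Similarly, as a rough sketch of the DataJoint-based metadata pipeline in challenges (2) and (4), the fragment below defines one possible session table in the relational metadata database and queries it to locate the bulk NWB files for a restricted set of experiments. The schema name, fields and restriction values are illustrative assumptions rather than the final DSC schema.

```python
import datajoint as dj

# Hypothetical metadata schema; database host and credentials are assumed
# to be configured via dj.config before this module is imported.
schema = dj.schema("u19_metadata")

@schema
class Session(dj.Manual):
    definition = """
    # One experimental session per animal (illustrative fields only)
    animal_id     : varchar(16)    # lab animal identifier
    session_date  : date           # date of the recording
    ---
    lab           : varchar(32)    # originating lab
    rig           : varchar(64)    # custom-built or commercial measurement system
    nwb_path      : varchar(255)   # location of the bulk NWB file on the data server
    """

# Example query: fetch the NWB file paths for all 2021 sessions from one lab,
# so an analysis pipeline retrieves only the data it needs.
paths = (Session & 'lab = "example_lab"' & 'session_date >= "2021-01-01"').fetch("nwb_path")
```

Keeping the searchable metadata in relational tables that point to the bulk data files is what allows analyses to be pooled across many animals and labs without manual bookkeeping.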