Collaborative Research: CSR: Medium: DISCO: Disciplined Data Science Framework for Storage I/O Management

Information

NSF Award
2402328

Owner

FLORIDA INTERNATIONAL UNIVERSITY

Award Id
2402328
Award Effective Date
10/1/2024
Award Expiration Date
9/30/2028
Award Amount
$312,066.00
Award Instrument
Continuing Grant

Information

In the multi-billion-dollar storage industry, efficient operation of systems is essential for achieving application accuracy, reliability, and performance. Traditionally, this efficiency has relied on heuristics with adjustable parameters. However, as workloads and devices become increasingly complex, manual tuning becomes impractical. The DISCO project (which stands for “disciplined data science framework for storage I/O management”) will address how to systematically leverage data science (DS) to revolutionize the many facets of storage I/O decision making. More specifically, DISCO’s research objectives are to (a) pioneer a comprehensive data science pipeline tailored to enhance the storage I/O decision-making process through in-depth exploration of intricate concepts such as data augmentation, precise labeling, noise filtration, meticulous model engineering, drift detection, and many others; (b) target both classical I/O policies (e.g., I/O admission, prefetching) and open problems in the context of modern device features (multi-stream and KV-SSDs), as well as venture into “uncharted territories” such as investigating what data science can reveal from billions of performance data points; and (c) comprehensively encompass high-, medium-, and low-frequency decision making, addressing the unique challenges of each while also addressing cross-cutting concerns such as all-in-one integration.

The DISCO project will bring significant broader impacts, especially in training future storage data scientists. The Data Storage Research Vision 2025 (DSRV) paper from an NSF workshop emphasized "the deficit of the professionals who are knowledgeable in both storage and AI areas" where "the number of fresh graduate students with this combination of skills is small, and training existing staff takes time and effort" and "storage companies are also experiencing significant competition from other industries that require AI/ML knowledge." In this context, the DISCO project will train graduate and undergraduate students to be part of the next generation of storage data scientists. The project will also release open ML-for-storage testbeds along with a public storage data science curriculum. In terms of technology transfer, the DSRV workshop paper also states that “storage companies are excited by the opportunities of using ML to improve performance and reliability, and develop quality products.” The DISCO project will produce sophisticated ML-for-storage solutions for solid-state drive (SSD) systems, potentially making a positive impact on the SSD market, which is forecast to reach over $50 billion by 2025.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Erik Brunvand, ebrunvan@nsf.gov, 7032928950
Min Amd Letter Date
7/11/2024
Max Amd Letter Date
7/11/2024
ARRA Amount

Institutions

Name
Florida International University
City
MIAMI
State
FL
Country
United States
Address
11200 SW 8TH ST
Postal Code
331992516
Phone Number
3053482494

Investigators

First Name
Janki
Last Name
Bhimani
Email Address
janki.bhimani@fiu.edu
Start Date
7/11/2024 12:00:00 AM

Program Element

Text
CSR-Computer Systems Research
Code
735400

Program Reference

Text
MEDIUM PROJECT
Code
7924
