In the multi-billion-dollar storage industry, efficient operation of systems is essential for achieving application accuracy, reliability, and performance. Traditionally, this efficiency has relied on heuristics with adjustable parameters. However, as workloads and devices become increasingly complex, manual tuning becomes impractical. The DISCO project (which stands for “disciplined data science framework for storage I/O management”) will address how to systematically leverage data science (DS) to revolutionize the many facets of storage I/O decision making. More specifically, DISCO’s research objectives are to (a) pioneer a comprehensive data science pipeline tailored to enhance the storage I/O decision-making process through in-depth exploration of intricate concepts such as data augmentation, precise labeling, noise filtration, meticulous model engineering, drift detection, and many others; (b) target both classical I/O policies (e.g., I/O admission, prefetching) and open problems in the context of modern device features (multi-stream and KV-SSDs), as well as venture into “uncharted territories” such as investigating what data science can reveal from billions of performance data points; and (c) comprehensively encompass high-, medium-, and low-frequency decision making, addressing the unique challenges of each while also addressing cross-cutting concerns such as all-in-one integration.

The DISCO project will bring significant broader impacts, especially in training future storage data scientists. The Data Storage Research Vision 2025 (DSRV) paper from an NSF workshop emphasized "the deficit of the professionals who are knowledgeable in both storage and AI areas" where "the number of fresh graduate students with this combination of skills is small, and training existing staff takes time and effort" and "storage companies are also experiencing significant competition from other industries that require AI/ML knowledge."
In this context, the DISCO project will train graduate and undergraduate students to become next-generation storage data scientists. The project will also release open ML-for-storage testbeds along with a public storage data science curriculum. In terms of technology transfer, the DSRV workshop paper also states that “storage companies are excited by the opportunities of using ML to improve performance and reliability, and develop quality products.” The DISCO project will produce sophisticated ML-for-storage solutions for solid-state drive (SSD) systems, potentially making a positive impact on the SSD market, which is forecast to reach over $50 billion by 2025.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.