EAGER: Exploring Automatic Optimization of Multi-tiered HPC Storage Systems via Practical Reinforcement Learning

Information

NSF Award
2412345

Owner

University of North Carolina at Charlotte

Award Id
2412345
Award Effective Date
7/1/2024
Award Expiration Date
6/30/2025
Award Amount
$133,980.00
Award Instrument
Standard Grant

Abstract

Scientific discovery increasingly involves generating and analyzing large amounts of data. These data-intensive scientific applications pose significant challenges to the storage systems of high-performance computing (HPC) clusters, which are heterogeneous and extremely complex. Scientists who need high-speed data access often struggle to use these heterogeneous storage options effectively. There is a need to build the long-missing automated HPC I/O (Input/Output) middleware that transparently helps scientists achieve optimal data access performance without manual effort. Designing such middleware for large-scale, heterogeneous, and shared HPC storage systems is an extremely challenging task. The researchers supported by this grant plan to leverage machine learning techniques to understand I/O requests and the current system status, and to schedule and coordinate I/O requests intelligently and adaptively. The outcomes of this research are expected to work with existing storage components and to minimize the impact on both scientific applications and HPC systems.

This project plans to tackle this grand challenge by exploring practical reinforcement learning (RL)-based methods and building the relevant software infrastructure in an HPC environment. The project has two main focuses: 1) RL-based data placement for high storage utilization, and 2) RL-based I/O coordination for shared storage. Both tasks depend on identifying effective RL methods and integrating them into HPC systems. To achieve this goal, a novel, system-centric RL framework will be developed. Moreover, for each research focus, various RL algorithms, deep neural network designs, and reward-shaping strategies will be proposed, implemented, rigorously benchmarked, and compared with state-of-the-art solutions.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Almadena Chtchelkanova, achtchel@nsf.gov, 703-292-7498
Min Amd Letter Date
2/9/2024
Max Amd Letter Date
2/9/2024
ARRA Amount

Institutions

Name
University of North Carolina at Charlotte
City
CHARLOTTE
State
NC
Country
United States
Address
9201 UNIVERSITY CITY BLVD
Postal Code
28223-0001
Phone Number
704-687-1888

Investigators

First Name
Dong
Last Name
Dai
Email Address
dai@udel.edu
Start Date
2/9/2024 12:00:00 AM

Program Element

Text
Software & Hardware Foundation
Code
779800

Program Reference

Text
EAGER
Code
7916

Text
HIGH-PERFORMANCE COMPUTING
Code
7942
