OAC Core: Cost-Adaptive Monitoring and Real-Time Tuning at Function-Level

Information

  • Owner
    NSF Award 2402542
  • Award Id
    2402542
  • Award Effective Date
    8/1/2024
  • Award Expiration Date
    7/31/2026
  • Award Amount
    $ 426,459.00
  • Award Instrument
    Standard Grant

This project aims to address the challenge of performance monitoring on supercomputers by developing a tool that provides function-level insights with minimal overhead, enabling real-time tuning of applications. The initiative addresses a gap in understanding computational practices across diverse scientific domains, thereby supporting informed decision-making for system design and numerical library optimization. This advancement promises to improve the efficiency of existing supercomputing infrastructure and contributes to NSF's mission by supporting scientific progress and educational diversity, ultimately catalyzing a broader range of scientific breakthroughs.

This project is designed to improve performance monitoring in high-performance computing. It addresses the increasing complexity and diversity of applications spanning scientific research, engineering, big data, and artificial intelligence. The approach implements function-level monitoring through dynamic binary instrumentation and manages the monitoring overhead with a heartbeat mechanism. It also integrates real-time tuning capabilities for optimizing numerical libraries at runtime. Together, these capabilities are intended to significantly extend traditional job-level resource-utilization monitoring tools. The research will identify standard function calls, evaluate the instrumentation overhead, and develop and validate policies for controlling overhead and accuracy. It will also create a performance benchmark for assessing real-time tuning. The intellectual merit of this project stems from its potential to provide a novel tool that resolves application behavior more precisely and enables real-time performance tuning. By introducing adaptive monitoring and real-time tuning at the function level for large computational platforms, this project aims to accelerate scientific progress. Furthermore, it promotes diversity and inclusion by actively involving underrepresented minority groups, contributing to a more diverse and skilled high-performance computing workforce.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Juan Li
    jjli@nsf.gov
    703-292-2625
  • Min Amd Letter Date
    4/10/2024
  • Max Amd Letter Date
    4/10/2024
  • ARRA Amount

Institutions

  • Name
    University of Texas at Austin
  • City
    AUSTIN
  • State
    TX
  • Country
    United States
  • Address
    110 INNER CAMPUS DR
  • Postal Code
    78712-1139
  • Phone Number
    512-471-6424

Investigators

  • First Name
    Junjie
  • Last Name
    Li
  • Email Address
    jli@tacc.utexas.edu
  • Start Date
    4/10/2024 12:00:00 AM
  • First Name
    Yinzhi
  • Last Name
    Wang
  • Email Address
    iwang@tacc.utexas.edu
  • Start Date
    4/10/2024 12:00:00 AM

Program Element

  • Text
    OAC-Advanced Cyberinfrast Core

Program Reference

  • Text
    SMALL PROJECT
  • Code
    7923