Collaborative Research: SHF: Small: Model-driven Design and Optimization of Dataflows for Scientific Applications

Information

  • Award Id
    2331153
  • Award Effective Date
    10/1/2023
  • Award Expiration Date
    9/30/2025
  • Award Amount
    $200,000.00
  • Award Instrument
    Standard Grant

The increasing capability of high-performance computing (HPC), cloud computing, and edge computing systems translates directly into the ability to generate more data and run more extensive analyses, expanding the range of natural phenomena that scientists can study using dataflows in domains such as chemistry, materials science, molecular biology, and drug design. At the same time, the steady growth in the complexity of these dataflows creates new challenges in composing individual data tasks into scalable dataflow pipelines. This project addresses these challenges by developing solutions to optimize dataflow pipelines across heterogeneous resources. The project builds a broader community of HPC experts, who will have a far-reaching impact on the efficient development of dataflow pipelines supporting scientific applications. The research team promotes increased participation of underrepresented students, particularly women, by mentoring students in Systers (the organization for women in Electrical Engineering and Computer Science at the University of Tennessee Knoxville). Furthermore, the researchers develop data analytics training tailored for early-career professionals and share the material with the Midwest Research Computing and Data Consortium and with attendees of the bi-annual NSF/TCPP (Technical Community on Parallel Processing) workshops on parallel and distributed computing education (EduPar).

This project has four main research components. First, the project defines a taxonomy of common dataflow motifs used in scientific domains, ranging from simple producer-consumer pairs to complex pipelines with multiple producers and consumers, by mapping these motifs to real scientific applications. Second, the project designs a middleware layer to handle dataflow pipelines executing on HPC, cloud, and edge resources. Third, the project develops a two-step model for mitigating the data loss and inefficiencies that arise when data production or consumption slows down in dataflow pipelines. Finally, the project trains a broader community to use the taxonomy, middleware, and model to optimize real scientific applications: identifying potential bottlenecks, making the adjustments needed to maximize pipeline efficiency and accuracy, and continuously monitoring and optimizing pipelines to ensure the highest-quality scientific output possible.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Almadena Chtchelkanova, achtchel@nsf.gov, 7032927498
  • Min Amd Letter Date
    8/25/2023
  • Max Amd Letter Date
    8/25/2023
  • ARRA Amount

Institutions

  • Name
    University of Southern California
  • City
    LOS ANGELES
  • State
    CA
  • Country
    United States
  • Address
    3720 S FLOWER ST
  • Postal Code
    90089-4304
  • Phone Number
    2137407762

Investigators

  • First Name
    Ewa
  • Last Name
    Deelman
  • Email Address
    deelman@isi.edu
  • Start Date
    8/25/2023 12:00:00 AM

Program Element

  • Text
    Software & Hardware Foundation
  • Code
    7798

Program Reference

  • Text
    SMALL PROJECT
  • Code
    7923
  • Text
    HIGH-PERFORMANCE COMPUTING
  • Code
    7942
  • Text
    WOMEN, MINORITY, DISABLED, NEC
  • Code
    9102