Elements: Adaptive End-to-End Parallelism for Distributed Science Workflows

Information

  • Award Id
    2427408
  • Award Effective Date
    4/1/2024
  • Award Expiration Date
    10/31/2025
  • Award Amount
    $ 376,088.00
  • Award Instrument
    Standard Grant

Advances in sensing and computing technologies have led to an unprecedented increase in the amount of data generated by scientific applications. As science projects become increasingly distributed, larger data sizes in turn increase the volume of traffic that must be moved across geographically distributed locations. Although significant investments have been made in high-speed networks to facilitate data movement between research and education institutions, domain scientists find it difficult to use this capacity efficiently, mainly because scalable data transfer services are lacking. This project addresses this need by developing a scalable and reliable data transfer service. It further integrates the data transfer service into elastic workflow management systems to achieve end-to-end optimization for distributed science workflows.

This project makes three novel contributions to the field. (i) It innovates scalable integrity verification and encryption for file transfers to ensure the reliability of file transfers without sacrificing performance. It takes advantage of computing resources available at data transfer nodes to scale the performance of integrity verification and channel encryption. (ii) It innovates end-to-end parallelism for distributed workflows by integrating an online transfer optimization service into elastic workflow management tools. Unlike existing workflow management solutions, which focus only on optimizing computing tasks, the proposed integration of online transfer optimization services into elastic workflow schedulers enables true end-to-end parallelism for distributed workflows. (iii) Finally, it demonstrates the performance of the developed service on a real-world bioscience workflow that streams a large volume of Sequence Read Archive data from the NCBI database to extract computation-ready SAM/BAM files.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Ashok Srinivasan, asriniva@nsf.gov, 7032922122
  • Min Amd Letter Date
    4/15/2024
  • Max Amd Letter Date
    4/15/2024
  • ARRA Amount

Institutions

  • Name
    University of Texas at Arlington
  • City
    ARLINGTON
  • State
    TX
  • Country
    United States
  • Address
    701 S NEDDERMAN DR
  • Postal Code
    760199800
  • Phone Number
    8172722105

Investigators

  • First Name
    Engin
  • Last Name
    Arslan
  • Email Address
    engin.arslan@uta.edu
  • Start Date
    4/15/2024 12:00:00 AM

Program Element

  • Text
    Software Institutes
  • Code
    800400

Program Reference

  • Text
    CSSI-1: Cyberinfr for Sustained Scientif
  • Text
    Software Institutes
  • Code
    8004
  • Text
    EXP PROG TO STIM COMP RES
  • Code
    9150
  • Text
    REU SUPP-Res Exp for Ugrd Supp
  • Code
    9251