CSR: Small: Accelerating Data Intensive Scientific Workflows with Consistency Contracts

Information

  • NSF Award
  • 2317556
Owner
  • Award Id
    2317556
  • Award Effective Date
    10/1/2023
  • Award Expiration Date
    9/30/2026
  • Award Amount
    $599,687.00
  • Award Instrument
    Standard Grant

CSR: Small: Accelerating Data Intensive Scientific Workflows with Consistency Contracts

Advanced discovery in scientific computing increasingly depends upon the successful execution of complex workflows that combine multiple applications together to run in concert on a large high performance cluster. A widespread challenge in this setting is the performance of the shared parallel filesystem. Because each file system interaction has very different needs in terms of performance and consistency, the filesystem is obliged to follow the most conservative approach to handle the worst case. As a result, peak performance is rarely achieved. We propose "consistency contracts" as the solution to this problem. This novel approach requires the workflow as a whole to declare its intended uses of the file system at the start of each execution, allowing the runtime system to perform a variety of optimizations. This project will evaluate the concept of consistency contracts by constructing an experimental system (Pledge) that enables and enforces contracts on existing data intensive workflows, with minimal disruption to current practice. We hypothesize that this approach will significantly improve performance for data intensive scientific applications running on high performance clusters, and has the potential to be more widely applied.

Our focus on consistency results from our observation that today's shared parallel filesystem is asked to fill multiple roles: moving large files, delivering complex software trees, providing buffers between tasks, and providing synchronization between tasks. Current filesystems provide the most conservative sequential consistency to handle the worst case. Rather than depend upon the shared filesystem to perform last-minute runtime arbitration of every individual filesystem operation, we argue that the workflow as a whole should declare its intentions for the duration of the execution, indicating the paths, access modes, and consistency requirements needed for the entire workflow run. With a contract in hand, the runtime system can then perform a variety of optimizations that exploit the internal storage and I/O capacity of the cluster as a whole, for example utilizing a streamlined approach for read-only access. We hypothesize that workflow-level consistency management will yield higher effective I/O bandwidth and transaction rates than strict global consistency management for data intensive scientific applications running on high performance clusters. These improved I/O rates will translate into faster end-to-end runtimes and fewer unexpected performance failures for end users and system administrators.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
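To make the idea of a consistency contract more concrete, the sketch below shows one way a workflow might declare paths and access modes up front, and how a runtime could use that declaration both to reject undeclared operations and to decide when a read-only fast path is safe. This is an illustration only: the names `Access`, `Rule`, `Contract`, and `ContractViolation`, the glob-style path patterns, and the example paths are all assumptions made for this example and do not describe the actual design or interface of Pledge.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Access(Enum):
    """Hypothetical access modes a workflow could declare."""
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


@dataclass(frozen=True)
class Rule:
    pattern: str    # glob pattern for the paths this rule covers (assumed format)
    access: Access  # access mode declared for those paths


class ContractViolation(Exception):
    """Raised when an operation falls outside the declared contract."""


class Contract:
    """A hypothetical contract: an ordered list of rules, first match wins."""

    def __init__(self, rules):
        self.rules = list(rules)

    def lookup(self, path):
        # Return the first rule whose pattern matches the path, or None.
        for rule in self.rules:
            if fnmatchcase(path, rule.pattern):
                return rule
        return None

    def check(self, path, operation):
        # Enforce the contract for a single "read" or "write" operation.
        rule = self.lookup(path)
        if rule is None:
            raise ContractViolation(f"{path}: path not declared in contract")
        if operation == "write" and rule.access is Access.READ_ONLY:
            raise ContractViolation(f"{path}: declared read-only, write refused")
        return rule

    def can_cache_locally(self, path):
        # Paths declared read-only cannot change during the run, so a runtime
        # could serve them from node-local copies without coherence traffic.
        rule = self.lookup(path)
        return rule is not None and rule.access is Access.READ_ONLY


# Example declaration made once, at the start of a workflow run.
contract = Contract([
    Rule("/software/*", Access.READ_ONLY),   # software tree: never modified
    Rule("/scratch/*", Access.READ_WRITE),   # intermediate buffers between tasks
])
```

With this declaration, `contract.can_cache_locally("/software/lib/a.so")` is true, so the streamlined read-only path applies, while `contract.check("/software/lib/a.so", "write")` raises `ContractViolation`. A real contract would also need to express consistency requirements between tasks, which this sketch leaves out.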

  • Program Officer
    Karen Karavanic, kkaravan@nsf.gov, 7032922594
  • Min Amd Letter Date
    8/28/2023
  • Max Amd Letter Date
    8/28/2023
  • ARRA Amount

Institutions

  • Name
    University of Notre Dame
  • City
    NOTRE DAME
  • State
    IN
  • Country
    United States
  • Address
    940 Grace Hall
  • Postal Code
    465565708
  • Phone Number
    5746317432

Investigators

  • First Name
    Douglas
  • Last Name
    Thain
  • Email Address
    dthain@nd.edu
  • Start Date
    8/28/2023 12:00:00 AM

Program Element

  • Text
    CSR-Computer Systems Research
  • Code
    7354

Program Reference

  • Text
    SMALL PROJECT
  • Code
    7923