SHF: Small: Memory Hierarchy Optimizations Meet Transformers (MITTEN)

Information

  • Award Id
    2428108
  • Award Effective Date
    10/1/2024
  • Award Expiration Date
    9/30/2027
  • Award Amount
    $600,000.00
  • Award Instrument
    Standard Grant

SHF: Small: Memory Hierarchy Optimizations Meet Transformers (MITTEN)

Today, there is an increasing need to run powerful Artificial Intelligence (AI) models on mobile phones. Many of the latest generation of AI models (including ChatGPT and Gemini) follow what is known as the transformer architecture. As has been seen in optimizing many other workloads on different computing devices, a class of optimizations related to the memory hierarchy is extremely important for the efficient execution of transformer-based models on modern mobile devices. This project is based on the premise that the features of these workloads and the characteristics of mobile devices require not only the application of existing techniques from the compiler literature but also the development of new methods. The project's novelties lie in considering this combination of workload and architecture, and in proposing techniques for choosing new layouts, removing redundant layout changes that slow down execution, performing memory allocation judiciously to improve performance, and targeting the newest accelerators. The project's impacts are helping bring the latest advances in AI to mobile and edge devices, letting these advances reach more individuals, and contributing new methods to the compiler and runtime support literature.

In targeting memory hierarchy-related optimizations for transformers, the investigators observe that, compared to the previous generation of deep learning models, transformers have more data-flow splits, shuffles, merges, and transpose/reshape(-like) operations. As a result, the deep learning compilation systems developed over the past decade fall short with respect to memory-related transformations, especially ones requiring a global view of the problem. This project builds on the investigators' recent work on a comprehensive framework for removing relayout operations, which delivers significantly better performance for transformer models. Building on that work, the following agenda is being undertaken:

  • Performance (Cost) Models: a detailed performance model for execution on mobile GPUs is being developed, which is especially novel in capturing the locality behavior of a 2.5D cache.
  • Formal Approaches to Transformations: more formal approaches to the same set of optimizations (e.g., replacing a relayout operator) are being developed, including both polyhedral formulations and computation-data graph-based approaches.
  • Layout Transformation in View of New Instructions: as newer processors increasingly offer matrix (or tensor) instructions with their own specific data-layout requirements, the memory-performance problems arising from these requirements are being addressed.
  • Memory Management for Dynamic Models: focusing on emerging dynamic models, computation ordering, memory allocation, and memory fragmentation problems are being investigated.

The investigators are working toward creating more synergy between the compiler research community (especially memory/cache modeling and tuning) and the ML-model development community. The research on large-scale Machine Learning and Deep Learning transformation/implementation techniques is to be incorporated into courses taught by the investigators.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
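To give a flavor of what "removing redundant layout changes" means, the sketch below shows a simplified version of the idea on a toy computation-graph IR: two back-to-back transposes whose permutations compose to the identity can be cancelled, so downstream operators read the original tensor directly. This is only an illustrative sketch; the Op class, helper names, and toy attention-style graph are hypothetical and do not represent the investigators' actual framework.

```python
# Minimal sketch (assumed toy IR, not the project's framework): cancel
# back-to-back transposes whose permutations compose to the identity.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    kind: str                       # e.g., "input", "transpose", "matmul"
    inputs: list = field(default_factory=list)
    perm: tuple = ()                # only meaningful when kind == "transpose"

def compose(p1, p2):
    """Permutation equivalent to applying p1 first, then p2."""
    return tuple(p1[i] for i in p2)

def eliminate_redundant_transposes(ops):
    rewritten, replace = [], {}     # replace: eliminated op -> surviving value
    for op in ops:
        op.inputs = [replace.get(i, i) for i in op.inputs]
        producer = next((o for o in rewritten
                         if op.inputs and o.name == op.inputs[0]), None)
        if (op.kind == "transpose" and producer is not None
                and producer.kind == "transpose"
                and compose(producer.perm, op.perm) == tuple(range(len(op.perm)))):
            replace[op.name] = producer.inputs[0]   # the two relayouts cancel
            continue
        rewritten.append(op)
    # Drop transposes whose results are no longer consumed after the rewrite.
    used = {i for op in rewritten for i in op.inputs}
    return [op for op in rewritten if op.kind != "transpose" or op.name in used]

# Toy attention-style fragment: transpose K for a batched matmul, then undo it.
graph = [
    Op("k", "input"),
    Op("kT", "transpose", ["k"], perm=(0, 2, 1)),
    Op("kTT", "transpose", ["kT"], perm=(0, 2, 1)),
    Op("scores", "matmul", ["q", "kTT"]),
]
print([op.name for op in eliminate_redundant_transposes(graph)])  # ['k', 'scores']
```

A production system would, of course, reason about many more relayout patterns (reshapes, layout-changing copies, permutations that compose to other permutations) and about their interaction with operator kernels, which is where a global view of the graph becomes important.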
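To illustrate the memory-management direction, the following sketch shows a simple offset-based planner of the kind commonly used as a baseline in deep learning runtimes: given tensor sizes and lifetimes, each tensor is greedily placed at the lowest offset that does not overlap a concurrently live tensor. The function name plan_offsets and the toy tensors are hypothetical, and this baseline is only meant to make the allocation/fragmentation trade-off concrete; it is not the project's algorithm, which additionally targets dynamic models where lifetimes are not fully known ahead of time.

```python
# Hypothetical baseline memory planner: greedy offset assignment given
# tensor sizes and (first_use, last_use) lifetimes, largest tensors first.
def plan_offsets(tensors):
    """tensors: list of (name, size_bytes, first_use, last_use).
    Returns (offsets_by_name, peak_bytes)."""
    placed = []                                  # (offset, size, first, last)
    offsets = {}
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        candidate = 0
        for off, sz, f, l in sorted(placed):     # scan existing blocks by offset
            lifetimes_overlap = not (last < f or l < first)
            space_overlaps = candidate < off + sz and off < candidate + size
            if lifetimes_overlap and space_overlaps:
                candidate = off + sz             # bump past the conflicting block
        placed.append((candidate, size, first, last))
        offsets[name] = candidate
    peak = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, peak

# Toy example: four activations with overlapping lifetimes.
tensors = [("a", 4096, 0, 2), ("b", 2048, 1, 3), ("c", 4096, 3, 5), ("d", 1024, 2, 4)]
offsets, peak = plan_offsets(tensors)
print(offsets, peak)   # {'a': 0, 'c': 0, 'b': 4096, 'd': 6144} 7168
```

Note how "a" and "c" share the same offset because their lifetimes do not overlap; the gap that "d" cannot fit into is exactly the kind of fragmentation the abstract mentions, and it becomes harder to avoid when tensor shapes and lifetimes are only known at run time.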

  • Program Officer
    Anindya Banerjee, abanerje@nsf.gov, (703) 292-7885
  • Min Amd Letter Date
    7/22/2024
  • Max Amd Letter Date
    7/22/2024
  • ARRA Amount

Institutions

  • Name
    University of Georgia Research Foundation Inc
  • City
    ATHENS
  • State
    GA
  • Country
    United States
  • Address
    310 E CAMPUS RD RM 409
  • Postal Code
    30602-1589
  • Phone Number
    (706) 542-5939

Investigators

  • First Name
    Wei
  • Last Name
    Niu
  • Email Address
    wniu@uga.edu
  • Start Date
    7/22/2024
  • First Name
    Gagan
  • Last Name
    Agrawal
  • Email Address
    gagrawal@uga.edu
  • Start Date
    7/22/2024

Program Element

  • Text
    Software & Hardware Foundations
  • Code
    779800

Program Reference

  • Text
    SMALL PROJECT
  • Code
    7923
  • Text
    PROGRAMMING LANGUAGES
  • Code
    7943