Today, there is an increasing need to run powerful Artificial Intelligence (AI) models on mobile phones. Many of the latest generation of AI models (including ChatGPT and Gemini) follow what is known as the transformer architecture. As has been observed when optimizing a variety of workloads on different computing devices, a class of optimizations related to the memory hierarchy is extremely important for the efficient execution of transformer-based models on modern mobile devices. This project is based on the premise that the features of these workloads and the characteristics of mobile devices require not only the application of existing techniques from the compiler literature but also the development of new methods. The project's novelties lie in considering this combination of workload and architecture and in proposing techniques for choosing new data layouts, removing redundant layout changes that slow down execution, performing memory allocation judiciously to improve performance, and dealing with the newest accelerators. The project's impacts are in helping bring the latest advances in AI to mobile and edge devices, letting these advances reach more individuals, and contributing to the compiler and runtime-support literature through the development of new methods.

In targeting memory hierarchy-related optimizations for transformers, we observe that, compared to the previous generation of deep learning-based models, transformers have more data-flow splits, shuffles, merges, and transpose/reshape(-like) operations. Thus, the various compilation systems targeting deep learning developed over the past decade fall short with respect to memory-related transformations, especially those requiring a global view of the problem. This project builds on the investigators' recent work developing a comprehensive framework for removing relayout operations and delivering significantly better performance for transformer models. Starting from this work, the following agenda is being undertaken: Performance (Cost) Models: a detailed performance model for execution on mobile GPUs is being developed, which will be especially novel in capturing the locality behavior of a 2.5D cache; Formal Approaches to Transformations: more formal approaches to the same set of optimizations (e.g., replacing a relayout operator) are being pursued, including both polyhedral formulations and computation-data graph-based approaches; Layout Transformation in View of New Instructions: as newer processors increasingly offer matrix (or tensor)-based instructions with their own specific data-layout requirements, memory-performance problems arising from these requirements are being addressed; and Memory Management for Dynamic Models: focusing on emerging dynamic models, problems of computation ordering, memory allocation, and memory fragmentation are being investigated. The investigators are working towards creating more synergy between the compiler research community (especially in memory/cache modeling and tuning) and the ML-model development community. The research on large-scale Machine Learning and Deep Learning transformation/implementation techniques will be incorporated into courses taught by the investigators.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
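
To give a concrete sense of the "removing redundant layout changes" idea mentioned above, the following is a minimal, self-contained Python sketch of a graph-rewrite pass that folds chains of transpose (relayout) operators and drops those that compose to the identity, the kind of pattern transformer graphs accumulate around attention and reshape operations. It is an illustration only, not the investigators' actual framework; all class and function names here are hypothetical.

```python
# Toy relayout-elimination sketch (hypothetical names; not the project's framework).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Op:
    kind: str                                  # e.g., "input", "transpose", "matmul"
    perm: Optional[Tuple[int, ...]] = None     # permutation for transpose ops
    inputs: List["Op"] = field(default_factory=list)


def compose(p: Tuple[int, ...], q: Tuple[int, ...]) -> Tuple[int, ...]:
    """Permutation equivalent to applying transpose(q) first, then transpose(p)."""
    return tuple(q[i] for i in p)


def eliminate_redundant_transposes(op: Op) -> Op:
    """Fold adjacent transposes; remove pairs that compose to the identity."""
    op.inputs = [eliminate_redundant_transposes(i) for i in op.inputs]
    if op.kind == "transpose" and op.inputs and op.inputs[0].kind == "transpose":
        inner = op.inputs[0]
        fused = compose(op.perm, inner.perm)
        if fused == tuple(range(len(fused))):
            return inner.inputs[0]             # the two relayouts cancel entirely
        return Op("transpose", perm=fused, inputs=inner.inputs)
    return op


# Usage: transpose(transpose(x, (0, 2, 1, 3)), (0, 2, 1, 3)) collapses back to x.
x = Op("input")
g = Op("transpose", perm=(0, 2, 1, 3),
       inputs=[Op("transpose", perm=(0, 2, 1, 3), inputs=[x])])
assert eliminate_redundant_transposes(g) is x
```

A production system would, of course, also reason about layouts attached to producers and consumers (so that a relayout can be absorbed into a neighboring operator rather than merely cancelled), which is the global-view aspect the abstract emphasizes; this sketch only shows the simplest local cancellation case.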