Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)

Information

  • Award Id
    2341378
  • Award Effective Date
    10/1/2023
  • Award Expiration Date
    9/30/2026
  • Award Amount
    $299,999.00
  • Award Instrument
    Standard Grant

Abstract

As Machine Learning (ML) workloads, and especially Deep Neural Network (DNN) workloads, have rapidly become prominent, many existing architectures have been enriched with instructions and/or processing capabilities targeting them. Examples include AMX instructions from Intel, Tensor Cores from NVIDIA, DOT instructions from AMD, and many others. The emergence of such tensorized instructions raises many common and related challenges regarding how they can be used for production-level modern DNNs. The current state of the art for exploiting these instruction sets for DNN workloads is very limited: existing systems either ignore these instructions entirely, do not address global optimizations for complex DNNs, or are limited in other ways. The premise of this work is that a compilation system that is cognizant of the latest DNN trends and can optimize across different tensorized instruction sets will provide large efficiency gains for modern ML computations. The resulting agenda is likely to have significant technical, economic, and societal impacts. On the technical side, the work affects areas such as High-Performance Computing (HPC), compilers, and systems supporting AI/ML workloads. As DNNs become an integral part of applications that most people use, this work is poised to have a large economic and societal impact. On the education side, the research at the intersection of systems and ML will be incorporated into multiple courses and will help increase diversity at all levels of computing education and research, particularly by involving members of underrepresented groups.

This project addresses the following challenges associated with modern DNNs and recent and emerging tensorized instructions:

1) Local Instruction Selection for Dense Models -- To improve the execution efficiency of each operator, a critical first issue is selecting tensorized instructions (and associated data layouts), which will be addressed for arbitrary operator shapes.

2) Global Optimizations for DNNs -- After local operator optimizations, each operator may prefer its own tensorized instruction and data layout, incurring significant data-layout transformation costs during the execution of an entire DNN. This project formulates and solves a global optimization problem that chooses the right trade-off between local operator execution costs and data transformation costs (a minimal sketch of this formulation follows the list).

3) Optimizations for Dynamic DNNs -- The project also considers various forms of dynamism in modern DNN models, including dynamic input shapes, dynamic control flow, and dynamic data structures. It proposes new optimizations, such as those for effective memory management, while revisiting others, such as local and global instruction selection, in the presence of these forms of dynamism.

4) Mapping Sparse Models to Emerging Instructions -- The project also plans to improve the efficiency of using various types of tensorized instructions when sparsity is involved, building on earlier work for optimizing kernels like SpMM (and other sparse computations) on GPUs and SIMD instruction sets.

5) (Semi-)Automatic Support for New Instructions -- To minimize optimization and programming effort, the project also introduces a module to automatically optimize DNN computations with new tensorized instructions or features.
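To make challenge 2 concrete, here is a minimal sketch of the kind of global optimization the abstract describes, under strong simplifying assumptions: the model is a linear chain of operators (real DNNs are DAGs), and the candidate configurations (amx_nhwc, avx512_nchw) and all costs are invented for illustration. Under those assumptions, choosing one (instruction, layout) configuration per operator so as to minimize the sum of local execution costs and layout-transformation costs reduces to a Viterbi-style dynamic program:

```python
# Hypothetical per-operator candidates: configuration name -> local
# execution cost (say, cycles). Both names and numbers are invented.
candidates = [
    {"amx_nhwc": 10.0, "avx512_nchw": 14.0},   # operator 0
    {"amx_nhwc": 22.0, "avx512_nchw": 18.0},   # operator 1
    {"amx_nhwc": 9.0,  "avx512_nchw": 11.0},   # operator 2
]

def transform_cost(cfg_a: str, cfg_b: str) -> float:
    """Hypothetical cost of converting the data layout between two
    adjacent operators' configurations; zero if the layouts match."""
    return 0.0 if cfg_a.split("_")[1] == cfg_b.split("_")[1] else 6.0

def select_configs(candidates):
    """Viterbi-style DP over a chain: best[cfg] holds the cheapest total
    cost of any assignment for the prefix that ends in configuration cfg,
    along with the assignment itself."""
    best = {cfg: (cost, [cfg]) for cfg, cost in candidates[0].items()}
    for layer in candidates[1:]:
        nxt = {}
        for cfg, local in layer.items():
            prev_cfg, (prev_cost, path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + transform_cost(kv[0], cfg),
            )
            nxt[cfg] = (
                prev_cost + transform_cost(prev_cfg, cfg) + local,
                path + [cfg],
            )
        best = nxt
    return min(best.values())

total, assignment = select_configs(candidates)
print(total, assignment)   # e.g., 41.0 ['amx_nhwc', 'amx_nhwc', 'amx_nhwc']
```

Note how the globally cheapest assignment keeps operator 1 on its locally worse configuration to avoid two layout conversions. For a general DAG, choices interact across branches, so the project's actual formulation would need to go beyond this chain-structured special case.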
Beyond addressing the problems above, a critical component of the project is incorporating their implementations, together with code generation for multiple back-ends, into a reusable system. The system will take a computational graph representation as input and produce Tensor and LLVM IRs as output, building around three representations widely used in industry (a toy illustration of this lowering appears after this abstract).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
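To make the three-level structure (computational graph, tensor IR, LLVM IR) concrete, here is a purely illustrative sketch. The graph node, the tensor-IR syntax, and the 16x16x16 tensorized instruction are all invented for illustration and do not come from the project; real systems at each level are far richer.

```python
# A toy illustration (all names hypothetical) of lowering a
# computational-graph node to a tensor-IR-style loop nest whose inner
# tile maps onto a fixed-shape tensorized instruction; a real system
# would further lower this to LLVM IR for a concrete back-end.
from dataclasses import dataclass

@dataclass
class MatMulNode:
    """Computational-graph level: C[m, n] = A[m, k] @ B[k, n]."""
    m: int
    n: int
    k: int

def lower_to_tensor_ir(node: MatMulNode, tile=(16, 16, 16)) -> str:
    """Emit a tensor-IR-like loop nest whose innermost step is a call to
    a hypothetical 16x16x16 tensorized matrix instruction. Arbitrary
    shapes (challenge 1) would additionally require padding or masked
    residue tiles when m, n, k are not multiples of the tile."""
    tm, tn, tk = tile
    return "\n".join([
        f"for io in range(0, {node.m}, {tm}):",
        f"  for jo in range(0, {node.n}, {tn}):",
        f"    for ko in range(0, {node.k}, {tk}):",
        f"      tensorize.mma_{tm}x{tn}x{tk}(C[io, jo], A[io, ko], B[ko, jo])",
    ])

print(lower_to_tensor_ir(MatMulNode(m=64, n=64, k=32)))
```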

  • Program Officer
    Karen Karavanic, kkaravan@nsf.gov, (703) 292-2594
  • Min Amd Letter Date
    9/14/2023
  • Max Amd Letter Date
    9/14/2023

Institutions

  • Name
    University of Georgia Research Foundation Inc
  • City
    Athens
  • State
    GA
  • Country
    United States
  • Address
    310 E CAMPUS RD RM 409
  • Postal Code
    30602-1589
  • Phone Number
    (706) 542-5939

Investigators

  • First Name
    Gagan
  • Last Name
    Agrawal
  • Email Address
    gagrawal@uga.edu
  • Start Date
    9/14/2023

Program Element

  • Text
    CSR-Computer Systems Research
  • Code
    7354