Although crucial for advanced Artificial Intelligence (AI) applications due to their language understanding and generation capabilities, Large Language Models (LLMs) are energy-intensive. The goal and novelty of this project are to enhance the efficiency of LLM training and inference by leveraging emerging high-speed networks and computing architectures. The project's broader significance and importance are to (1) enable a broad range of LLMs to operate efficiently, advancing AI applications at low energy cost; (2) strengthen international research collaboration between U.S. and Indian researchers; and (3) provide educational opportunities for graduate students.

This project addresses the energy-efficiency challenges of LLMs by optimizing their energy consumption in heterogeneous Compute Express Link (CXL)-enabled hardware environments. By leveraging High-Performance Computing (HPC) middleware and the high-bandwidth, low-latency features of CXL, the project aims to ensure sustainable and efficient AI operations. The project seeks solutions to the following fundamental issues in training and using LLMs at scale: 1) identifying and characterizing idleness in LLM workloads; 2) using knowledge of long idle periods to insert low-overhead Dynamic Voltage and Frequency Scaling (DVFS) control and undervolting that reduce static energy consumption; 3) designing a CXL-aware, energy-efficient Message Passing Interface (MPI)-based communication runtime for LLM training and inference; and 4) studying the overall impact of the integrated system on the energy consumption of LLM training and inference. The results will be disseminated to collaborating organizations to inform their HPC/AI software applications and hardware chip designs, promoting broader societal advancement through improved technological capabilities.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
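
As a minimal illustrative sketch of the idleness-driven DVFS control described in issue 2, the snippet below lowers and then restores GPU clocks around a wait that is predicted to be long, using NVIDIA's NVML bindings (pynvml). The idle-length threshold, clock targets, and the assumption that idleness is detected at an MPI-style wait point are illustrative only, not the project's actual runtime design.

```python
# Illustrative sketch (assumes pynvml is installed and clock control is permitted);
# the project's actual runtime, thresholds, and idleness-detection logic may differ.
import time
import pynvml

IDLE_THRESHOLD_S = 0.050   # hypothetical: only throttle for waits longer than 50 ms
LOW_CLOCK_MHZ = 600        # hypothetical low graphics clock for idle phases

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def wait_with_dvfs(predicted_idle_s, wait_fn):
    """Lower GPU clocks around a wait that is predicted to be long.

    predicted_idle_s: expected idle duration (e.g., from profiling LLM phases).
    wait_fn: blocking call that realizes the idleness (e.g., a communication wait).
    """
    throttled = False
    if predicted_idle_s >= IDLE_THRESHOLD_S:
        # Pin graphics clocks to a low value to cut power during the idle span.
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, LOW_CLOCK_MHZ, LOW_CLOCK_MHZ)
        throttled = True
    try:
        wait_fn()
    finally:
        if throttled:
            # Restore the default clock policy before compute resumes.
            pynvml.nvmlDeviceResetGpuLockedClocks(handle)

# Example: a sleep stands in for a long collective-communication wait.
wait_with_dvfs(predicted_idle_s=0.2, wait_fn=lambda: time.sleep(0.2))
pynvml.nvmlShutdown()
```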