The ability of computing systems to process large amounts of data efficiently and in a timely manner is a key enabler in the modern landscape of data-driven applications. To bridge the widening gap between memory technology and processors, computing systems continue to rely heavily on complex multi-level cache hierarchies. Caches can prevent costly accesses to downstream memory if the processed data items exhibit good spatiotemporal locality. Unfortunately, locality does not always emerge naturally in complex data processing pipelines. Platform-specific algorithmic optimizations are often necessary to rearrange an algorithm's memory access pattern for better locality while striving to preserve the original semantics. When operating on high-dimensional objects (e.g., tensors), data locality unlocks crucial performance gains but becomes harder to achieve. This project proposes a novel class of architectural data transformation units to be interposed between memory and compute elements such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs). By relying on knowledge of the data access pattern dictated by the algorithm's semantics, these units decouple the in-memory geometry of data items from the access sequence required by the computational logic. Through on-the-fly transformations, they make data items that are requested in sequence appear to the processing unit and its cache hierarchy as if they were stored contiguously, without data duplication. This enables spatiotemporal locality to be achieved without heavy algorithmic re-engineering. The findings will be integrated into undergraduate and graduate courses at Boston University and the University of Kansas, enhancing topics such as data systems, system performance evaluation, embedded real-time systems, and operating systems. The project will support underrepresented populations across educational levels and foster strong industry connections.
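As a rough software analogy for the decoupling of in-memory geometry from access order described above, the following minimal sketch presents one column of a row-major table as if it were a contiguous sequence, without copying the data. The class name `StridedView` and its parameters are hypothetical and chosen only for illustration; the proposed units would perform this kind of remapping in hardware, and nothing here reflects their actual design.

```python
from typing import Sequence


class StridedView:
    """Hypothetical software stand-in for a data transformation unit.

    Exposes one column of a row-major table as a sequence, so the
    consumer iterates "sequentially" while the underlying storage is
    accessed with a fixed stride. No data is duplicated.
    """

    def __init__(self, memory: Sequence[int], num_cols: int, col: int) -> None:
        if num_cols <= 0 or len(memory) % num_cols != 0:
            raise ValueError("memory length must be a multiple of num_cols")
        if not 0 <= col < num_cols:
            raise IndexError("column out of range")
        self._memory = memory      # flat, row-major storage (assumed layout)
        self._stride = num_cols    # distance between consecutive column items
        self._offset = col         # position of the column within each row

    def __len__(self) -> int:
        return len(self._memory) // self._stride

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # Translate the logical (sequential) index into a physical address.
        return self._memory[self._offset + i * self._stride]


# A 3x4 table stored row by row: rows are [0..3], [4..7], [8..11].
table = list(range(12))
column_2 = StridedView(table, num_cols=4, col=2)
print(list(column_2))  # logical sequence backed by physical indices 2, 6, 10
```

The consumer only ever sees logical indices 0, 1, 2, while the view resolves them to strided physical locations, which is the separation between access sequence and storage layout that the abstract describes.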
<br/><br/>This project explores the theory and practice of formulating, designing, and implementing architectural on-the-fly Data Transformation Units (DTUs). It does so by pursuing four interconnected research thrusts. In the first thrust, the investigators focus on developing a foundational science of on-the-fly data transformation. A key stepping stone is formulating an access pattern specification language that is both expressive and efficiently interpretable in hardware. In the second thrust, two alternative architectural paradigms are explored, namely (1) the integration of a DTU as a component logically placed on the memory bus and (2) the integration of a DTU directly into the memory controller. Doing so places data transformation as close as possible to the memory cells to exploit their inherent parallelism while supporting unmodified commercial memory modules. The third thrust explores which programming models can best empower application designers to use DTUs via a combination of instruction-set architecture extensions, operating system-level support, and user-space libraries. Finally, the fourth thrust aims to identify widely adopted data processing pipelines that can benefit substantially from DTUs, focusing specifically on relational databases and machine learning. These pipelines will be used to demonstrate concretely the potential of the proposed on-the-fly data transformation approach.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.