Modern software applications in all facets of society, including commercial, scientific, and non-profit enterprises, rely on databases to store information. These organizations often want to use their data in ways they cannot easily express with existing database query languages, especially in the context of artificial intelligence and data science applications. This mismatch means such applications wait longer for answers about their data, inhibiting them from reacting to changes as quickly as possible and impeding their goals. This research addresses this problem and develops foundational techniques that automatically remove such inefficiencies without requiring organizations to perform costly rewrites of their application code. It enables organizations to ask more complex questions about their data and extrapolate new knowledge from it, all while using fewer computing and energy resources than today’s systems.

Many database management systems (DBMSs) extend the query language SQL to support user-defined functions (UDFs) written in procedural programming languages. Despite their software engineering advantages, UDFs are notoriously difficult to optimize within database systems, and DBMSs often resort to executing them iteratively (row by row). This project focuses on developing optimization approaches to overcome SQL and UDF boundaries via automatic code transformations that pass critical information between them to enable more effective query planning and compilation. These strategies include methods for programmatically deconstructing UDFs into smaller pieces, manipulating them individually, and reconstructing them into the calling query to optimize performance.
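As a rough illustration of the SQL and UDF boundary described above, the following minimal sketch uses Python's standard `sqlite3` module. The `items` table, the `discount` function, and the hand-written `CASE` expression are hypothetical and chosen only for illustration; they are not taken from the project. The first query calls a procedural UDF that the engine must invoke once per row and cannot see inside, while the second expresses the same logic directly in SQL, which is the kind of form an automatic inlining transformation might aim to produce.

```python
import sqlite3


def discount(price, category):
    """Hypothetical procedural UDF: branching logic evaluated once per row."""
    if category == "clearance":
        return price * 0.5
    return price * 0.9


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, category TEXT)"
)
conn.executemany(
    "INSERT INTO items (price, category) VALUES (?, ?)",
    [(10.0, "clearance"), (20.0, "regular"), (40.0, "clearance")],
)

# Register the Python function so SQL can call it. The engine treats it as an
# opaque call and invokes it separately for every row.
conn.create_function("discount", 2, discount)

udf_rows = conn.execute(
    "SELECT id, discount(price, category) FROM items ORDER BY id"
).fetchall()

# The same logic written by hand as a SQL expression. Here the query planner
# can see the whole computation instead of a black-box function call.
inlined_rows = conn.execute(
    """
    SELECT id,
           CASE WHEN category = 'clearance' THEN price * 0.5
                ELSE price * 0.9
           END
    FROM items
    ORDER BY id
    """
).fetchall()

# Both forms return the same rows; only the execution strategy differs.
print(udf_rows == inlined_rows)
```

In this sketch the rewrite is done manually; the research described here targets performing such transformations automatically, without changes to application code.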
This project will address three fundamental research challenges: (1) improving the performance of UDFs without requiring modifications to the application code, (2) optimizing external-language UDFs (e.g., Python) that rely on dynamic types and library calls, and (3) generating new optimizations that leverage information about UDFs across the entire lifecycle of a query and across multiple invocations. By eliminating performance penalties associated with UDFs, this research will enable organizations to improve the efficiency of applications and support more complex workloads, including those that leverage machine learning and data science libraries.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.