Scalable optimization algorithms that can handle large-scale modern datasets are an integral part of many applications of machine learning. Optimization methods that use only first-derivative information, i.e., so-called first-order methods, are the most common. However, many of these first-order methods come with inherent disadvantages, e.g., slow convergence, poor communication properties, and the need for laborious manual tuning. On the other hand, so-called second-order methods, i.e., methods that also use second-derivative information, can mitigate many of these disadvantages, but they are far less widely used within machine learning. This project will develop, implement, and apply novel methods that, through innovative application of second-order information, allow for enhanced design, diagnostics, and training of machine learning models.

Technical work will focus on theoretical developments, efficient implementations, and applications in multiple settings. Theoretical developments will tackle the challenges involved in training large-scale non-convex machine learning models from four general angles: high-quality local minima; distributed computing environments; generalization performance; and acceleration. Work will also develop efficient Hessian-based diagnostic tools for the analysis of the training process as well as of already-trained models. Finally, improvements and applications of the proposed methods will be developed in a variety of settings: improved communication properties; exploiting adversarial data; and exploring how these ideas can be used for more challenging problems, such as improving neural architecture design and search. In all cases, high-quality, user-friendly implementations for both shared-memory and distributed computing environments will be made available.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
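To make the notion of "second-order information" concrete, the sketch below shows one generic Newton-type step on a toy regularized logistic-regression problem, where the Hessian is accessed only through Hessian-vector products and the resulting linear system is solved with conjugate gradient. This is a minimal illustration of the general idea only, not the project's proposed methods; the toy data, function names, and parameter choices are assumptions made for the example.

```python
# Minimal, self-contained sketch (assumed toy example, not the project's methods):
# a Newton-type step for a regularized logistic-regression loss, with the Hessian
# accessed only via Hessian-vector products and the step obtained by conjugate
# gradient, so the d-by-d Hessian is never formed explicitly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))             # toy features
y = np.sign(X @ rng.standard_normal(5))
y[y == 0] = 1.0                               # toy +/-1 labels
LAM = 1e-2                                    # L2 regularization strength

def grad(w):
    """Gradient of the regularized logistic loss."""
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))     # sigma(-y_i x_i^T w)
    return -(X.T @ (y * s)) / len(y) + LAM * w

def hvp(w, v):
    """Hessian-vector product; the full Hessian is never materialized."""
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))
    d = s * (1.0 - s)                         # per-example curvature weights
    return (X.T @ (d * (X @ v))) / len(y) + LAM * v

def newton_cg_step(w, cg_iters=20, tol=1e-10):
    """Approximately solve H p = -g with conjugate gradient and return w + p."""
    g = grad(w)
    p = np.zeros_like(w)
    r = -g.copy()                             # residual for the initial guess p = 0
    q = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hq = hvp(w, q)
        alpha = rs / (q @ Hq)
        p += alpha * q
        r -= alpha * Hq
        rs_new = r @ r
        if rs_new < tol:
            break
        q = r + (rs_new / rs) * q
        rs = rs_new
    return w + p

w = np.zeros(5)
for _ in range(5):
    w = newton_cg_step(w)
print("gradient norm after 5 Newton-CG steps:", np.linalg.norm(grad(w)))
```

The same Hessian-vector-product primitive also underlies Hessian-based diagnostics of the kind mentioned above, e.g., estimating the largest curvature directions of a trained model without ever building the full Hessian.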