Data are often modeled as matrices. As a result, linear algebraic algorithms, and in particular matrix decompositions, have proven extremely successful in the analysis of such datasets. RandNLA (Randomized Numerical Linear Algebra), which integrates the complementary perspectives that theoretical computer science and numerical linear algebra bring to matrix computations, has led to nontrivial theory and high-quality implementations, and it has proven useful in a range of scientific and internet applications. This project will address the statistical properties of RandNLA algorithms and how these algorithms are used in downstream convex and non-convex optimization pipelines. This project will facilitate the development of algorithmic methods for the extraction of knowledge from large genetic, medical, internet, financial, astronomical, and other scientific data sets, and it will also focus on broader interdisciplinary educational opportunities, including undergraduate courses on the mathematics of data science.

Technical challenges of interest include the fact that the randomness inside an algorithm can lead to implicit regularization, and that it can make the algorithm useful in downstream applications in ways that are not captured by existing theory. These and other challenges will be addressed in several complementary ways. First, by developing bootstrapping methods for core RandNLA algorithms. Second, by developing improved statistical analysis of core RandNLA algorithms. Third, by developing non-linear leverage scores for more general statistical objectives. Fourth, by developing methods to combine SGD and RandNLA in a principled manner. And fifth, by providing implementations addressing scientific data analysis applications, and also by considering longer-term directions of interdisciplinary interest.
In each case, there will be a focus on complementary stochastic and numerical aspects of RandNLA algorithms, as well as on how RandNLA primitives are used in realistic convex and non-convex machine learning pipelines. This will lead to new insights in algorithmic and statistical theory, as well as more useful algorithms in practical implementations and applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.