Artificial intelligence (AI) and artificial neural networks (ANNs) have dominated conversations about the future of science, technology, the economy, society and culture, and even humanity itself. Excitement about AI stems from its ability to recognize patterns and even discover solutions superior to those produced by human intelligence. The rapid progress of AI is largely attributable to increased computing capabilities. In recent years, however, the computational power of integrated circuits (ICs) has been unable to sustain the growth rate predicted by Moore's law. Hardware accelerators, with processing units and memory architectures optimized specifically for parallel computing, have played a key role in the implementation of machine learning (ML) models. However, electronic hardware accelerators have already been pushed to their limits in terms of scalability and cannot keep up with the exponential growth of data volume. Against this backdrop, there have been renewed efforts to explore the role of optics in computing, motivated by the large bandwidth and low loss of optical transmission. This project proposes the photonic tensor accelerator (PTA), a highly parallel photonic architecture capable of matrix-vector and matrix-matrix multiplication that offers computing power several orders of magnitude higher than that of existing electronic accelerators. Thanks to its high degree of parallelization, the PTA is particularly suited to the batch matrix multiplications required to implement ANN models. The technology developed in this proposal could demonstrate the cooperative roles of advanced hardware and software and attract more students into hardware-related areas. The proposed research is interdisciplinary in nature and can serve as a platform for training both graduate and undergraduate students at UCF, a Hispanic Serving Institution (HSI) designated by the U.S. 
Department of Education.<br/><br/>The overarching goal of this project is to construct photonic accelerators that 1) offer orders-of-magnitude higher scalability than electronics, 2) are fast, programmable, and ideally compatible with training as well as inference, and 3) lower the power-consumption density enough to enable ANNs that are competitive with their purely electronic counterparts. The core of ANNs is tensor multiplication, which requires only specialized operations (multiplication and accumulation, rather than general-purpose computing) at large scale; such operations are especially suited to photonic accelerators. In addition, ANNs are robust to the low-dynamic-range variabilities of nonlinear activations. The PTA exploits all degrees of freedom of light to accelerate tensor multiplication. Specifically, it uses coherent beating between a signal and a local oscillator to perform multiplication; frequency/wavelength, spatial modes, and polarization for accumulation; and the 2-D and 3-D parallelism of free space to scale the processing power. The proposed approach could scale the number of multiply-accumulate (MAC) operations by several orders of magnitude over state-of-the-art IC hardware accelerators, including graphics processing units (GPUs) and application-specific integrated circuits (ASICs) such as tensor processing units (TPUs). The project will repurpose the technique of recirculating loops to scale up the number of layers for deep neural networks (DNNs). The proposed research is also synergistic with artificial intelligence in that some of the new devices will be designed using machine-learning techniques, and the availability of PTA-based ANNs enables new paradigms of ANN training.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
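As a purely illustrative sketch (not part of the proposal itself), the workload the abstract describes can be made concrete in software: batch matrix multiplication reduces to large numbers of multiply-accumulate (MAC) operations, each of which the PTA would map onto an optical multiplication (coherent beating) and an optical accumulation (over wavelength, spatial mode, or polarization). The sizes and variable names below are arbitrary placeholders.

```python
import numpy as np

# Illustrative sizes only: a batch of 8 layers, each multiplying a
# 4x5 weight matrix by a 5x3 block of input activations.
batch, m, k, n = 8, 4, 5, 3
W = np.random.rand(batch, m, k)   # weight matrices
X = np.random.rand(batch, k, n)   # input activations

# Batched matrix product: the core tensor operation of ANN inference.
Y = np.matmul(W, X)               # shape (batch, m, n)

# Each output element is one MAC reduction over the shared dimension k;
# the batch performs batch * m * n * k scalar multiply-accumulates.
b, i, j = 0, 0, 0
mac = sum(W[b, i, p] * X[b, p, j] for p in range(k))
assert np.isclose(mac, Y[b, i, j])
```

Because every one of these MAC reductions is independent, they can in principle be carried out simultaneously across the parallel degrees of freedom of light, which is the source of the scalability claim above.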