Artificial Intelligence (AI) has brought significant advances to fields such as natural language processing, computer vision, healthcare, and finance. However, training AI models requires vast computational resources and energy. Typically, a training run requires thousands of powerful processors working together for days and consumes a tremendous amount of power. This project seeks to transform AI training by developing a novel chip architecture that makes training dramatically faster and far more energy-efficient than is possible with currently available processors. The project aims to make advanced AI technologies more accessible and sustainable, benefiting various sectors and fostering collaborations. It will also expand educational and workforce development initiatives, particularly for groups underrepresented in technology.

The project proposes a new chip architecture, referred to as Co-FabPro (Concurrent Forward and backward Propagation), that integrates forward and backward propagation into a single step. Co-FabPro enables linear scaling in both training and inference for long-sequence machine learning (ML) models. The approach employs hardware-software co-design to significantly reduce computational and energy demands. During the planning phase, the project will undertake conceptualization, feasibility studies, team formation, infrastructure assessment, and proposal development. By addressing the major bottlenecks in current ML training methods, the project aims to develop a cutting-edge semiconductor chip and achieve a deep theoretical understanding of this new architecture.
The project will deliver faster, more efficient training processes and disseminate findings through open-source releases and educational initiatives.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.