This research project focuses on evaluating and improving data quality for effective Artificial Intelligence (AI) deployment in manufacturing. In modern manufacturing industries, AI-guided decision-making has revolutionized production, product quality, design customization, and manufacturing sustainability. However, the lack of quantitative data quality evaluation and effective data preparation methodologies for manufacturing AI models poses several critical challenges: untrustworthy AI decision-making, high energy consumption to process large but poor-quality datasets, and a shortage of effective datasets that can be shared across manufacturing systems for AI model training. These challenges greatly slow the adoption of AI technologies in manufacturing industries, significantly affecting the global competitiveness of US manufacturing. This research project defines and evaluates quantitative manufacturing data quality metrics, advances scientific knowledge for data quality assurance based on manufacturing features, and promotes dataset preparation for AI modeling. As a result, the research not only enables fast training and comparison of AI models through improved manufacturing data quality, but also reduces the environmental impact of data computation, communication, and storage. This research project also includes a comprehensive outreach and education program for college students and manufacturing workforce development, including panel discussions, outreach seminars for underrepresented students and practitioners, and manufacturing AI competitions. <br/><br/>The goal of this research project is to define, evaluate, and improve data quality to enable compatible use of datasets in a Manufacturing Industrial Internet integrating heterogeneous machines, sensors, and computation devices. The project builds a data quality methodology that addresses these challenges based on manufacturing-specific data formats and modalities from different manufacturing layouts. 
First, data quality is defined as inversely proportional to the variance of AI model performance. A latent neural recommender system investigates the interface between datasets and AI models to assess data quality when different AI models are used. Second, manufacturing data quality is modeled based on the unique manufacturing data features from graphs of different manufacturing layouts and data modalities. Third, after the root causes of poor data quality are identified, golden datasets are generated through ensemble active learning with contextual bandits to ensure that manufacturing AI model performance is robust to data source variability. The data quality methodology connects to the manufacturing hierarchical variable relationships, multimodal data, and layout representations through effective feature representations. The methodologies are validated on both real datasets from semiconductor manufacturing and a Manufacturing Industrial Internet testbed.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.