ABSTRACT The ability to find, combine, and analyze multiple large-scale biomedical datasets in order to make better, more ethical decisions for the future of patients, populations, and health systems is now a necessary skill set for modern analysts. However, most current data analytics courses and workshops focus on deriving or applying modern techniques and tools, such as statistical learning procedures, neural networks, other large-scale prediction models, and frameworks like PyTorch and TensorFlow, rather than on the steps needed to prepare data for such analyses. Further, the current and next generations of biomedical researchers must be cognizant of the FAIR principles (Findable, Accessible, Interoperable, Reusable) so that they are prepared to make their data accessible to machines and can fully leverage continued methodological developments for analyzing large amounts of data across multiple studies, systems, and countries. In addition to a methodological toolkit, the education of the biomedical analyst workforce must include training that builds the ability to locate and store data for future analyses in an automated manner. We propose a suite of stackable modules that provide a rich foundation for the existing, robust educational offerings on applications of AI/ML to biomedical data that many trainees already receive. Through our close partnerships with the NIEHS PROTECT Center and the multinational Observational Health Data Sciences and Informatics (OHDSI) community, our goal is to train researchers to prepare data for AI and ML applications in a rigorous and reproducible way, to understand the ethical issues around AI and ML, and to apply FAIR principles for storing and accessing such data through hands-on practice. These modules will prepare researchers for successful careers as data analysts, ready to exploit the power of available AI/ML frameworks.