The deep sea is an epicenter of biogeochemical cycling that is globally important but poorly understood. Big data generated by emerging gene sequencing technology provides a new avenue to link genes with biological processes. In the deep sea, the vast majority of genes are unknown. This project will focus on methane seep systems. New microbial samples will be collected from methane seeps off the coast of Oregon and Washington. This research will employ a novel natural language processing artificial intelligence approach to predict what these unknown genes do. This will be a critical step toward quantifying oceanic ecosystem function based on genomics. The artificial intelligence models developed using these samples will be broadly applicable. They can provide a foundation to answer many questions across scientific fields ranging from ecology to human health. A tutorial for the models developed will be written, and a workshop will be run to explain the techniques. Further, artists will be involved in the research and a documentary will be produced to share the results of the research.<br/><br/>This research will build two new artificial intelligence models that use gene sequence data to understand ecosystem processes, and apply them to methane seep habitats. A new model incorporating gene and ribosomal amplicon co-occurrence will encode genes and classify them into pathways. In parallel, generative models with text and protein sequence representations will be developed. The models will identify putative genes responsible for each of the cycles identified, termed dl-genes. These two models will be applied to new samples collected from methane seeps offshore Oregon and Washington. Methane seep habitats are areas where methane is consumed by microbial activity; they also have strong redox gradients, leading to diverse methane and nitrogen cycling over a small spatial area. 
Both artificial intelligence models will be applied to these habitats, and the results will be used to empirically validate the dl-genes by testing whether they are transcribed when the associated geochemical process is observed. The main outcome will be a scalable artificial intelligence approach that will advance key questions in earth system science. To broaden the use of the methods developed in this project to solve similar problems, a tutorial and workshop will help others learn and use the models. Further, the outputs of this work will include exhibits by artists involved in the research and a documentary about how artificial intelligence can harness big data to help advance the understanding of earth systems.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.