John Karanicolas of the Institute for Cancer Research is supported by an award from the Chemical Theory, Models and Computational Methods program and by the Data-Driven Discovery Science in Chemistry initiative in the Division of Chemistry to develop an artificial intelligence-based strategy for designing small-molecule compounds that bind to and probe the functions of human kinases. Kinases are a family of proteins that control virtually all signaling processes in multicellular organisms. These processes are fundamental to life and play critical roles in cell division, cell growth and motility, and cellular transport. Because abnormal kinase signaling contributes to many cancers, kinases are also potential targets for novel cancer drugs. There are several hundred kinase proteins, organized into complex signaling pathways. Each pathway acts like a row of falling dominoes: once an initial chemical trigger occurs at a key location on a kinase molecule, it sets off a series of downstream reactions that produce specialized molecular functions. Because individual kinases can participate in multiple cellular processes, these signaling pathways are often highly intertwined. To determine the function of a particular kinase, scientists use small molecules as chemical probes that bind to the kinase, suppress its activity, and allow the downstream effects on cellular function to be analyzed. A significant challenge, however, is that most existing chemical probes are less selective than originally thought: many act on multiple kinases at once. Experimental "wet lab" testing of every combination of hundreds of kinases against tens of thousands of potential chemical probes is prohibitively expensive.
Professor Karanicolas and his students are combining state-of-the-art advances in artificial intelligence and machine learning, publicly available data on selected kinase-probe interactions, and specialized techniques developed in his laboratory for modeling the 3D structures of protein-small molecule complexes to develop a computational model that accurately predicts kinase binding affinities and chemical probe selectivity. The project's discoveries could significantly advance fundamental studies of cell biology, as well as the identification of selective kinase inhibitors for pharmaceutical development. The new methods are being implemented in software distributed through public repositories and the widely used Rosetta software suite. The research provides cross-disciplinary training that combines chemical biology and data science for students at the high school, undergraduate, and graduate levels.<br/><br/>Modern cell biology relies heavily on kinase inhibitors as chemical probes for analyzing the consequences of deactivating a particular kinase, but most commonly used chemical probes are not selective enough for observed phenotypes to be interpreted robustly. The selectivity of a given probe can be determined experimentally by assaying it against large panels of kinases (covering much of the human kinome); however, these experiments are expensive and impractical to perform at scale. This project is applying deep learning techniques to predict the binding affinities of individual inhibitor/kinase pairs, using 3D structural descriptors derived from a novel method for modeling inhibitor/kinase complexes that was recently developed in the Karanicolas group.
Following careful training and benchmarking, the predictive model is being used in two ways: 1) to evaluate the selectivity of chemical probes widely used by cell biologists, determining which compounds constitute useful tools and which should be deprecated; and 2) to screen a large library of small molecules to identify new candidate probes expected to bind a given kinase with high affinity and selectivity. The model's predictions in both applications are being tested experimentally through biochemical assays. It has long been hypothesized that including 3D structural features would improve existing machine learning approaches for predicting protein-ligand binding affinities, but testing this hypothesis required a rapid and accurate method for building 3D structural models; the availability of such a method will allow the hypothesis to be tested for the first time. If the approach succeeds, insights from this project will provide a starting point for models that predict the binding affinities of other protein-ligand complexes, expanding applications in cell biology and drug discovery. The results of the project are being disseminated as publicly available source code through SourceForge and as modules within the widely used Rosetta software suite.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.