Proposals 0820217/0819987<br/><br/>Enhancing the Reliability of Bioinformatics Software by Correlating User Feedback and Execution Data<br/><br/>PIs: Andy Podgurski and Wassim Masri<br/><br/>A neglected consequence of the proliferation of scientific data sets and computational services on the Internet is that the scientific community is becoming increasingly dependent on the quality of shared data sets and the reliability of the software used to analyze them. To help developers of bioinformatics software ensure its reliability, this research seeks to develop, evaluate, and refine automated techniques for discovering emergent reliability problems in deployed software, understanding their nature and significance, and diagnosing their causes. The approach is based on eliciting structured feedback from users about the problem symptoms they observe and then automatically correlating this feedback with far more detailed information about internal program dynamics and input-output mappings. Advanced data mining techniques will be employed in tandem with dynamic program dependence analysis to pool problem reports from different users, corroborate individual reports, group failures according to their symptoms and causes, and help diagnose the causes of failures. The proposed work has the potential to significantly improve the reliability of bioinformatics applications used by thousands of scientists.