Collaborative Research: SHF: Medium: Learning Semantics of Code To Automate Software Assurance Tasks

Information

  • NSF Award
  • 2313055
  • Award Id
    2313055
  • Award Effective Date
    10/1/2023
  • Award Expiration Date
    9/30/2027
  • Award Amount
    $666,000.00
  • Award Instrument
    Standard Grant


Deep learning has demonstrated great potential for accomplishing software engineering tasks. However, its capabilities are limited for challenging yet very important software assurance tasks such as bug detection, debugging, test input generation, and test suite prioritization. These tasks are hard to formulate as learning problems, largely because they require modeling program semantics. To the best of our knowledge, even state-of-the-art deep learning models have an insufficient understanding of program semantics. As a result, the models do not achieve sufficient precision and recall for wider deployment. The resulting tools do not generalize well to unseen projects and are not robust to small perturbations in source code. Training the models also requires large amounts of computational resources and data.

In this project, the team of researchers aims to improve the performance, robustness, generalizability, and efficiency of deep learning models for software assurance, and to enable deep learning for complex tasks where it has not yet been used successfully. The solutions will focus on encoding program semantics into the program representation, combining expertise in program analysis, software engineering, and deep learning to develop novel formulations that let deep learning effectively address software assurance problems. The project has three research thrusts. To learn with abstract semantics, the project will study how to combine static analysis algorithms, and the results of static analysis, with deep learning models. To learn with concrete semantics, the project will study how to use program execution traces to guide deep learning. Finally, the project will investigate how to identify spurious features used by current models and then apply causal learning to discourage models from relying on such features. Research results, datasets, and tools will be disseminated to the research community, and workshops will be organized to strengthen the research community of deep learning for code.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Sol Greenspan, sgreensp@nsf.gov, 7032927841
  • Min Amd Letter Date
    6/23/2023
  • Max Amd Letter Date
    6/23/2023
  • ARRA Amount

Institutions

  • Name
    Columbia University
  • City
    NEW YORK
  • State
    NY
  • Country
    United States
  • Address
    202 LOW LIBRARY 535 W 116 ST MC
  • Postal Code
    10027
  • Phone Number
    2128546851

Investigators

  • First Name
    Gail
  • Last Name
    Kaiser
  • Email Address
    kaiser@cs.columbia.edu
  • Start Date
    6/23/2023
  • First Name
    Baishakhi
  • Last Name
    Ray
  • Email Address
    rayb@cs.columbia.edu
  • Start Date
    6/23/2023

Program Element

  • Text
    Software & Hardware Foundation
  • Code
    7798

Program Reference

  • Text
    MEDIUM PROJECT
  • Code
    7924
  • Text
    SOFTWARE ENG & FORMAL METHODS
  • Code
    7944
  • Text
    WOMEN, MINORITY, DISABLED, NEC
  • Code
    9102