Collaborative Research: SHF: Medium: Learning Semantics of Code To Automate Software Assurance Tasks

Information

NSF Award
2313055

Owner

Columbia University

Award Id
2313055
Award Effective Date
10/1/2023
Award Expiration Date
9/30/2027
Award Amount
$666,000.00
Award Instrument
Standard Grant

Information

Deep learning has demonstrated great potential for software engineering tasks. However, its capabilities remain limited for challenging yet very important software assurance tasks such as bug detection, debugging, test input generation, and test suite prioritization. These tasks are hard to formulate as learning problems. A major part of the difficulty is that they require modeling program semantics. To the best of our knowledge, even state-of-the-art deep learning models have an insufficient understanding of program semantics. As a result, the models fail to achieve the precision and recall needed for wider deployment. The tools do not generalize well to unseen projects and are not robust to small perturbations in source code. Training the models also takes large amounts of computational resources and data.

In this project, the team of researchers aims to improve the performance, robustness, generalizability, and efficiency of deep learning models for software assurance, and to enable deep learning for complex tasks to which it has not yet been successfully applied. Solutions will target encoding program semantics into the program representation, combining program analysis, software engineering, and deep learning expertise to develop novel formulations that effectively address software assurance problems via deep learning. The project has three research thrusts. To learn with abstract semantics, the project will study how to combine static analysis algorithms and the results of static analysis with deep learning models. To learn with concrete semantics, the project will study how to use program execution traces to guide deep learning. Finally, the project will investigate how to identify spurious features used by current models and then apply causal learning to discourage models from relying on those features.

Research results, datasets, and tools will be disseminated to the research community, and workshops will be organized to strengthen the research community of deep learning for code.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Sol Greenspan, sgreensp@nsf.gov, 7032927841
Min Amd Letter Date
6/23/2023
Max Amd Letter Date
6/23/2023
ARRA Amount

Institutions

Name
Columbia University
City
NEW YORK
State
NY
Country
United States
Address
202 LOW LIBRARY 535 W 116 ST MC
Postal Code
10027
Phone Number
2128546851

Investigators

First Name
Gail
Last Name
Kaiser
Email Address
kaiser@cs.columbia.edu
Start Date
6/23/2023 12:00:00 AM

First Name
Baishakhi
Last Name
Ray
Email Address
rayb@cs.columbia.edu
Start Date
6/23/2023 12:00:00 AM

Program Element

Text
Software & Hardware Foundation
Code
7798

Program Reference

Text
MEDIUM PROJECT
Code
7924

Text
SOFTWARE ENG & FORMAL METHODS
Code
7944

Text
WOMEN, MINORITY, DISABLED, NEC
Code
9102
