I-Corps: Translation Potential of Synthetic Data Generation with Nullspace Sampling for Tabular and Timeseries Data

Information

NSF Award
2422393

Owner

Vanderbilt University

Award Id
2422393
Award Effective Date
5/15/2024
Award Expiration Date
4/30/2025
Award Amount
$50,000.00
Award Instrument
Standard Grant

Information

The broader impact of this I-Corps project is based on the development of software to generate synthetic data for use in the healthcare, consulting, and insurance industries. Synthetic data is artificially generated data that is statistically similar to real-world datasets used by businesses. It can be used for analytics and machine learning when access to real data is limited, and it may help augment minority representation in real-world datasets, thereby aiding more equitable outcomes. Overall, the broad applicability of synthetic datasets has the potential to drive innovation in healthcare and other industries by allowing businesses to share synthetic versions of proprietary data with strategic partners, such as data analytics companies, while remaining in full compliance with data privacy laws. This ability can lead to an increase in data-driven decision-making in the private sector and to effective policy formulation in the public sector. For instance, applications of synthetic medical data may help healthcare researchers and administrators better model patient activity, including representative data on understudied populations, and ultimately improve human health.

This I-Corps project utilizes experiential learning coupled with a first-hand investigation of the industry ecosystem to assess the translation potential of the technology. The solution is based on the prior development of a non-deep-learning technique that generates synthetic datasets using features of real data. The technique relies internally on linear algebra and requires no training or optimization, which allows significantly faster generation of tabular and timeseries synthetic data. Although the solution was initially created to generate synthetic timeseries data, it can be modified to generate synthetic tabular data. It is completely non-parametric, and because it avoids the additional steps associated with training and optimization, it is 300x faster than state-of-the-art deep learning generation methods for tabular data. Thus, the approach can generate richly structured datasets using significantly less computing time than deep-learning methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Molly Wasko, mwasko@nsf.gov, 7032924749
Min Amd Letter Date
5/6/2024
Max Amd Letter Date
5/6/2024
ARRA Amount

Institutions

Name
Vanderbilt University
City
NASHVILLE
State
TN
Country
United States
Address
110 21ST AVE S
Postal Code
37203-2416
Phone Number
6153222631

Investigators

First Name
Mikail
Last Name
Rubinov
Email Address
mika.rubinov@vanderbilt.edu
Start Date
5/6/2024 12:00:00 AM

Program Element

Text
I-Corps
Code
802300

Program Reference

Text
Software Services and Applications
Code
8032
