Software to Handle Missing Values in Large Data

Information

Research Project
6690119

ApplicationId
6690119
Core Project Number
R43RR017862
Full Project Number
1R43RR017862-01A1
Serial Number
17862
FOA Number
Sub Project Id

Project Start Date
7/1/2003
Project End Date
9/30/2004
Program Officer Name
SWAIN, AMY L
Budget Start Date
7/1/2003
Budget End Date
9/30/2004
Fiscal Year
2003
Support Year
1
Suffix
A1
Award Notice Date
6/9/2003

Organizations

Insightful Corporation

Information

DESCRIPTION (provided by applicant): This SBIR project aims to produce commercial software for handling missing data in large data sets where the goal is data mining and knowledge discovery. Such data sets may have a large number of subjects, variables, or both; examples include microarray data, surveys, genomic data, and high-throughput screening data. Handling missing data is an important step in careful data preparation, which is key to the success of an entire project. Missing values often arise in medical data, and they are an obstacle because many data mining tools either require complete data or are not robust to missing data. Principled methods of handling missing data are computationally intensive, so computational feasibility is a challenge when handling missing values in large data sets. Phase I work will explore strategies such as sampling, constraining parameters, and monotone data algorithms for model-based techniques. Factor analysis and multivariate linear mixed-effects models will be used to reduce the number of parameters. A variable-by-variable approach using a popular data mining technique, recursive partitioning, will also be used to impute missing values. For each method, we will write prototype software and test its performance on missing-data patterns simulated on real data. Several ad hoc techniques will serve as a baseline for comparison. Experience writing prototypes and using them in simulations will lead to a preliminary software design that will serve as the foundation of Phase II work. The proposed software will enable medical researchers to gain more from their data mining efforts by extracting the maximum information and achieving unbiased predictions despite missing data.
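The evaluation idea in the description (hide known values, impute them, and compare against an ad hoc baseline) can be illustrated with a small sketch. This is not the project's software: it is a pure-Python illustration under several assumptions that do not come from the record. The data are synthetic rather than real, values are hidden completely at random, there is a single predictor, and the function names (`fit_tree`, `predict`, `rmse`) and settings (`max_depth`, `min_leaf`, the 30% missing rate) are hypothetical. Mean imputation stands in for the "ad hoc" baseline, and a tiny regression tree stands in for recursive partitioning.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (the split cost)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(pairs, depth=0, max_depth=3, min_leaf=5):
    """Recursive partitioning on one predictor.

    `pairs` is a list of (x, y). Returns either a leaf value (float)
    or a tuple (threshold, left_subtree, right_subtree).
    """
    if depth >= max_depth or len(pairs) < 2 * min_leaf:
        return mean([y for _, y in pairs])
    pairs = sorted(pairs)
    best_cost, best_k = None, None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal x values
        cost = sse([y for _, y in pairs[:k]]) + sse([y for _, y in pairs[k:]])
        if best_cost is None or cost < best_cost:
            best_cost, best_k = cost, k
    if best_k is None:
        return mean([y for _, y in pairs])
    threshold = (pairs[best_k - 1][0] + pairs[best_k][0]) / 2
    return (
        threshold,
        fit_tree(pairs[:best_k], depth + 1, max_depth, min_leaf),
        fit_tree(pairs[best_k:], depth + 1, max_depth, min_leaf),
    )


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def rmse(predicted, actual):
    return math.sqrt(mean([(p - a) ** 2 for p, a in zip(predicted, actual)]))


# Synthetic complete data (assumption: y depends on x plus noise).
rng = random.Random(0)
complete = []
for _ in range(300):
    x = rng.uniform(0.0, 10.0)
    complete.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

# Simulate a missing-data pattern: hide about 30% of y completely at random.
hidden = sorted(i for i in range(len(complete)) if rng.random() < 0.3)
hidden_set = set(hidden)
observed = [row for i, row in enumerate(complete) if i not in hidden_set]
truth = [complete[i][1] for i in hidden]

# Ad hoc baseline: fill every missing y with the observed mean.
baseline_fill = mean([y for _, y in observed])
mean_rmse = rmse([baseline_fill] * len(truth), truth)

# Tree-based imputation: predict each missing y from its x.
tree = fit_tree(observed)
tree_rmse = rmse([predict(tree, complete[i][0]) for i in hidden], truth)

print(f"mean imputation RMSE: {mean_rmse:.3f}")
print(f"tree imputation RMSE: {tree_rmse:.3f}")
```

Because the hidden values are known, the two imputation errors can be compared directly. A fuller variable-by-variable scheme would repeat the tree fit for each incomplete variable in turn, using the other variables as predictors.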

IC Name

NATIONAL CENTER FOR RESEARCH RESOURCES

Activity
R43
Administering IC
RR
Application Type
1

Direct Cost Amount
Indirect Cost Amount
Total Cost
99847
Sub Project Total Cost

ARRA Funded
CFDA Code
371
Ed Inst. Type
Funding ICs
NCRR:99847
Funding Mechanism
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
INSIGHTFUL CORPORATION
Organization Department
Organization DUNS
150683779
Organization City
SEATTLE
Organization State
WA
Organization Country
UNITED STATES
Organization Zip Code
98109
Organization District
UNITED STATES
