Tree Ensemble Regression and Classification Methods

Information

Research Project
6832086

ApplicationId
6832086
Core Project Number
R43CA105724
Full Project Number
1R43CA105724-01A1
Serial Number
105724
FOA Number
Sub Project Id

Project Start Date
8/1/2004
Project End Date
11/30/2005
Program Officer Name
GEZMU, MISRAK
Budget Start Date
8/1/2004
Budget End Date
11/30/2005
Fiscal Year
2004
Support Year
1
Suffix
A1
Award Notice Date
7/9/2004

Organizations

Insightful Corporation

DESCRIPTION (provided by applicant): This SBIR aims to produce next-generation classification and regression software based on ensembles of decision trees: bagging, random forests, and boosting. The prediction accuracy of these methods has caused much excitement in the machine learning community, and it both challenges and complements the data modeling culture prevalent among biostatisticians. Recent research extends the methodology to the likelihood-based methods used in biostatistics, leading to models for survival data and generalized forest models. Generalized forest models extend regression forests in the same way that generalized linear models extend linear models.

This software would apply broadly: to medical diagnosis, prognostic modeling, and cancer detection, and to modeling patient characteristics such as blood pressure, discrete responses in clinical trials, and count data.

Phase I work will prototype software for survival data and investigate the performance of ensemble methods on simulated and real data. For survival applications, we will assess out-of-bag estimates of performance and investigate measures of variable importance and graphics that help clinicians understand the results. Experience writing prototypes and using them on data will lead to a preliminary software design that serves as the foundation of Phase II work.

Phase II will expand upon this work to create commercial software. We will research and implement algorithms for a wider range of applications, including generalized forest models, classification, and least squares regression. We will also implement robust loss criteria that enable good performance on noisy data, and make adaptations to handle large data sets.

The proposed software will enable medical researchers to obtain high prediction accuracy, and will complement traditional tools such as discriminant analysis, linear and logistic regression models, and the Cox model.

IC Name

NATIONAL CANCER INSTITUTE

Activity
R43
Administering IC
CA
Application Type
1

Direct Cost Amount
Indirect Cost Amount
Total Cost
99937
Sub Project Total Cost

ARRA Funded
CFDA Code
394
Ed Inst. Type
Funding ICs
NCI:99937
Funding Mechanism
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
INSIGHTFUL CORPORATION
Organization Department
Organization DUNS
150683779
Organization City
SEATTLE
Organization State
WA
Organization Country
UNITED STATES
Organization Zip Code
98109
Organization District
UNITED STATES
