RAW SEQUENCING DATA PROCESSING AND BASE CALLING

Information

Research Project
7252524

ApplicationId
7252524
Core Project Number
R01HG002929
Full Project Number
5R01HG002929-03
Serial Number
2929
FOA Number
PA-97-44
Sub Project Id

Project Start Date
6/1/2005
Project End Date
5/31/2008
Program Officer Name
FELSENFELD, ADAM
Budget Start Date
6/1/2007
Budget End Date
5/31/2008
Fiscal Year
2007
Support Year
3
Suffix
Award Notice Date
6/5/2007

Organizations

ST. THOMAS UNIVERSITY

Information

DESCRIPTION (provided by applicant): The long-term objective of this application is to develop a software application that processes raw data obtained from DNA capillary electrophoresis sequencing machines (data processing) and identifies the DNA bases with higher overall accuracy than existing techniques (base calling). The specific aims are to: collect a large number of data files (approximately 50,000 files will be used); create a database including the correct basecalls associated with each of the data files; develop a methodology for comparing the results of two basecallers (incorporating the confidence values associated with each call into the assessment method); develop novel algorithms for processing the raw data; incorporate a model for the peak amplitudes into basecalling; improve the current base spacing model; and, finally, test the basecaller with the proposed database. The proposed methodology is based on a novel signal processing approach applied to the raw data. A highly adaptive filter will be used for the raw data. The filter will adapt to the various levels of noise in the raw data and to the variation in peak width. The order in which the traditional steps of DNA sequencing raw data processing are performed will be changed to allow better color separation between the channels. Features from the data itself will be identified and used to predict the base calls. For instance, a peak amplitude model will be created to allow better prediction of the base calls. This model will also be used to indicate whether or not an individual base follows the model, thus indicating a probability of an insertion/deletion error. An automatic algorithm will be developed to detect and remove stutter peaks from the raw data. Combined with an improved cross-talk removal procedure, this will allow better sensitivity in identifying heterozygotes in the processed sequences.
The calculated confidence values will follow the current standard introduced by phred and will be calibrated so that, for data with reduced levels of noise, they match the actual accuracy rate over the testing database. The software and the testing database will be free of charge for academic and publicly funded sequencing projects.
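
As an illustration of the phred-style confidence values mentioned above, the following minimal sketch uses the standard phred convention Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The function names, the fixed-width binning, and the input format (pairs of predicted quality and correctness) are assumptions made for this example; they are not taken from the application and do not represent the applicant's calibration method.

```python
import math


def phred_quality(p_error):
    """Convert an error probability to a phred-scaled quality value."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(q):
    """Convert a phred-scaled quality value back to an error probability."""
    return 10.0 ** (-q / 10.0)


def calibration_table(calls, bin_width=10):
    """Compare predicted quality values with observed accuracy.

    `calls` is an iterable of (predicted_q, is_correct) pairs, where
    is_correct comes from a reference set of known-correct basecalls
    (a hypothetical input format chosen for this sketch).

    Returns {bin_start: (n_calls, n_errors, observed_q)}; observed_q is
    None when a bin contains no errors, because the empirical quality
    is unbounded in that case.
    """
    counts = {}
    for q, is_correct in calls:
        start = int(q // bin_width) * bin_width
        n, errors = counts.get(start, (0, 0))
        counts[start] = (n + 1, errors + (0 if is_correct else 1))

    table = {}
    for start, (n, errors) in sorted(counts.items()):
        observed_q = phred_quality(errors / n) if errors else None
        table[start] = (n, errors, observed_q)
    return table
```

In a well-calibrated basecaller, the observed quality in each bin would be close to the predicted values that fall into that bin; for example, calls predicted at Q20 should be wrong about once in every 100 calls.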

IC Name

NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Activity
R01
Administering IC
HG
Application Type
5

Direct Cost Amount
Indirect Cost Amount
Total Cost
161534
Sub Project Total Cost

ARRA Funded
CFDA Code
172
Ed Inst. Type
GRADUATE SCHOOLS
Funding ICs
NHGRI:161534
Funding Mechanism
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
UNIVERSITY OF ST. THOMAS
Organization Department
NONE
Organization DUNS
606870090
Organization City
ST PAUL
Organization State
MN
Organization Country
UNITED STATES
Organization Zip Code
55105
Organization District
UNITED STATES
