Long read based sequencing software for the comprehensive analysis of clinical samples

Information

  • Research Project
  • 10130575
  • ApplicationId
    10130575
  • Core Project Number
    R44GM137643
  • Full Project Number
    5R44GM137643-02
  • Serial Number
    137643
  • FOA Number
    PA-19-272
  • Sub Project Id
  • Project Start Date
    4/1/2020
  • Project End Date
    3/31/2022
  • Program Officer Name
    RAVICHANDRAN, VEERASAMY
  • Budget Start Date
    4/1/2021
  • Budget End Date
    3/31/2022
  • Fiscal Year
    2021
  • Support Year
    02
  • Suffix
  • Award Notice Date
    3/19/2021

Long read based sequencing software for the comprehensive analysis of clinical samples

The high cost and complexity of the analysis of whole genome resequencing remain prohibitive for most clinical applications. Targeted resequencing allows regions of interest to be enriched from a genomic DNA sample and sequenced to high depth, allowing cost-effective identification of important variants. In combination with next-generation sequencing (NGS), the approach has been exploited to tremendous effect in identifying candidate genes and variants for an array of diseases and traits from cohorts and populations as well as individual clinical samples. However, the short read nature of NGS technologies severely limits their potential to characterize, for example, compound heterozygotes, due to the lack of the long range connectivity needed for haplotype phasing and for detecting structural variants (SV). Those limitations can be overcome with long read data from Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT). Moreover, new targeting methods tailored toward long read sequencing are being developed such that a comprehensive analysis of key regions in an individual's genome will soon be within reach. However, an integrated software solution that is easy enough for clinical researchers to use efficiently is sorely lacking. The overall goal of this Direct to Phase II proposal is to develop commercial-grade software that produces a comprehensive catalog of annotated haplotype phased variants from clinical sequencing data and presents them to clinical researchers through a single easy-to-use application with both analytical and genome browsing capabilities, GenVision Ultra. The proposal focuses on augmenting our highly extensible XNG assembly pipeline with tools necessary for fully automated detection and annotation of all classes of variants from haplotype phased sequences. Novel adaptations to core XNG components will partition reads matching the reference from those likely representing an SV for parallel processing (Aim 1).
Matching reads will be aligned to the reference using XNG while the putative SV-containing reads will be de novo assembled and annotated using our long read assembler (LRA). Reference-based alignments will be phased using a novel Bayesian classifier to produce two haplotype sequences prior to SNV/small indel calling and annotation (Aim 2). Short read polishing of the entire assembly will be available on demand. Complete small variant and SV profiles as well as the underlying assembly data will be accessible to the end user in GenVision Ultra. In addition, the application will have discrete filtering and statistical tools with which to identify genes and/or variants of interest in an individual sample or across a cohort/population (Aim 3). To ensure that the software meets the clinical sequencing market needs, Arkana Laboratories has agreed to provide ONT and Illumina sequence data from highly curated HapMap control samples processed with their kidney disease gene panels. Those real-world data sets together with expert interpretation and feedback by Arkana researchers provide an ideal environment in which to develop an outstanding software solution for this critical market (Aim 4).
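
The abstract does not describe how the read partitioning in Aim 1 or the Bayesian phasing in Aim 2 would be implemented. As a rough illustration only, the two steps might be sketched as below. Everything here is an assumption made for the example and not the XNG or LRA method: the function names, the `identity` and `clipped_fraction` read fields, the threshold values, the uniform per-base error model, and the premise that the two haplotypes' alleles at heterozygous sites are already known.

```python
import math


def partition_reads(reads, min_identity=0.85, max_clipped_fraction=0.2):
    """Split reads into reference-matching and putative SV-containing sets.

    Each read is a dict with two hypothetical keys: "identity" (fraction of
    aligned bases that match the reference) and "clipped_fraction" (fraction
    of the read left unaligned). Both thresholds are illustrative guesses.
    """
    matching, putative_sv = [], []
    for read in reads:
        well_aligned = (
            read["identity"] >= min_identity
            and read["clipped_fraction"] <= max_clipped_fraction
        )
        (matching if well_aligned else putative_sv).append(read)
    return matching, putative_sv


def haplotype_a_posterior(read_alleles, hap_a, hap_b, error_rate=0.1, prior_a=0.5):
    """Posterior probability that a read derives from haplotype A.

    read_alleles maps a site position to the base observed in the read;
    hap_a and hap_b map a position to the base each haplotype carries.
    Assumes independent sites and a uniform error model in which a wrong
    base is any of the three alternatives with equal probability.
    """
    log_a = math.log(prior_a)
    log_b = math.log(1.0 - prior_a)
    for pos, base in read_alleles.items():
        if pos not in hap_a or pos not in hap_b:
            continue  # site is not informative for phasing
        log_a += math.log(1.0 - error_rate if base == hap_a[pos] else error_rate / 3.0)
        log_b += math.log(1.0 - error_rate if base == hap_b[pos] else error_rate / 3.0)
    # Normalize in log space to avoid underflow on reads spanning many sites.
    top = max(log_a, log_b)
    weight_a = math.exp(log_a - top)
    weight_b = math.exp(log_b - top)
    return weight_a / (weight_a + weight_b)


if __name__ == "__main__":
    reads = [
        {"identity": 0.97, "clipped_fraction": 0.01},
        {"identity": 0.95, "clipped_fraction": 0.45},
    ]
    matching, putative_sv = partition_reads(reads)
    print(len(matching), "matching;", len(putative_sv), "putative SV")

    hap_a = {100: "A", 250: "G"}
    hap_b = {100: "C", 250: "T"}
    print(haplotype_a_posterior({100: "A", 250: "G"}, hap_a, hap_b))
```

In this sketch a read that spans no informative site keeps its prior probability, so a real pipeline would need some rule for such unassigned reads; the abstract does not say how they are handled.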

IC Name
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R44
  • Administering IC
    GM
  • Application Type
    5
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    750000
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
  • Funding ICs
    NIGMS:750000
  • Funding Mechanism
    SBIR-STTR RPGs
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    DNASTAR, INC.
  • Organization Department
  • Organization DUNS
    130194947
  • Organization City
    MADISON
  • Organization State
    WI
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    537055202
  • Organization District
    UNITED STATES