Somatic variant calling and phasing using colored de Bruijn graphs in heterogeneous tumors

Information

  • Research Project
  • ApplicationId
    9685166
  • Core Project Number
    R21CA220411
  • Full Project Number
    5R21CA220411-02
  • Serial Number
    220411
  • FOA Number
    PAR-15-334
  • Sub Project Id
  • Project Start Date
    4/13/2018
  • Project End Date
    3/31/2020
  • Program Officer Name
    MILLER, DAVID J
  • Budget Start Date
    4/1/2019
  • Budget End Date
    3/31/2020
  • Fiscal Year
    2019
  • Support Year
    02
  • Suffix
  • Award Notice Date
    3/4/2019


Abstract

One of the central challenges in cancer genomics is to accurately detect somatic mutations in heterogeneous tumors and to precisely determine which fraction of cancer cells harbor these mutations and at what frequency. A deeper understanding of the biological principles behind cancer evolution is central to the discovery of new cancer therapies. However, despite the tremendous advances in sequencing technologies over the last twenty years, most widely used computational approaches and biotechnologies still do not provide enough context to fully resolve the clonal structure of a tumor, due to a combination of low resolution, high cost, and prohibitive sample requirements. 10X Genomics has recently developed a new technology, called "linked reads", that addresses some of these limitations by providing long-range phasing information at low cost and with minimal sample requirements. However, for this data to achieve its full potential and benefit the whole cancer research community, new computational tools must be developed that combine novel algorithms for next-generation sequencing data analysis with the long-range information stored in the linked reads. We propose to overcome these challenges by developing a new variant caller that combines the long-range information in the linked reads with powerful colored de Bruijn graph data structures to accurately discover and phase inherited and somatic variants (SNVs and indels) in tumor-normal paired sequencing data. The colored de Bruijn graph approach will exploit the full information in the data by jointly analyzing reads from the tumor and normal samples. This will reduce the false-discovery rate of alignment-based variant callers when detecting longer insertions and deletions, without sacrificing the additional variant calling power provided by the assembly method.
Furthermore, the linked-read data will allow phasing of the variants and improve determination of subclonal structure by directly observing which variants are present on the same molecule, and therefore within the same subclone. We will develop and carefully test our novel variant calling framework using a combination of synthetic and genuine datasets designed to assess its variant calling abilities under diverse sequencing conditions, tumor clonality, and sequencing platforms.
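
The idea of joint tumor-normal analysis in a colored de Bruijn graph can be illustrated with a toy sketch. This is not the project's variant caller: the function names, the `(barcode, sequence)` read format, the sample names, and the example sequences are all hypothetical, and the sketch ignores reverse complements, sequencing errors, and coverage thresholds. Each k-mer is annotated with the set of samples ("colors") in which it occurs, k-mers carrying only the tumor color mark candidate somatic sequence, and the barcodes of the reads that contain them hint at which molecules carry the variant.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_colored_graph(samples, k):
    """Return {kmer: set of sample names (colors) whose reads contain it}."""
    colors = defaultdict(set)
    for name, reads in samples.items():
        for _barcode, seq in reads:
            for kmer in kmers(seq, k):
                colors[kmer].add(name)
    return dict(colors)

def successors(graph, kmer):
    """k-mers in the graph that overlap kmer by k-1 bases (outgoing edges)."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in graph]

def tumor_only(graph, tumor="tumor"):
    """k-mers carrying only the tumor color: candidate somatic sequence."""
    return {kmer for kmer, cols in graph.items() if cols == {tumor}}

def barcodes_supporting(reads, kmer_set, k):
    """Barcodes of reads that contain at least one k-mer from kmer_set."""
    return {bc for bc, seq in reads
            if any(km in kmer_set for km in kmers(seq, k))}

# Hypothetical toy data: one tumor read differs from the normal by a G>T change.
samples = {
    "normal": [("N1", "ACGTACGGTCA")],
    "tumor": [("T1", "ACGTACGGTCA"), ("T2", "ACGTACTGTCA")],
}
K = 4

graph = build_colored_graph(samples, K)
somatic_kmers = tumor_only(graph)            # k-mers spanning the variant
branch = successors(graph, "GTAC")           # the graph forks at the variant
carriers = barcodes_supporting(samples["tumor"], somatic_kmers, K)

print(sorted(somatic_kmers))
print(branch)
print(sorted(carriers))
```

In this toy case the graph forks after `GTAC` into a branch shared by both samples and a tumor-only branch, and only barcode `T2` supports the tumor-only k-mers. Two variants supported by the same barcodes would, under the same simplifying assumptions, be candidates for lying on the same molecule.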

IC Name
NATIONAL CANCER INSTITUTE
  • Activity
    R21
  • Administering IC
    CA
  • Application Type
    5
  • Direct Cost Amount
    126701
  • Indirect Cost Amount
    58598
  • Total Cost
    185299
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    396
  • Ed Inst. Type
  • Funding ICs
    NCI:185299
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    ZCA1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    NEW YORK GENOME CENTER
  • Organization Department
  • Organization DUNS
    078473711
  • Organization City
    NEW YORK
  • Organization State
    NY
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    100131941
  • Organization District
    UNITED STATES