EAGER: Algorithms for Analyzing Faulty Data Using Domain Information

Information

  • Award Id
    2414736
  • Award Effective Date
    3/1/2024
  • Award Expiration Date
    2/28/2026
  • Award Amount
    $ 300,000.00
  • Award Instrument
    Standard Grant

Abstract

This project aims to build a mathematical theory for analyzing large, error-containing datasets by taking advantage of domain knowledge about both the processes that created the data and the error model. The project has three thrusts, listed from the most well-defined to the most exploratory. The first analyzes genomic data to investigate the tumor evolution trees that lead to the development of cancer. The second analyzes faulty data generated by computer networks, using information about the network such as its topology and delay pattern. The third explores other areas to which the techniques developed for the first two thrusts apply, making progress toward general techniques for analyzing faulty data in the absence of a known ground truth using domain information.

The model assumed in this project is that the input contains errors generated probabilistically, according to a known distribution, at unknown locations. The investigator's goal is to create sampling techniques that do not blindly draw random samples from the prohibitively large space of possible ground truths; instead, they use knowledge of the restrictions that limit which ground truths could have led to the noisy input, and analyze this much smaller space. In particular, the first focus of the project is to explore how such information can be used to build efficient sampling techniques for inferring properties of tumor progression trees and, later on, more general phylogenetic trees. Later parts of the project apply this knowledge to routing graphs and other data with underlying well-structured graphs. Since such techniques rely on graph-theoretic assumptions underlying the inputs, the goal for all three thrusts is to develop widely applicable probabilistic techniques that help analyze noisy graph information in general, pushing existing theoretical knowledge forward and bringing a better understanding to applied areas with strong theoretical underpinnings.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
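
The idea in the abstract of sampling only from ground truths that the domain allows can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the award: the data is a short bit string, each bit is flipped independently with a known probability `flip_prob`, the "domain constraint" is that the true string is a run of zeros followed by a run of ones, and the prior over allowed strings is uniform. The function names are hypothetical.

```python
import random


def monotone_candidates(n):
    """Hypothetical domain constraint: the truth is zeros followed by ones."""
    return [[0] * (n - k) + [1] * k for k in range(n + 1)]


def posterior_over_candidates(observed, candidates, flip_prob):
    """Weight each allowed ground truth by the likelihood of the observation.

    Assumes independent bit flips with known probability and a uniform prior
    over the candidates (both are illustrative assumptions).
    """
    n = len(observed)
    weights = []
    for cand in candidates:
        # Number of positions where the candidate disagrees with the input.
        d = sum(o != c for o, c in zip(observed, cand))
        weights.append(flip_prob ** d * (1 - flip_prob) ** (n - d))
    total = sum(weights)
    return [w / total for w in weights]


def sample_ground_truth(observed, flip_prob, rng):
    """Draw one plausible ground truth from the constrained space only."""
    candidates = monotone_candidates(len(observed))
    probs = posterior_over_candidates(observed, candidates, flip_prob)
    return rng.choices(candidates, weights=probs, k=1)[0]


observed = [0, 0, 1, 0, 1, 1]  # noisy input; error locations are unknown
candidates = monotone_candidates(len(observed))
probs = posterior_over_candidates(observed, candidates, flip_prob=0.1)
sample = sample_ground_truth(observed, 0.1, random.Random(0))
```

In this toy setting the constraint leaves 7 candidate ground truths instead of the 64 unconstrained bit strings of length 6, so the sampler never spends effort on strings the domain rules out. The trees and graphs named in the abstract would need far richer constraints than this.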

  • Program Officer
    Karen Karavanic, kkaravan@nsf.gov, 7032922594
  • Min Amd Letter Date
    3/8/2024
  • Max Amd Letter Date
    3/8/2024
  • ARRA Amount

Institutions

  • Name
    Indiana University
  • City
    BLOOMINGTON
  • State
    IN
  • Country
    United States
  • Address
    107 S INDIANA AVE
  • Postal Code
    474057000
  • Phone Number
    3172783473

Investigators

  • First Name
    Funda
  • Last Name
    Ergun
  • Email Address
    fergun@indiana.edu
  • Start Date
    3/8/2024 12:00:00 AM

Program Element

  • Text
    Algorithmic Foundations
  • Code
    7796

Program Reference

  • Text
    SaTC: Secure and Trustworthy Cyberspace
  • Text
    ALGORITHMIC FOUNDATIONS
  • Code
    7796
  • Text
    EAGER
  • Code
    7916
  • Text
    WOMEN, MINORITY, DISABLED, NEC
  • Code
    9102