EAGER: End-to-End Learning of Paradoxes and Interpretations for Data Storytelling

Information

  • Award Id
    2331065
  • Award Effective Date
    10/1/2023
  • Award Expiration Date
    9/30/2024
  • Award Amount
    $ 125,000.00
  • Award Instrument
    Standard Grant

Can we use AI and machine learning to automatically create compelling narratives from extensive data sets, like business data, demographics, and economic statistics? This project aims to answer this question in the affirmative and make data storytelling accessible to everyone, even those without advanced big data skills. Data storytelling is a powerful technique for understanding and extracting valuable insights from large data sets. However, it can be daunting for individuals who lack the technical expertise to navigate and interpret big data effectively. In this project, we propose an innovative approach to automatic data storytelling by learning from paradoxes found within a large data set. Paradoxes are unexpected contradictions that exist within data, and they hold the potential to uncover critical and surprising information. By harnessing these paradoxes, we can craft engaging narratives that highlight the most interesting and important aspects of the data, especially for decision-makers and data consumers. The ultimate goal of this project is to develop open-source software that facilitates creating compelling data stories. The outcomes and findings will be made publicly available, ensuring that individuals and organizations can benefit from this project. Additionally, the insights gained from this research will be incorporated into data science courses, empowering future data storytellers with the necessary skills to communicate complex information effectively. By democratizing data storytelling through the use of paradoxes and making it accessible to a wider audience, we can unlock the hidden potential of extensive datasets and strengthen data-driven decision-making in various domains.

This project will address the core challenges of paradox identification through two main thrusts. First, we will focus on investigating concise and non-redundant representations of statistical relationships among variables in data. Our goal is to formulate representations that are both concise and minimal, ensuring efficiency in conveying information. To demonstrate the feasibility and potential of our approach, we will specifically examine Simpson's paradox in this pilot project. Second, building on the established model, we will develop efficient and scalable algorithms to find data paradoxes and their interpretations. Our focus will be on exploring the efficiency, completeness, and non-redundancy of the learning process, using real-world datasets for evaluation. The knowledge and insights gained from this research will be integrated into data science education and training programs, enhancing the skills and capabilities of future data scientists. This project's transformative nature lies in its recognition of data storytelling as a fundamental approach within the realm of data science. The algorithms and tools we develop will substantially enhance the abilities of data scientists, statisticians, and business intelligence analysts to explore data, uncover new knowledge, and deliver valuable insights. As a result, these advancements will have wide-ranging applications and be of significant value to diverse communities. By addressing the challenges of paradox learning and advancing the field of data storytelling, this project will facilitate the exploration and interpretation of complex data, ultimately enabling more informed decision-making for a wider set of people and driving innovation across various sectors.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
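
For readers unfamiliar with Simpson's paradox, which the abstract names as the pilot case, the following is a minimal illustrative sketch and not the project's method or software. It uses the widely cited kidney-stone treatment counts as textbook example data; the names `counts`, `overall_rate`, and `is_simpson_reversal` are hypothetical and chosen only for this illustration. It checks whether one treatment has the higher success rate in every subgroup yet the lower rate once the subgroups are pooled.

```python
# Illustrative sketch only: the classic kidney-stone example, given as
# (successes, trials) per treatment and stone size. These counts are the
# commonly cited textbook figures, not data from this award.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success rate for one (successes, trials) pair."""
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all subgroups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return rate(total_successes, total_trials)


def is_simpson_reversal(data, first, second):
    """True if `first` beats `second` in every subgroup but loses overall."""
    subgroups = data[first].keys()
    wins_each = all(
        rate(*data[first][g]) > rate(*data[second][g]) for g in subgroups
    )
    loses_overall = overall_rate(data[first]) < overall_rate(data[second])
    return wins_each and loses_overall


print(is_simpson_reversal(counts, "A", "B"))  # prints True
```

Here treatment A has the higher success rate for both small and large stones, yet treatment B has the higher pooled rate, because A was applied far more often to the harder large-stone cases. A brute-force check like this only compares two named groups over one partition; finding such reversals efficiently across many variables is the kind of problem the abstract describes.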

  • Program Officer
    Raj Acharya, racharya@nsf.gov, 7032927978
  • Min Amd Letter Date
    6/6/2023
  • Max Amd Letter Date
    6/6/2023
  • ARRA Amount

Institutions

  • Name
    Duke University
  • City
    DURHAM
  • State
    NC
  • Country
    United States
  • Address
    2200 W MAIN ST
  • Postal Code
    277054640
  • Phone Number
    9196843030

Investigators

  • First Name
    Jian
  • Last Name
    Pei
  • Email Address
    j.pei@duke.edu
  • Start Date
    6/6/2023 12:00:00 AM

Program Element

  • Text
    Info Integration & Informatics
  • Code
    7364

Program Reference

  • Text
    INFO INTEGRATION & INFORMATICS
  • Code
    7364
  • Text
    EAGER
  • Code
    7916