ReDDDoT Phase 1: Planning Grant: Piloting a framework to measure the impacts of artificial intelligence tools for government agencies

Information

  • NSF Award
  • 2427748
  • Award Id
    2427748
  • Award Effective Date
10/1/2024
  • Award Expiration Date
3/31/2026
  • Award Amount
$300,000.00
  • Award Instrument
    Standard Grant

Abstract

This project develops an evaluation methodology for the implementation of Large Language Model (LLM)-based tools that support human experts working in federal, state, and local government programs. The complexity of rules in these programs makes mistakes common, often creating more work for both staff and participants. LLM-based tools have the potential to reduce that complexity by reflecting relevant rules and information from internal knowledge bases back to staff, along with citations. They can also automate and simplify steps that take significant time and introduce potential for human error. For instance, rather than having staff manually key in data from scanned documents, these tools can assist by categorizing and populating the data, which staff then review. This project will conduct a study that reflects the breadth of households across the country and evaluate the trade-offs of implementing these systems: from the cost of development, to quantifying errors in LLM responses, to evaluating the burdens on human experts who correct those errors. The experiment compares performance (measured via metrics such as answer accuracy and time-to-answer) for responses generated under three conditions: 1) LLMs only, 2) humans only, and 3) humans working together with LLMs.

The project goal is to evaluate, with a replicable approach, whether LLMs ought to be applied to specific use cases within public services, and to provide structured summaries of findings for decision-makers weighing LLM adoption. Natural language processing methods are used to generate nationally representative synthetic prompts based on program rules and demographics across states (e.g., eligibility for different programs given different situations). The project will collect responses in the form of a next step or decision from the three experimental conditions: a hypothetical LLM-only condition, a human-only condition, and a human-supported-by-LLM condition.
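The three-condition comparison described above can be sketched as a simple per-condition summary of the two named metrics, answer accuracy and time-to-answer. This is a minimal illustration, not the project's actual analysis pipeline; the condition labels, record layout, and data values are all hypothetical.

```python
# Hypothetical sketch: summarize accuracy and time-to-answer per condition.
# Records and field layout are illustrative, not the project's real schema.
from statistics import mean

# Each record: (condition, answer_correct, time_to_answer_seconds)
responses = [
    ("llm_only",   True,  4.2),
    ("llm_only",   False, 3.8),
    ("human_only", True,  95.0),
    ("human_only", True,  120.0),
    ("human_llm",  True,  41.0),
    ("human_llm",  True,  38.5),
]

def summarize(records):
    """Return per-condition accuracy and mean time-to-answer."""
    by_cond = {}
    for cond, correct, secs in records:
        by_cond.setdefault(cond, []).append((correct, secs))
    return {
        cond: {
            "accuracy": sum(c for c, _ in rows) / len(rows),
            "mean_seconds": mean(s for _, s in rows),
        }
        for cond, rows in by_cond.items()
    }

summary = summarize(responses)
```

In a report for decision-makers, each condition's row would be read as a trade-off: here the toy data show the LLM-only condition answering fastest but least accurately, with the human-plus-LLM condition in between on speed.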
The LLM tool itself is fine-tuned on similar question-and-answer data about government services and was created with partner organizations using retrieval-augmented generation methods. The project relies on gold-standard responses from quality control auditors to evaluate correctness. The answers from each condition will be analyzed for subgroup disparities based on prompt characteristics (e.g., type of employment, age group, housing circumstances), allowing for granular reporting of the potential trade-offs of LLM adoption.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
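The subgroup-disparity reporting described above amounts to breaking out accuracy against the gold-standard auditor responses by prompt characteristic. A minimal sketch, assuming a single illustrative characteristic (age group) and invented data:

```python
# Hypothetical sketch: accuracy against gold-standard responses, broken out
# by one prompt characteristic. Subgroup labels and data are illustrative.

def accuracy_by_subgroup(records):
    """Map subgroup -> fraction of answers matching the gold standard."""
    totals = {}
    for subgroup, matches_gold in records:
        hits, n = totals.get(subgroup, (0, 0))
        totals[subgroup] = (hits + int(matches_gold), n + 1)
    return {g: hits / n for g, (hits, n) in totals.items()}

# Each record: (age_group, answer matches gold-standard auditor response?)
llm_condition = [
    ("18-34", True), ("18-34", True),
    ("65+",   True), ("65+",   False),
]

rates = accuracy_by_subgroup(llm_condition)
# One simple disparity measure: the gap between best and worst subgroup.
disparity = max(rates.values()) - min(rates.values())
```

Running the same breakdown within each experimental condition, across each prompt characteristic, would yield the granular trade-off report the abstract describes.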

  • Program Officer
Danielle F. Sumy, dsumy@nsf.gov, (703) 292-4217
  • Min Amd Letter Date
8/20/2024
  • Max Amd Letter Date
8/20/2024

Institutions

  • Name
    Georgetown University
  • City
    WASHINGTON
  • State
    DC
  • Country
    United States
  • Address
    MAIN CAMPUS
  • Postal Code
20057-0001
  • Phone Number
(202) 625-0100

Investigators

  • First Name
    Allison
  • Last Name
    Koenecke
  • Email Address
    koenecke@cornell.edu
  • Start Date
8/20/2024
  • First Name
    Eric
  • Last Name
    Giannella
  • Email Address
    eg1009@georgetown.edu
  • Start Date
8/20/2024

Program Element

  • Text
NSF-Ford Foundation Partnership