This project advances national prosperity and welfare by taking a step toward making collaborative robots more accessible across settings such as homes, factories, and logistics operations. Today, robots can be programmed only by engineers, limiting the tasks they can perform to the narrow set of design choices those engineers made. For robots to be widely adopted in households, they need to be versatile enough to tackle a much broader range of tasks. Many of these tasks, such as preparing meals or organizing the home, require personalization to the individual user's needs and preferences. This project aims to enable a robot to learn such personalized tasks through natural interactions with the user, who may provide visual demonstrations combined with language narration describing the task. Neither vision nor language alone is perfect, but together they capture the task the user wants to convey. This research leverages the broad impact that Large Language Model (LLM) interfaces, such as ChatGPT, have had on engaging everyday users, and brings that impact into the physical realm with robots. It addresses fundamental challenges such as summarizing vision-language demonstrations, verifying automatically generated plans, and efficiently solving tasks that require many steps to achieve a goal. If successful, the project could transform many consumer robotics applications, enabling robots to be more usable, personalized, and aligned with user values.<br/><br/>The primary objective of this project is to develop a framework for learning complex, long-horizon tasks from few-shot vision-language demonstrations. While existing approaches in Inverse Reinforcement Learning (IRL) enable learning simple, short-horizon skills from demonstrations, scaling these approaches to longer horizons with fewer demonstrations poses fundamental statistical and computational challenges.
To address these challenges, this project will develop a novel framework called Inverse Task Planning (ITP) that combines the generalization power of Large Language Models (LLMs) with the performance guarantees of IRL to learn tasks both efficiently and verifiably. This approach differs from existing work on LLMs and task planning in that it creates a closed-loop system that aligns LLM outputs with human demonstrations. Concretely, the plan is to: (1) parse vision-language demonstrations into robot state-action trajectories using visual question answering; (2) learn language-based reward summaries from long-horizon state-action trajectories; and (3) optimize rewards by generating high-level task code in a verifiable, closed-loop fashion. This research has broad implications for creating new interfaces that allow everyday users to program robots, developing courses on generative models and robotics, and providing immersive and engaging programming activities for K-12 students.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.