YIP Learning Long-Horizon Dextrous Manipulation from Vision Language Demonstrations

Abstract

Approved for Public Release

Title: Learning Long-Horizon Dextrous Manipulation from Vision Language Demonstrations

PI: Sanjiban Choudhury, Cornell University

Budget: $750,000

Research Problem: Despite significant advances in robotics, current systems struggle with complex, long-horizon dexterous tasks, particularly in dynamic naval environments. Robots must perform sophisticated manipulations such as engine repairs, which require them to interpret manuals, plan tasks sequentially, and adapt to unforeseen errors. Current methods are limited: some use language models for simple pick-and-place actions, while others handle complex maneuvers but only for specific, short-horizon tasks. These limitations highlight two key challenges:

(1) Common Grounding: A significant grounding gap arises from a mismatch between high-level task planning learned from human-generated vision-language data and low-level manipulation skills learned from robot-specific sensor data.

(2) Error Recovery: Dexterous manipulation is prone to errors, necessitating robust end-to-end training architectures that can adapt to and correct such errors.

We propose a unified architecture that integrates long-horizon task planning with dexterous manipulation, allowing robots to autonomously handle complex tasks in open-world environments.

Technical Objectives: Our research is dedicated to developing a unified architecture for long-horizon, dexterous manipulation based on vision-language demonstrations, addressing significant gaps in robotics for complex tasks in open-world environments. Our primary goals include:

1) Unified Task Planning and Manipulation Architecture: We propose a novel architecture that integrates a differentiable task planner with a dexterous manipulation policy. This involves learning skill tokens with end-to-end models that translate human demonstrations into dexterous robot actions.

2) Bridging the Common Grounding Gap: We propose to use language as a unifying element to align high-level task planning learned from human vision-language data with low-level manipulation policies learned from sensor data. This alignment will minimize the grounding gap, ensuring effective communication and action execution between the planner and the manipulation policies.

3) Error Recovery Capabilities: We will develop robust training methods that enable the task planner and manipulation policy to detect, correct, and recover from errors. This will involve a variety of feedback mechanisms, including simulation trials and real-time human intervention.

Expected Outcomes: We expect that successfully meeting these objectives will greatly enhance robot capabilities to perform complex autonomous manipulations, contributing to high-impact research publications and advanced robotic systems for Navy operations.

Impact on Naval Capabilities: This project will significantly advance Navy capabilities by developing a unified architecture for long-horizon, dexterous manipulation, enabling robots to autonomously perform complex repairs and inspections onboard naval ships. By reducing the need for human presence in hazardous environments, our approach enhances safety and operational efficiency. The capability for autonomous repair and manipulation tasks, complemented by the ability to adapt dynamically to unforeseen errors, ensures that naval operations maintain continuous functionality with minimal downtime. The outcomes of this project will be a fundamental step towards designing autonomous systems capable of performing complex, dexterous manipulations, thereby transforming naval operations and enhancing the capability of autonomous naval systems.

Document Details

Document Type: DoD Grant Award
Publication Date: Jan 13, 2025
Source ID: N000142512086

Entities

People

Sanjiban Choudhury

Organizations

Cornell University
Office of Naval Research
United States Navy
