Hierarchical Video Prediction using Relational Layouts for Human-Object Interactions

Abstract

Learning to model and predict how humans interact with objects while performing an action is challenging, and most existing video prediction models are ineffective at modeling complicated human-object interactions. Our work builds on hierarchical video prediction models, which split the video generation process into two stages: predicting a high-level representation, such as a pose sequence, and then learning a pose-to-pixels translation model for pixel generation. An action sequence for a human-object interaction task is typically very complicated, involving the evolution of pose, the person's appearance, object locations, and object appearances over time. To this end, we propose a Hierarchical Video Prediction model using Relational Layouts. In the first stage, we learn to predict a sequence of layouts. A layout is a high-level representation of the video containing both pose and object information for every frame. The layout sequence is learned by modeling the relationships between the pose and the objects using relational reasoning and recurrent neural networks. The layout sequence acts as a strong structural prior for the second stage, which learns to map the layouts into pixel space. Experimental evaluation of our method on two datasets, UMD-HOI and Bimanual, shows significant improvements in standard video evaluation metrics such as LPIPS, PSNR, and SSIM. We also perform a detailed qualitative analysis of our model to demonstrate various generalizations.
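To make the notion of a layout more concrete, the following is a minimal sketch of what a per-frame layout and its pairwise relations might look like. It is not taken from the report: the `Layout` class, its field names, the `pairwise_offsets` helper, and the use of raw coordinate offsets as relation features are all assumptions made for illustration. A learned model would presumably pass such entity pairs through a trained relational function and a recurrent network rather than use the offsets directly.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Tuple

Point = Tuple[float, float]  # normalized (x, y) image coordinates


@dataclass
class Layout:
    """One frame's layout: pose keypoints plus object centers.

    Hypothetical structure for illustration only; the report does not
    specify field names or the exact layout encoding.
    """
    joints: Dict[str, Point]
    objects: Dict[str, Point]

    def entities(self) -> Dict[str, Point]:
        # Treat joints and objects uniformly as named 2-D entities.
        return {**self.joints, **self.objects}


def pairwise_offsets(layout: Layout) -> Dict[Tuple[str, str], Point]:
    """Return the offset (dx, dy) from a to b for every ordered pair of
    distinct entities. This stands in for the input to a relational module."""
    ents = layout.entities()
    return {
        (a, b): (ents[b][0] - ents[a][0], ents[b][1] - ents[a][1])
        for a, b in permutations(ents, 2)
    }


# Toy frame: one joint and one object (made-up coordinates).
frame = Layout(
    joints={"right_wrist": (0.5, 0.5)},
    objects={"cup": (0.75, 0.25)},
)
relations = pairwise_offsets(frame)
print(relations[("right_wrist", "cup")])  # (0.25, -0.25)
```

Under this sketch, a layout sequence would simply be a list of `Layout` objects, one per frame, and the second stage would consume that list to render pixels.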

Document Details

Document Type
Technical Report
Publication Date
Jun 19, 2021
Accession Number
AD1185797

Entities

People

  • Abhinav Shrivastava
  • Gaurav Shrivastava
  • Navaneeth Bodla
  • Rama Chellappa

Organizations

  • Johns Hopkins University
  • University of Maryland

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Science
  • Computer Graphics
  • Computer Vision
  • Computers
  • Convolutional Neural Networks
  • Image Processing
  • Information Processing
  • Information Systems
  • Machine Learning
  • Mobile Phones
  • Neural Networks
  • Pattern Recognition
  • Recognition
  • Recurrent Neural Networks
  • Standards
  • Test And Evaluation

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Space
  • Space - Space Objects