Reasoning About Human Object Interactions Through Dual Attention Networks

Abstract

Objects are entities we act upon, and the functionality of an object is determined by how we interact with it. In this work we propose a Dual Attention Network model that reasons about human-object interactions. The dual-attentional framework weights the important features for objects and actions separately, so that object recognition and action recognition benefit each other. The proposed model shows competitive classification performance on the human-object interaction dataset Something-Something. It can also perform weak spatiotemporal localization and affordance segmentation, despite being trained only with video-level labels. The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.
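The abstract does not give the architecture, so the following is only a minimal sketch of the general idea of dual attention, not the authors' model. It assumes a video has already been encoded as a list of feature vectors, one per spatiotemporal position, and that each branch has a learned query vector. The names `attend`, `dual_attention`, `object_query`, and `action_query` are hypothetical, and the cross-conditioning step (adding the other branch's summary to the query) is an illustrative choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(features, query):
    """Score every position against the query, then pool.

    Returns (weights, pooled): weights sum to 1 over positions, and
    pooled is the attention-weighted sum of the feature vectors.
    """
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


def dual_attention(features, object_query, action_query):
    """Hypothetical two-branch attention in which each branch informs the other."""
    # First pass: each branch summarises the video with its own query.
    _, obj_summary = attend(features, object_query)
    _, act_summary = attend(features, action_query)
    # Second pass (assumed design): each query is shifted by the other
    # branch's summary, so object and action attention are coupled.
    obj_q = [q + a for q, a in zip(object_query, act_summary)]
    act_q = [q + o for q, o in zip(action_query, obj_summary)]
    obj_weights, obj_feat = attend(features, obj_q)
    act_weights, act_feat = attend(features, act_q)
    return {
        "object_weights": obj_weights,
        "action_weights": act_weights,
        "object_feature": obj_feat,
        "action_feature": act_feat,
    }


# Toy input: four positions with three-dimensional features (made-up values).
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
out = dual_attention(features, object_query=[5.0, 0.0, 0.0], action_query=[0.0, 5.0, 0.0])
# The position with the largest weight is a crude localization cue.
object_peak = max(range(len(features)), key=lambda i: out["object_weights"][i])
action_peak = max(range(len(features)), key=lambda i: out["action_weights"][i])
```

In a sketch like this, the attention weights are what would make weak localization possible: a classifier trained only on the pooled features with video-level labels still produces per-position weights that can be inspected afterwards.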

Document Details

Document Type
Technical Report
Publication Date
Oct 27, 2019
Accession Number
AD1153063

Entities

People

  • Aude Oliva
  • Bolei Zhou
  • Dan Gutfreund
  • Mathew Monfort
  • Quanfu Fan
  • Tete Xiao

Organizations

  • International Business Machines Corporation (Armonk, NY)
  • Massachusetts Institute of Technology
  • The Chinese University of Hong Kong
  • University of California, Berkeley

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Bayesian Networks
  • Computer Languages
  • Computer Vision
  • Computers
  • Dimensionality Reduction
  • Image Recognition
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Neural Networks
  • Object Recognition
  • Pattern Recognition
  • Probabilistic Models
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Neural Network Machine Learning.