Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields

Abstract

We present an online approach to efficiently and simultaneously detect and track the 2D poses of multiple people in a video sequence. We build upon the Part Affinity Fields (PAF) representation designed for static images, and propose an architecture that can encode and predict Spatio-Temporal Affinity Fields (STAF) across a video sequence. In particular, we propose a novel temporal topology, cross-linked across limbs, that can consistently handle body motions of a wide range of magnitudes. Additionally, we make the overall approach recurrent in nature: the network ingests STAF heatmaps from previous frames and estimates those for the current frame. Our approach uses only online inference and tracking, and is currently the fastest and most accurate bottom-up approach that is runtime-invariant to the number of people in the scene and accuracy-invariant to the input frame rate of the camera.
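
The recurrent, online structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_network` is a hypothetical placeholder that simply blends the current frame with the previous output, standing in for the real convolutional network, and `Heatmap` and `track_online` are names assumed here for illustration. The only point it shows is that each frame is processed using nothing but the previous frame's output, so no future frames are needed.

```python
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical stand-in for a stack of affinity-field heatmaps (rows of floats).
Heatmap = List[List[float]]


def toy_network(frame: Heatmap, prev: Optional[Heatmap]) -> Heatmap:
    """Placeholder for the real network (an assumption, not the paper's model).

    With no previous output it returns a copy of the frame; otherwise it
    averages the frame with the previous output, element by element.
    """
    if prev is None:
        return [row[:] for row in frame]
    return [
        [0.5 * f + 0.5 * p for f, p in zip(frame_row, prev_row)]
        for frame_row, prev_row in zip(frame, prev)
    ]


def track_online(
    frames: Iterable[Heatmap],
    network: Callable[[Heatmap, Optional[Heatmap]], Heatmap] = toy_network,
) -> Iterator[Heatmap]:
    """Yield one output per frame, feeding each output back in as recurrent state."""
    prev: Optional[Heatmap] = None
    for frame in frames:
        prev = network(frame, prev)  # only the previous output is carried forward
        yield prev


if __name__ == "__main__":
    video = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two tiny 1x2 "frames"
    for t, fields in enumerate(track_online(video)):
        print(t, fields)
```

Because `track_online` is a generator, results are available frame by frame as the video streams in, which mirrors the online setting the abstract describes.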

Document Details

Document Type
Technical Report
Publication Date
Jun 16, 2019
Accession Number
AD1152054

Entities

People

  • Gines Hidalgo
  • Haroon Idrees
  • Yaadhav Raaj
  • Yaser Sheikh

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Artificial Intelligence Software
  • Augmented Reality
  • Change Detection
  • Computations
  • Convolutional Neural Networks
  • Deep Learning
  • Detection
  • Estimators
  • Flow
  • Flow Fields
  • Identification
  • Image Recognition
  • Multiple Targets
  • Neural Networks
  • Person Tracking
  • Recognition
  • Sequences
  • Topology
  • Training
  • Urban Areas
  • Video

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Computer Vision.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks