Representing Videos using Mid-level Discriminative Patches

Abstract

How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.

Document Details

Document Type
Technical Report
Publication Date
Jun 23, 2013
Accession Number
AD1175056

Entities

People

  • Abhinav Gupta
  • Arpit Jain
  • Larry S. Davis
  • Mikel Rodriguez

Organizations

  • Carnegie Mellon University
  • MITRE Corporation
  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Bayesian Networks
  • Classification
  • Clustering
  • Computer Vision
  • Consistency
  • Detection
  • Detectors
  • Event Detection
  • Information Science
  • Integer Programming
  • Machine Learning
  • Object Recognition
  • Pattern Recognition
  • Recognition
  • Supervised Machine Learning
  • Three Dimensional

Fields of Study

  • Computer science

Readers

  • Computer Vision