Representing Videos using Mid-level Discriminative Patches

Abstract

How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that they establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification, where they demonstrate state-of-the-art performance on the UCF50 and Olympics datasets.

Document Details

Document Type: Technical Report
Publication Date: Jun 23, 2013
Accession Number: AD1175056

Entities

People

Abhinav Gupta
Arpit Jain
Larry S. Davis
Mikel Rodriguez

Organizations

Carnegie Mellon University
MITRE Corporation
University of Maryland
