Mid-level Features Improve Recognition of Interactive Activities

Abstract

We argue that mid-level representations can bridge the gap between existing low-level models, which are incapable of capturing the structure of interactive verbs, and contemporary high-level schemes, which rely on the output of potentially brittle intermediate detectors and trackers. We develop a novel descriptor based on generic object foreground segments; our representation forms a histogram-of-gradient representation that is grounded to the frame of detected key-segments. Importantly, our method does not require objects to be identified reliably in order to compute a robust representation. We evaluate an integrated system including novel key-segment activity descriptors on a large-scale video dataset containing 48 common verbs, for which we present a comprehensive evaluation protocol. Our results confirm that a descriptor defined on mid-level primitives, operating at a higher level than local spatio-temporal features but at a lower level than trajectories of detected objects, can provide a substantial improvement relative to either alone or to their combination.

Document Details

Document Type: Technical Report
Publication Date: Nov 14, 2012
Accession Number: ADA570728

Entities

People

Ben Packer
Chun-Hui Chen
Daniel Koller
Fei-Fei Li
J. Niebles
K. Grauman
Kate Saenko
S. Bandla
Trevor Darrell
Y. Lee
Yangqing Jia

Organizations

University of California, Berkeley
