Mid-level Features Improve Recognition of Interactive Activities
Abstract
We argue that mid-level representations can bridge the gap between existing low-level models, which cannot capture the structure of interactive verbs, and contemporary high-level schemes, which rely on the output of potentially brittle intermediate detectors and trackers. We develop a novel descriptor based on generic object foreground segments: it is a histogram-of-gradient representation grounded in the frame of detected key-segments. Importantly, our method does not require objects to be identified reliably in order to compute a robust representation. We evaluate an integrated system that includes the novel key-segment activity descriptors on a large-scale video dataset containing 48 common verbs, for which we present a comprehensive evaluation protocol. Our results confirm that a descriptor defined on mid-level primitives, operating at a higher level than local spatio-temporal features but at a lower level than trajectories of detected objects, can provide a substantial improvement relative to either of those representations alone or to their combination.
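The abstract does not spell out how the descriptor is constructed. As a rough illustration only, the following is a minimal Python sketch of a histogram-of-gradient descriptor computed in the frame of a single key-segment. It assumes the key-segment is given as a bounding box plus an optional foreground mask. The function name `segment_grounded_hog`, the 2 x 2 cell grid, and the 8 unsigned orientation bins are hypothetical choices for illustration, not the authors' method or parameters.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[float]]          # grayscale frame, indexed image[y][x]
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1), half-open pixel bounds


def segment_grounded_hog(
    image: Image,
    box: Box,
    mask: Optional[List[List[bool]]] = None,
    cells: int = 2,
    bins: int = 8,
) -> List[float]:
    """Hypothetical sketch: HOG-style histogram in the frame of one key-segment.

    The box is split into a cells x cells grid defined relative to the segment,
    so the descriptor layout does not depend on where the segment sits in the frame.
    """
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0
    hist = [0.0] * (cells * cells * bins)
    for y in range(y0, y1):
        for x in range(x0, x1):
            if mask is not None and not mask[y][x]:
                continue  # skip background pixels outside the foreground segment
            # Central differences, clamped at the image borders.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation in [0, pi), quantized into `bins` bins.
            theta = math.atan2(gy, gx) % math.pi
            b = min(int(theta / math.pi * bins), bins - 1)
            # Cell index relative to the segment box, not the whole frame.
            cx = min((x - x0) * cells // bw, cells - 1)
            cy = min((y - y0) * cells // bh, cells - 1)
            hist[(cy * cells + cx) * bins + b] += mag  # magnitude-weighted vote
    # L2-normalize so the descriptor is insensitive to overall contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Usage: a vertical step edge inside an 8 x 8 frame, segment box = whole frame.
img = [[1.0 if x >= 4 else 0.0 for x in range(8)] for _ in range(8)]
desc = segment_grounded_hog(img, (0, 0, 8, 8))
print([i for i, v in enumerate(desc) if v])  # → [0, 8, 16, 24]
```

Because the cells are laid out relative to the segment box, the same pattern yields the same descriptor wherever the segment appears in the frame, and the optional mask restricts the histogram to foreground pixels. This loosely mirrors the abstract's idea of grounding features in key-segments rather than in reliably identified objects, but the real descriptor may differ substantially.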
Document Details
- Document Type: Technical Report
- Publication Date: Nov 14, 2012
- Accession Number: ADA570728
Entities
People
- Ben Packer
- Chun-Hui Chen
- Daphne Koller
- Fei-Fei Li
- J. Niebles
- K. Grauman
- Kate Saenko
- S. Bandla
- Trevor Darrell
- Y. Lee
- Yangqing Jia
Organizations
- University of California, Berkeley