Feature Quantization and Pooling for Videos

Abstract

Building video representations typically involves four steps: feature extraction, quantization, encoding, and pooling. While there have been large advances in feature extraction and encoding, the questions of how to quantize video features and what kinds of regions to pool them over have been relatively unexplored. To tackle the challenges present in video data, it is necessary to develop robust quantization and pooling methods. The first contribution of this thesis, Source Constrained Clustering, quantizes features into a codebook that generalizes better across actions. The main insight is to incorporate readily available labels of the sources generating the data. Sources can be the people who performed each cooking recipe, the directors who made each movie, or the YouTube users who shared their videos. In the pooling step, it is common to pool feature vectors over local regions. The regions of choice include the entire video, coarse spatio-temporal pyramids, or cuboids of pre-determined fixed size. A consequence of using indiscriminately chosen cuboids is that widely dissimilar features may be pooled together if they are in nearby locations. It is natural to consider pooling video features over supervoxels, for example those obtained from a video segmentation. However, since videos can have different numbers of supervoxels, this produces a video representation of variable size. The second contribution of this thesis is a new, fixed-size video representation, Motion Words, in which features are pooled over video segments. The ultimate goal of video segmentation is to recover object boundaries, which often means grouping pixels from regions of very different motion. However, in the context of Motion Words, it is important that regions preserve motion boundaries. The third contribution of this thesis is a supervoxel segmentation, Globally Consistent Supervoxels, which respects motion boundaries and provides better spatio-temporal support for Motion Words.
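The variable-size problem described in the abstract can be illustrated with a small sketch. It is not the thesis's Motion Words method, whose construction the abstract does not detail. It assumes a generic bag-of-words step: features are average-pooled per segment, and each pooled descriptor is then assigned to its nearest entry in a codebook. The function names, the toy codebook, and the toy features are all hypothetical.

```python
from collections import defaultdict


def pool_by_segment(features, segment_ids):
    """Average-pool feature vectors that share a segment (e.g. supervoxel) id."""
    groups = defaultdict(list)
    for vec, seg in zip(features, segment_ids):
        groups[seg].append(vec)
    pooled = {}
    for seg, vecs in groups.items():
        dim = len(vecs[0])
        pooled[seg] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return pooled


def nearest_word(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, word)) for word in codebook]
    return dists.index(min(dists))


def fixed_size_histogram(pooled, codebook):
    """Count how many pooled segment descriptors map to each codebook entry."""
    hist = [0] * len(codebook)
    for vec in pooled.values():
        hist[nearest_word(vec, codebook)] += 1
    return hist


# Hypothetical toy data: 2-D features and a 3-entry codebook (assumptions).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Video A has 2 segments, video B has 4, so the pooled outputs differ in size.
video_a = pool_by_segment([[0.1, 0.0], [0.0, 0.1], [0.9, 1.0]], [0, 0, 1])
video_b = pool_by_segment(
    [[0.0, 0.9], [1.0, 1.1], [0.1, 0.1], [0.0, 1.0]], [0, 1, 2, 3]
)

# After quantization, both videos are described by a vector of codebook length.
hist_a = fixed_size_histogram(video_a, codebook)
hist_b = fixed_size_histogram(video_b, codebook)
print(len(video_a), len(video_b), hist_a, hist_b)
```

In this sketch the per-segment pooling yields two descriptors for one video and four for the other, while the histograms both have the length of the codebook, which is what makes them directly comparable.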

Document Details

Document Type: Technical Report
Publication Date: May 01, 2014
Accession Number: ADA613814

Entities

People

Ekaterina Taralova

Organizations

Carnegie Mellon University
