Feature Quantization and Pooling for Videos

Abstract

Building video representations typically involves four steps: feature extraction, quantization, encoding, and pooling. While there have been large advances in feature extraction and encoding, the questions of how to quantize video features and what kinds of regions to pool them over have been relatively unexplored. To tackle the challenges present in video data, it is necessary to develop robust quantization and pooling methods. The first contribution of this thesis, Source Constrained Clustering, quantizes features into a codebook that generalizes better across actions. The main insight is to incorporate readily available labels of the sources generating the data. Sources can be the people who performed each cooking recipe, the directors who made each movie, or the YouTube users who shared their videos. In the pooling step, it is common to pool feature vectors over local regions. The regions of choice include the entire video, coarse spatio-temporal pyramids, or cuboids of pre-determined fixed size. A consequence of using indiscriminately chosen cuboids is that widely dissimilar features may be pooled together if they are in nearby locations. It is natural to consider pooling video features over supervoxels obtained, for example, from a video segmentation. However, since different videos can have different numbers of supervoxels, this produces a video representation of variable size. The second contribution of this thesis is a new fixed-size video representation, Motion Words, in which we pool features over video segments. The ultimate goal of video segmentation is to recover object boundaries, often grouping pixels from regions of very different motion. However, in the context of Motion Words, it is important that regions preserve motion boundaries. The third contribution of this thesis is a supervoxel segmentation, Globally Consistent Supervoxels, which respects motion boundaries and provides better spatio-temporal support for Motion Words.
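To make the quantization and pooling steps concrete, the following is a minimal sketch of a generic bag-of-words pipeline. It is not the thesis's Source Constrained Clustering or Motion Words method. The codebook, the toy features, the segment labels, and all function names are assumptions chosen for illustration. It shows that pooling over the whole video gives a vector of fixed length, while pooling per segment gives one row per segment, so the size varies from video to video.

```python
import numpy as np

def quantize(features, codebook):
    """Hard-assign each feature vector to the index of its nearest codeword."""
    # Squared Euclidean distances, shape (n_features, n_codewords).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def pool_histogram(word_ids, n_words):
    """Sum-pool word assignments into an L1-normalized histogram."""
    hist = np.bincount(word_ids, minlength=n_words).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def pool_per_segment(word_ids, segment_ids, n_words):
    """Build one histogram per segment; the row count depends on the video."""
    segments = np.unique(segment_ids)
    return np.stack(
        [pool_histogram(word_ids[segment_ids == s], n_words) for s in segments]
    )

# Hypothetical toy data: 3 codewords, 4 two-dimensional features, 2 segments.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [3.8, 4.1], [1.1, 0.9]])
segment_ids = np.array([0, 0, 1, 1])

words = quantize(features, codebook)
video_hist = pool_histogram(words, len(codebook))                    # shape (3,)
segment_hists = pool_per_segment(words, segment_ids, len(codebook))  # shape (2, 3)
```

In this sketch, a video with more segments would produce more rows in `segment_hists`, which is the variable-size problem the abstract describes. A fixed-size representation needs a further step to map those per-segment descriptors onto a common set of bins.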

Document Details

Document Type
Technical Report
Publication Date
May 01, 2014
Accession Number
ADA613814

Entities

People

  • Ekaterina Taralova

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Bayesian Networks
  • Computer Science
  • Computer Vision
  • Dimensionality Reduction
  • Event Detection
  • Feature Extraction
  • Image Recognition
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Neural Networks
  • Pattern Recognition
  • Supervised Machine Learning

Readers

  • Image Processing and Computer Vision
  • Regression Analysis
  • Systems Analysis and Design

Technology Areas

  • AI & ML