Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach (Open Access)

Abstract

We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. We define a novel multiple kernel learning (MKL) latent support vector machine (SVM) that combines and re-weights multiple feature types in a principled fashion while simultaneously operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
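To make the idea of scoring a video with several kernels plus a latent segment choice concrete, here is a minimal inference-only sketch. It is not the authors' implementation. Several simplifications are assumptions: the latent variable is a single segment index (the report's model may select several segments), the two base kernels are a histogram intersection kernel on global features and a Gaussian RBF kernel on segment features, and the kernel weights `betas`, dual coefficients, and bias are taken as already learned. All names (`Video`, `SupportVector`, `mkl_score`, `infer`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def hist_intersection(a: Vector, b: Vector) -> float:
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(x, y) for x, y in zip(a, b))


def rbf(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Video:
    global_feat: Vector    # whole-video descriptor (e.g. a normalized histogram)
    segments: List[Vector]  # one descriptor per candidate temporal segment


@dataclass
class SupportVector:
    video: Video
    latent: int            # segment index selected for this training video
    coef: float            # alpha_i * y_i from an assumed dual solution


def mkl_score(x: Video, h: int, svs: List[SupportVector],
              betas: Tuple[float, float], bias: float) -> float:
    """f(x, h) = b + sum_i c_i * (beta_0 * K_global + beta_1 * K_segment).

    K_global compares whole-video features; K_segment compares the latent
    segment of each support vector with candidate segment h of x.
    """
    total = bias
    for sv in svs:
        k_global = hist_intersection(sv.video.global_feat, x.global_feat)
        k_segment = rbf(sv.video.segments[sv.latent], x.segments[h])
        total += sv.coef * (betas[0] * k_global + betas[1] * k_segment)
    return total


def infer(x: Video, svs: List[SupportVector],
          betas: Tuple[float, float], bias: float) -> Tuple[float, int]:
    """Latent inference: return (best score, segment index that achieves it)."""
    best_h = max(range(len(x.segments)),
                 key=lambda h: mkl_score(x, h, svs, betas, bias))
    return mkl_score(x, best_h, svs, betas, bias), best_h


# Toy usage: one positive and one negative support vector, two candidate segments.
pos = SupportVector(Video([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]]), latent=1, coef=1.0)
neg = SupportVector(Video([0.9, 0.1], [[0.0, 0.0], [0.2, 0.1]]), latent=0, coef=-1.0)
test = Video([0.5, 0.5], [[0.1, 0.0], [0.9, 1.0]])
score, h = infer(test, [pos, neg], betas=(0.5, 0.5), bias=0.0)
print(h, round(score, 3))  # prints "1 0.613"
```

In this toy case the second segment of the test video resembles the positive example's selected segment, so latent inference picks `h = 1` and the score is positive; the first segment, which resembles the negative example, would yield a negative score. Learning `betas` jointly with the SVM coefficients and the training latents, which the abstract describes, is outside the scope of this sketch.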


Document Details

Document Type
Technical Report
Publication Date
Mar 03, 2014
Accession Number
AD1037599

Entities

People

  • Arash Vahdat
  • Greg Mori
  • Ilseo Kim
  • Kevin Cannons
  • Sangmin Oh

Organizations

  • Simon Fraser University

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Classification
  • Commerce
  • Computations
  • Computer Vision
  • Detection
  • Event Detection
  • Histograms
  • Internet
  • Learning
  • Machine Learning
  • Recognition
  • Sequences
  • Standards
  • Supervised Machine Learning
  • Training

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Distributed Systems and Data Platform Development
  • Theoretical Analysis

Technology Areas

  • AI & ML