Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach (Open Access)

Abstract

We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. A novel multiple kernel learning (MKL) latent support vector machine (SVM) is defined that combines and re-weights multiple feature types in a principled fashion while simultaneously operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
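
The abstract does not state the scoring function, so the following is only a minimal sketch of how a latent segment choice and kernel re-weighting might interact at inference time, not the report's actual formulation. It assumes exactly two feature types (one global, one segment-level), precomputed kernel values against a set of support vectors, and a single latent segment per video. The function name `latent_mkl_score` and all of its parameters are hypothetical.

```python
def latent_mkl_score(global_k, segment_k, alpha, kernel_weights, bias=0.0):
    """Score one video and return (score, index of the selected segment).

    Assumed inputs (illustrative, not taken from the report):
      global_k       : kernel values between the video's global feature
                       and each support vector, length n_sv
      segment_k      : one row per candidate segment, each of length n_sv
      alpha          : signed support-vector coefficients, length n_sv
      kernel_weights : (d_global, d_segment), non-negative MKL weights
    """
    d_global, d_segment = kernel_weights

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Contribution of the whole-video (global) feature type.
    global_term = d_global * dot(global_k, alpha)

    # Contribution of each candidate segment; the latent variable is
    # resolved by keeping only the best-scoring segment, so all other
    # segments have no effect on the final score.
    segment_terms = [d_segment * dot(row, alpha) for row in segment_k]
    best = max(range(len(segment_terms)), key=segment_terms.__getitem__)

    return global_term + segment_terms[best] + bias, best


score, best_segment = latent_mkl_score(
    global_k=[1.0, 0.0],
    segment_k=[[0.0, 1.0], [1.0, 1.0], [0.5, 0.0]],
    alpha=[2.0, 1.0],
    kernel_weights=(0.5, 0.5),
)
```

In this sketch, taking the maximum over segments is what lets the score disregard unimportant portions of the video, while the two kernel weights control how much each feature type contributes. Learning those weights and the coefficients jointly is the part the report addresses and is not shown here.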

Document Details

Document Type: Technical Report
Publication Date: Mar 03, 2014
Accession Number: AD1037599

Entities

People

Arash Vahdat
Greg Mori
Ilseo Kim
Kevin Cannons
Sangmin Oh

Organizations

Simon Fraser University
