A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching (Open Access)

Abstract

The problem of describing images through natural language has gained importance in the computer vision community. Solutions to image description have focused either on a top-down approach that generates language through combinations of object detections and language models, or on bottom-up propagation of keyword tags from training images to test images through probabilistic or nearest-neighbor techniques. In contrast, describing videos with natural language is a less studied problem. In this paper, we combine ideas from the bottom-up and top-down approaches to image description and propose a method for video description that captures the most relevant contents of a video in a natural language description. We propose a hybrid system consisting of a low-level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors, and a high-level module that produces the final lingual descriptions. We compare the results of our system to human descriptions in both short and long forms on two datasets, and demonstrate that the final system output has greater agreement with the human descriptions than the output of any single level.

Document Details

Document Type
Technical Report
Publication Date
Oct 03, 2013
Accession Number
AD1037264

Entities

People

  • Chenliang Xu
  • Jason J. Corso
  • Pradipto Das
  • Richard F. Doell

Organizations

  • University at Buffalo

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Automated Text Summarization
  • Computational Science
  • Computer Languages
  • Computer Science
  • Computer Vision
  • Data Sets
  • Detection
  • Detectors
  • Event Detection
  • Health Care
  • Language
  • Markov Models
  • Natural Language Processing
  • Natural Languages
  • Pattern Recognition
  • Recognition
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation