Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

Abstract

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos. Specifically, we integrate both a neural language model and distributional semantics trained on large text corpora into a recent LSTM-based architecture for video description. We evaluate our approach on a collection of YouTube videos as well as two large movie description datasets, showing significant improvements in grammaticality while modestly improving descriptive quality.

Document Details

Document Type
Technical Report
Publication Date
Nov 29, 2016
Accession Number
AD1049685

Entities

People

  • Kate Saenko
  • Lisa Anne Hendricks
  • Raymond Mooney
  • Subhashini Venugopalan

Organizations

  • National Science Foundation

Tags

DTIC Thesaurus Topics

  • Artificial Intelligence Computing
  • Artificial Intelligence Software
  • Artificial Neural Networks
  • Coding
  • Computational Science
  • Computer Languages
  • Computer Vision
  • Decoding
  • Embedding
  • Grammars
  • Images
  • Language
  • Learning
  • Machine Translation
  • Natural Language Computing
  • Natural Languages
  • Networks
  • Neural Nets
  • Neural Networks
  • Recurrent Neural Networks
  • Sequences
  • Test And Evaluation
  • Training
  • Translations
  • Vector Spaces
  • Video
  • Vocabulary

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computational Linguistics
  • Distributed Systems and Data Platform Development