Natural Language Video Description using Deep Recurrent Neural Networks

Abstract

For most people, watching a brief video and describing in words what happened is an easy task. For machines, extracting meaning from video pixels and generating a descriptive sentence is a very complex problem. The goal of my research is to develop models that can automatically generate natural language (NL) descriptions for events in videos. As a first step, this proposal presents deep recurrent neural network models for video-to-text generation. I build on recent deep machine learning approaches to develop video description models using a unified deep neural network with both convolutional and recurrent structure. This technique treats the video domain as another language and takes a machine translation approach, using the deep network to translate videos to text. In my initial approach, I adapt a model that can learn from images and captions, transferring knowledge from this auxiliary task to generate descriptions for short video clips. Next, I present an end-to-end deep network that can jointly model a sequence of video frames and a sequence of words. The second part of the proposal outlines a set of models to significantly extend work in this area. Specifically, I propose techniques to integrate linguistic knowledge from plain-text corpora, as well as attention methods that focus on objects and track their interactions, in order to generate more diverse and accurate descriptions. To move beyond short video clips, I also outline models to process multi-activity movie videos, learning to jointly segment and describe coherent event sequences. I propose further extensions that take advantage of movie scripts and subtitle information to generate richer descriptions.

Document Details

Document Type
Technical Report
Publication Date
Nov 23, 2015
Accession Number
AD1024592

Entities

People

  • Subhashini Venugopalan

Organizations

  • University of Texas at Austin

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Architecture (Building Design)
  • Artificial Intelligence Software
  • Artificial Neural Networks
  • Automated Speech Recognition
  • Boundaries
  • Coding
  • Computational Science
  • Computer Languages
  • Computer Programming
  • Computer Vision
  • Convolutional Neural Networks
  • Data Mining
  • Detection
  • Dimensionality Reduction
  • Feature Extraction
  • Information Science
  • Information Systems
  • Language
  • Machine Learning
  • Machine Translation
  • Natural Language Computing
  • Natural Language Processing
  • Natural Languages
  • Neural Networks
  • Ontologies
  • Pattern Recognition
  • Probability
  • Probability Distributions
  • Recognition
  • Recurrent Neural Networks
  • Statistics
  • Supervised Machine Learning
  • Training
  • Video Frames

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks