Maintaining Consistency and Relevancy in Multi-Image Visual Storytelling

Abstract

This report proposes an approach to visual storytelling across multiple images that prioritizes two aspects of narrative generation: 1) retaining narrative consistency between clauses in the generated story; and 2) retaining relevancy between the generated story and the images from which it was derived. We take a structured approach to multi-image visual storytelling that centers on the middle image in a sequence of three. The middle image acts as the focal point, or climax, of the narrative. The plot points surrounding it are selected from events in the Atlas of Machine Commonsense (ATOMIC) corpus for if-then reasoning about daily activities, and the selected events are then grounded to the images. The result is an architecture that, given an author goal in the form of a prompt to guide the story, generates a short narrative that retains a narrative arc and does not deviate from the content of the images.
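The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the event table is a tiny hand-written stand-in for ATOMIC, the image labels are assumed to come from some upstream vision model, the word-overlap grounding score is a placeholder, and every name (Image, TOY_EVENTS, grounding_score, tell_story) is hypothetical. The author prompt is assumed here to name the climax event directly.

```python
from dataclasses import dataclass


@dataclass
class Image:
    # Hypothetical: words describing objects/scenes detected in the image.
    labels: set


# Hand-written stand-in for ATOMIC-style if-then knowledge (not real ATOMIC data):
# for a climax event, candidate events that could come before and after it.
TOY_EVENTS = {
    "PersonX blows out the candles": {
        "before": [
            "PersonX lights the candles on the cake",
            "PersonX buys a ticket",
        ],
        "after": [
            "PersonX eats a slice of cake",
            "PersonX drives home",
        ],
    }
}


def grounding_score(event, image):
    """Placeholder relevancy measure: count event words that match image labels."""
    words = set(event.lower().split())
    return len(words & image.labels)


def pick_grounded(candidates, image):
    """Choose the candidate event best supported by the image content."""
    return max(candidates, key=lambda event: grounding_score(event, image))


def tell_story(images, climax_event):
    """Build a three-event story centered on the middle image.

    The climax event (assumed to come from the author prompt) is tied to the
    middle image; the first and last images select the surrounding plot points.
    """
    first, _middle, last = images
    candidates = TOY_EVENTS[climax_event]
    before = pick_grounded(candidates["before"], first)
    after = pick_grounded(candidates["after"], last)
    return [before, climax_event, after]


images = [
    Image({"candles", "cake", "table"}),
    Image({"candles", "person"}),
    Image({"cake", "slice", "plate"}),
]
story = tell_story(images, "PersonX blows out the candles")
```

Selecting the surrounding events from if-then candidates of the climax is what keeps the clauses consistent with one another, while scoring each candidate against its own image is what keeps the story relevant to the images. A real system would replace both the toy table and the overlap score.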

Document Details

Document Type
Technical Report
Publication Date
Nov 13, 2020
Accession Number
AD1115439

Entities

People

  • Aishwarya Sapkale
  • Stephanie M. Lukin

Organizations

  • United States Army Research Laboratory
  • University of Maryland, Baltimore County

Tags

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Linguistics
  • Computer Languages
  • Computer Science
  • Computer Vision
  • Computers
  • Consistency
  • Demographic Cohorts
  • Identification
  • Language
  • Linguistics
  • Military Research
  • Natural Language Processing
  • Natural Languages
  • Navigation
  • Neural Networks
  • Orientation (Direction)
  • Pattern Recognition
  • Personality
  • Psychology
  • Recognition

Readers

  • Artificial Intelligence
  • Computational Linguistics
  • Computer Vision