Data Synthesis of Complex Spatial Relationships for Visual Reasoning

Abstract

Visual reasoning for classifying spatial relationships has become a popular research topic in machine learning. Simple spatial relationships (SSRs), which are spatial relationships between two objects, are often well represented in the visual question answering (VQA) datasets used to train visual reasoning models. Complex spatial relationships (CSRs), which are combinations of SSRs, are generally not well represented, because objects are laid out randomly during dataset generation. One such dataset, CLEVR, is the inspiration for many recent VQA datasets. By introducing a CSR called the aligned relationship, the research presented here seeks to address this limitation with a model that parameterizes stochastic object placement. This model aids VQA dataset generation by allowing control over the probability that CSRs will be expressed. Working with the CLEVR generation tool, this work also shows that datasets generated with certain distributions of these probabilities can be used to improve visual reasoning models.
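
The idea of controlling how often a CSR appears during scene generation can be illustrated with a minimal sketch. This is not the report's model and does not use the CLEVR generation tool: the function names `sample_scene` and `is_collinear`, the parameter `p_aligned`, and the reading of "aligned" as three objects lying on one line in 2D are all assumptions made here for illustration only.

```python
import random


def sample_scene(num_objects, p_aligned, rng, extent=3.0):
    """Place objects in a square of half-width `extent` (hypothetical layout).

    Each object after the second is, with probability `p_aligned`, placed on
    the segment between two already placed objects (an assumed stand-in for
    an "aligned" relationship). Otherwise it is placed uniformly at random.
    No collision or overlap checking is done in this sketch.
    """
    positions = []
    aligned_flags = []
    for i in range(num_objects):
        if i >= 2 and rng.random() < p_aligned:
            # Pick two distinct anchors and interpolate between them.
            (x0, y0), (x1, y1) = rng.sample(positions, 2)
            t = rng.uniform(0.25, 0.75)
            positions.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            aligned_flags.append(True)
        else:
            positions.append(
                (rng.uniform(-extent, extent), rng.uniform(-extent, extent))
            )
            aligned_flags.append(False)
    return positions, aligned_flags


def is_collinear(a, b, c, tol=1e-9):
    """Return True if the 2D points a, b, c lie on one line (within tol)."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) <= tol


if __name__ == "__main__":
    rng = random.Random(0)
    positions, flags = sample_scene(num_objects=6, p_aligned=0.5, rng=rng)
    print(sum(flags), "of", len(flags), "objects were placed as aligned")
```

Setting `p_aligned` to 0 reduces this to purely random placement, while values near 1 force most objects into an aligned configuration. Sweeping the parameter is one way to vary how often the relationship is expressed across a generated dataset.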

Document Details

Document Type: Technical Report
Publication Date: Feb 15, 2024
Accession Number: AD1222256

Entities

People

  • Christopher J. Michael
  • Joshua T. Hughes

Organizations

  • United States Naval Research Laboratory

Tags

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Computational Modeling and Simulation
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks