Data Synthesis of Complex Spatial Relationships for Visual Reasoning

Abstract

Classifying spatial relationships through visual reasoning has become a popular research topic in machine learning. Simple spatial relationships (SSRs), which hold between two objects, are often well represented in the visual question answering (VQA) datasets used to train visual reasoning models. Complex spatial relationships (CSRs), which are combinations of SSRs, are generally not well represented because objects are laid out at random during dataset generation. One such dataset, CLEVR, is the inspiration for many recent VQA datasets. By introducing a CSR called the aligned relationship, the research presented here seeks to address this limitation with a model that parameterizes stochastic object placement. This model aids VQA dataset generation by allowing control over the probability that CSRs will be expressed. Working with the CLEVR generation tool, this work also shows that datasets generated with certain distributions of these probabilities can be used to improve visual reasoning models.
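
The idea of controlling how often a CSR appears can be illustrated with a minimal sketch. It is not the report's model: it assumes, purely for illustration, that "aligned" means all objects in a scene lie on one straight line in the ground plane, and it ignores the overlap and margin checks a real scene generator would need. The names place_objects, is_collinear, and aligned_fraction, and the parameter p_aligned, are hypothetical.

```python
import random


def place_objects(n, p_aligned, rng, lo=-3.0, hi=3.0):
    """Sample n (x, y) positions inside the square [lo, hi] x [lo, hi].

    With probability p_aligned the objects are placed on one random line
    segment (assumed meaning of "aligned"); otherwise they are placed
    independently and uniformly. Returns (positions, aligned_flag).
    """
    if rng.random() < p_aligned:
        # Aligned branch: pick two endpoints, then place the objects at
        # random fractions along the segment joining them.
        x0, y0 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        x1, y1 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        fractions = sorted(rng.random() for _ in range(n))
        positions = [(x0 + t * (x1 - x0), y0 + t * (y1 - y0)) for t in fractions]
        return positions, True
    # Unconstrained branch: plain uniform random layout.
    positions = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n)]
    return positions, False


def is_collinear(positions, tol=1e-9):
    """Check collinearity with a cross-product test against the first point."""
    (ax, ay), (bx, by) = positions[0], positions[1]
    return all(
        abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) <= tol
        for cx, cy in positions[2:]
    )


def aligned_fraction(p_aligned, n_scenes, seed, n_objects=3):
    """Fraction of generated scenes that took the aligned branch."""
    rng = random.Random(seed)
    hits = sum(place_objects(n_objects, p_aligned, rng)[1] for _ in range(n_scenes))
    return hits / n_scenes
```

Under these assumptions, the single parameter p_aligned sets the expected share of scenes that express the relationship, whereas a purely uniform layout would produce it only by chance. Sampling p_aligned from a distribution per dataset, rather than fixing it, would be one way to explore the "distributions of these probabilities" the abstract mentions.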

Document Details

Document Type: Technical Report
Publication Date: Feb 15, 2024
Accession Number: AD1222256

Entities

People

Christopher J. Michael
Joshua T. Hughes

Organizations

United States Naval Research Laboratory
