Data Synthesis of Complex Spatial Relationships for Visual Reasoning
Abstract
Visual reasoning for classifying spatial relationships has become a popular research topic in recent machine learning studies. Simple spatial relationships (SSRs), which are spatial relationships between two objects, are often well represented in the visual question answering (VQA) datasets used to train visual reasoning models. Complex spatial relationships (CSRs), which are combinations of SSRs, are generally not well represented because object layout is randomized during dataset generation. One such dataset, CLEVR, is the inspiration for many recent VQA datasets. By introducing a CSR called the aligned relationship, the research presented here seeks to address this limitation with a model that parameterizes stochastic object placement. This model aids VQA dataset generation by allowing control over the probability that CSRs will be expressed. Working with the CLEVR generation tool, this work also shows that datasets generated with certain distributions of these probabilities can be used to improve visual reasoning models.
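The idea of controlling how often a CSR appears can be illustrated with a small sketch. This is not the report's model or the CLEVR generation tool. It assumes, purely for illustration, that three objects are "aligned" when the third lies within a tolerance of the line through the first two, and the names `is_aligned`, `sample_scene`, `aligned_rate`, and `p_aligned` are hypothetical.

```python
import random

def is_aligned(a, b, c, tol=0.05):
    """Assumed definition: c lies within tol of the line through a and b."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    length = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    if length == 0:
        return False  # a and b coincide, so no line is defined
    # Perpendicular distance from c to the line, via the 2D cross product.
    cross = abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax))
    return cross / length <= tol

def sample_scene(p_aligned, rng, extent=3.0):
    """Place three 2D points; force alignment with probability p_aligned."""
    a = (rng.uniform(-extent, extent), rng.uniform(-extent, extent))
    b = (rng.uniform(-extent, extent), rng.uniform(-extent, extent))
    if rng.random() < p_aligned:
        # Put the third point on the segment between the first two.
        t = rng.uniform(0.0, 1.0)
        c = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    else:
        # Fall back to fully random placement, as in unparameterized generation.
        c = (rng.uniform(-extent, extent), rng.uniform(-extent, extent))
    return a, b, c

def aligned_rate(p_aligned, n_scenes=2000, seed=0):
    """Empirical fraction of sampled scenes in which the points are aligned."""
    rng = random.Random(seed)
    hits = sum(is_aligned(*sample_scene(p_aligned, rng)) for _ in range(n_scenes))
    return hits / n_scenes
```

Under these assumptions, purely random placement (`p_aligned=0.0`) yields aligned scenes only by chance, while raising `p_aligned` raises the fraction of scenes that express the relationship, which is the kind of control the abstract describes.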
Document Details
- Document Type: Technical Report
- Publication Date: Feb 15, 2024
- Accession Number: AD1222256
Entities
People
- Christopher J. Michael
- Joshua T. Hughes
Organizations
- United States Naval Research Laboratory