Disentangled learning representations of sonar data for target recognition and semantic compression

Abstract

Project Summary (Approved for Public Release)

Machine learning enables state-of-the-art performance on computer vision tasks such as automatic target detection, localization, and recognition. However, models produced by standard supervised machine learning methods are not robust to contextual changes in the image distribution when trained on limited data. Context is the combination of factors in the input that are not causally related to the desired model output but may be correlated with it on a given task. The failure of models to perform well in the presence of contextual changes is due to the lack of representation of those contexts in limited training sets and to spurious correlations between task-relevant and task-irrelevant features. While methods to identify and correct for discrepancies between the training distribution and the validation distribution can partially alleviate this problem, they require sufficient validation data or appropriate data augmentation techniques. We hypothesize that explicitly modeling context, explaining both the task-irrelevant and task-relevant features using disentangled factors, will enable better task performance. Additionally, factorizing the content and context of images will enable more efficient data compression and generation.

While disentangled factors are useful, this approach will be more useful if the factors are interpretable to humans through meaningful grounding in language and visual representations. Grounded factors enable human operators to dial in known information regarding context, and allow factors in datasets to be cataloged in a readily interpretable form. Recently developed very large natural language models, coupled with models for processing and generating images via techniques such as contrastive learning and transformer-based networks, have enabled systems for automatic image captioning and conditional image generation.
The caveat is that these systems are trained on general image datasets, and their applicability to specialized imagery, such as that produced by sonar, is questionable. Likewise, general language is too imprecise and unspecific for the descriptions that specialized imagery and its associated computer vision tasks require. It may be necessary to fine-tune existing models, train lower-capacity models of the same general architecture on the smaller datasets, or develop methods to adapt large language models through retrofitting in order for them to be useful for semantic grounding in specialized domains.

We propose to develop and evaluate novel methods to disentangle and explain factors, which will enable the following outcomes:

(1) Better surveys of the diversity of contextual factors in existing data sets. This can help pinpoint which contextual factors are underrepresented or overrepresented, as well as the correlations (possibly spurious) between contextual factors and task information. It will also enable a better understanding of the variation of factors in different data sets and the degree to which there is inherent (not spurious) dependence between factors.

(2) Supervised machine learning built on top of the representations from models with interpretable contextual factors that are robust to contextual changes and can receive explicit input from human operators regarding context, or queries specified by the user in natural language.

(3) Data compression based on the independent compression of disentangled factors and the subsequent reconstruction of imagery using optimized decoders.

(4) Intuitive conditional data generation based on the disentangled and explainable factors.

In summary, novel methods to train models with explainable and disentangled representations will enable a virtuous cycle of dataset querying, data generation, and model refinement.

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 11, 2024
Source ID: N000142412259

Entities

People

Austin J. Brockmeier

Organizations

Office of Naval Research
United States Navy
University of Delaware
