Self-Supervised Learning in Synthetic Aperture Sonar Imagery with Vision Transformers

Abstract

Approved for Public Release. Synthetic Aperture Sonar (SAS) has enabled the collection of high-resolution imagery of the seabed. The availability of large amounts of high-resolution SAS imagery provides the opportunity to perform Automatic Target Recognition (ATR); however, most ATR algorithms require very large amounts of high-quality, pre-labeled data. To make the most effective use of the large amounts of SAS imagery for ATR applications, we propose to use self-supervised learning approaches with state-of-the-art transformer models to mitigate the labeled data problem, and then fine-tune the pre-trained model on SAS imagery to perform ATR. The objectives of this proposed research include the design and implementation of a self-supervised vision transformer pipeline to enable robust ATR applications in SAS imagery. This pipeline will require the training of self-supervised computer vision models followed by the downstream fine-tuning of these models on limited amounts of labeled SAS imagery for ATR applications. Additionally, this pipeline will enable the evaluation of state-of-the-art transformer models and their performance characteristics on SAS imagery for ATR. The proposed technical approach for this research is the use of transformer deep neural networks. Transformer deep neural networks have shown excellent performance in a variety of machine learning domains, including Natural Language Processing (NLP) and Computer Vision (CV). The underlying technique employed by these transformers, known as the attention mechanism, allows the model to learn which parts of the input sequence, whether words or pixels, are most relevant to solving the machine learning task. Vision Transformers, the family of transformer models applied to imagery, were introduced in 2020 and represent images as a sequence of patches that serves as input to the transformer network.
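As a rough illustration of the two ideas described above (an image represented as a sequence of patches, and attention weighting which parts of that sequence matter), the following is a minimal NumPy sketch. It is not the architecture proposed in this award: the function names `image_to_patches` and `self_attention`, the 32x32 random stand-in image, the patch size of 8, and the random projection matrices are all assumptions chosen for illustration, and a real Vision Transformer adds learned patch embeddings, positional encodings, multiple heads, and stacked layers.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split a 2-D single-channel image into flattened, non-overlapping patches."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch) -> (rows, cols, patch, patch) -> (num_patches, patch*patch)
    patches = image.reshape(rows, patch_size, cols, patch_size)
    patches = patches.transpose(0, 2, 1, 3)
    return patches.reshape(rows * cols, patch_size * patch_size)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over each row of scores.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
# Assumption: random values stand in for a single-channel SAS image tile.
image = rng.random((32, 32))
patches = image_to_patches(image, patch_size=8)  # 16 patches of 64 pixels each

# Assumption: random projections stand in for learned query/key/value weights.
dim = 16
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(patches.shape[1], dim)) for _ in range(3))
output, weights = self_attention(patches, w_q, w_k, w_v)
```

Each row of `weights` sums to one and indicates how strongly one patch attends to every other patch; in a trained model these weights are what let the network focus on the most relevant regions of the image.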
More recently, numerous high-performing vision transformer models have been introduced, nearly all of which require large amounts of data and computational resources to train effectively. To that end, in this proposed research, we will use self-supervised learning techniques with modern deep neural transformer networks to train general baseline models on unlabeled SAS imagery. Self-supervised learning techniques generally rely on degrading data in a known way and then training a model to recover the original, non-degraded input. Following the baseline self-supervised training, models are fine-tuned on labeled data for a particular CV task and dataset, thus enabling high-performing CV models in applications without large amounts of labeled data. Further, by combining self-supervised learning with the latest advancements in efficient transformer architecture design, we plan to enable more efficient and scalable application of Vision Transformers for ATR in SAS imagery.
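The "degrade in a known way, then recover the original" idea can be sketched with random patch masking, which is one common form of self-supervised degradation. The abstract does not state which degradation this research will use, so masking is an assumption here, as are the hypothetical helper names `mask_patches` and `reconstruction_loss`, the 75% mask ratio, and the random data standing in for SAS image patches. No model is trained in this sketch; it only shows how the training target and loss could be formed without any labels.

```python
import numpy as np

def mask_patches(patches, mask_ratio, rng):
    """Zero out a random subset of patches; return the degraded copy and the mask."""
    num_patches = patches.shape[0]
    num_masked = int(round(mask_ratio * num_patches))
    masked_idx = rng.permutation(num_patches)[:num_masked]
    mask = np.zeros(num_patches, dtype=bool)
    mask[masked_idx] = True
    degraded = patches.copy()
    degraded[mask] = 0.0  # the known degradation: masked patches are blanked
    return degraded, mask

def reconstruction_loss(predicted, original, mask):
    """Mean squared error computed only over the masked patches."""
    return float(np.mean((predicted[mask] - original[mask]) ** 2))

rng = np.random.default_rng(0)
# Assumption: random values stand in for 16 flattened 8x8 SAS image patches.
patches = rng.random((16, 64))
degraded, mask = mask_patches(patches, mask_ratio=0.75, rng=rng)

# A model would map `degraded` to a prediction of `patches`. As placeholders,
# compare an untrained "identity" output with a perfect reconstruction.
baseline_loss = reconstruction_loss(degraded, patches, mask)
perfect_loss = reconstruction_loss(patches, patches, mask)
```

Because the original patches serve as the target, the loss requires no human labels, which is what allows pre-training on large unlabeled collections before fine-tuning on a small labeled set.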

Document Details

Document Type
DoD Grant Award
Publication Date
Nov 08, 2024
Source ID
N000142412539

Entities

People

  • James Hurt

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Missouri System

Tags

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Electrical Engineering
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks