Semi-supervised Learning of Multimodal Representations

Abstract

The goal of this project was to create better representations of words by improving vector space language models with multimodal and multilingual information. We assembled a large-scale dataset of multilingual images, called MMID, which associates images with words in 98 languages (up to 10K words per language, with 100 images per word). This dataset let us perform a comprehensive analysis of whether visual similarity can be used to identify translations, and of the extent to which this is affected by linguistic factors such as part of speech and concreteness. We studied whether MMID could be used to mitigate the geographical bias in image classification datasets like ImageNet (for example, a wedding is visually distinct in different regions of the world). We also investigated the extent to which geography affects translatability across pairs of languages; factors such as shared language families, ethnic groups, or shared religions have a larger impact than geography on visual similarity, and therefore on translatability via images. Finally, we collected a dataset from Wikipedia by aggregating shared images with multilingual captions, which gives us full sentences rather than the individual words in MMID.

Document Details

Document Type
Technical Report
Publication Date
Aug 16, 2022
Accession Number
AD1177263

Entities

People

  • Chris Callison-Burch
  • Derry Wijaya

Organizations

  • University of Pennsylvania

Tags

Communities of Interest

  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Cognitive Science
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Vision
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Linguistics
  • Machine Learning
  • Natural Language Processing
  • Natural Languages
  • Neural Networks
  • Supervised Machine Learning

Readers

  • Artificial Intelligence
  • Computer Vision
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • Space