Referential Grounding in Multimodal Machine Translation

Abstract

This project aimed to advance the state of the art in multimodal machine translation (MMT). In MMT, a source-language text is supplemented by visual information (images or video), which serves as additional context for understanding the text and translating it into a target language. The proposed advances focus on referential grounding, i.e., guiding the alignment between image regions and source (and/or target) words so that the visual context becomes more useful for translation. Work done during the project covered the following directions:

1. Improving supervised attention mechanisms that map source or target words to image regions. This covers attention at encoding time (learning alignments between source words and objects in the image) and at decoding time (learning alignments between target words and objects in the image). It also includes improving the underlying multimodal neural machine translation architectures and fusion strategies that use this information, and exploring more recent and better types of visual features.
2. Leveraging information from multiple vision-and-language tasks and datasets to improve multilingual grounding.
3. Creating resources to facilitate work on referential grounding.
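To make the supervised attention idea in the first direction more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the report's implementation. One word vector (a source word at encoding time or a target word at decoding time) attends over image-region feature vectors with scaled dot-product attention. The resulting weights are then penalized with a cross-entropy loss against gold word-to-region alignments. Three choices are assumptions made for illustration: the function names, the use of dot-product attention, and spreading gold probability mass uniformly over the annotated regions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word: Sequence[float], regions: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights of one word vector over image regions.

    The same function serves encoding-time (source word) and decoding-time
    (target word) attention; only the query vector differs (assumption).
    """
    d = len(word)
    scores = [sum(w * r for w, r in zip(word, region)) / math.sqrt(d) for region in regions]
    return softmax(scores)


def supervised_attention_loss(
    attn: Sequence[float], gold_regions: Sequence[int], eps: float = 1e-9
) -> float:
    """Cross-entropy between a gold alignment distribution and predicted attention.

    gold_regions lists the indices of regions annotated as referring to the word.
    Hypothetical choice: gold mass is spread uniformly over those regions.
    """
    if not gold_regions:
        return 0.0  # word with no grounded region: no attention supervision (assumption)
    g = 1.0 / len(gold_regions)
    return -sum(g * math.log(attn[j] + eps) for j in gold_regions)


if __name__ == "__main__":
    # Toy 2-d features: the word vector points in the same direction as region 0.
    word = [1.0, 0.0]
    regions = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
    attn = attend(word, regions)
    print("attention:", attn)
    # The loss is lower when the gold region is the one the model already attends to.
    print("loss, gold=0:", supervised_attention_loss(attn, [0]))
    print("loss, gold=1:", supervised_attention_loss(attn, [1]))
```

In a full MMT model, a term like this would typically be added, with some weighting factor, to the usual translation loss. That would push the attention toward annotated word-region alignments while the model still learns to translate. How the report actually weights and combines these terms is not stated here.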


Document Details

Document Type: Technical Report
Publication Date: Dec 22, 2022
Accession Number: AD1194121

Entities

People

Lucia Specia

Organizations

Imperial College London
