The 2019 BBN Cross-lingual Information Retrieval System

Abstract

In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English and a set of audio and text documents in a foreign language, returns a scored list of relevant documents and presents its findings as a summary in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is fine-tuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low- and medium-resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that uses translation probabilities from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR output, together with multiple translation renderings, is selected and translated into English snippets via a summarization model. Our turnkey system is language-agnostic and can be quickly trained for a new low-resource language in a few days.
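
To make the AQWV metric mentioned in the abstract concrete, the following is a minimal sketch of how such a score is commonly computed: for each query, one minus the sum of the miss rate and a weighted false-alarm rate, averaged over queries. The function names, the data layout, the default weight `beta=40.0`, and the choice to skip queries with no relevant documents are all assumptions made for illustration; the abstract does not state the exact evaluation settings used in the report.

```python
from typing import Dict, Set


def query_value(returned: Set[str], relevant: Set[str],
                n_docs: int, beta: float) -> float:
    """Value for one query: 1 - (P_miss + beta * P_FA)."""
    misses = len(relevant - returned)        # relevant documents not returned
    false_alarms = len(returned - relevant)  # returned documents that are not relevant
    p_miss = misses / len(relevant)
    n_nonrelevant = n_docs - len(relevant)
    p_fa = false_alarms / n_nonrelevant if n_nonrelevant else 0.0
    return 1.0 - (p_miss + beta * p_fa)


def aqwv(returned_by_query: Dict[str, Set[str]],
         relevant_by_query: Dict[str, Set[str]],
         n_docs: int, beta: float = 40.0) -> float:
    """Average the per-query value over queries that have relevant documents.

    beta=40.0 is an assumed default, not a value taken from the report.
    """
    values = [
        query_value(returned_by_query.get(q, set()), rel, n_docs, beta)
        for q, rel in relevant_by_query.items()
        if rel  # assumption: queries without relevant documents are skipped
    ]
    if not values:
        raise ValueError("no query has any relevant documents")
    return sum(values) / len(values)


# Hypothetical toy collection of 102 documents and two queries.
relevant = {"q1": {"d1", "d2"}, "q2": {"d4"}}
returned = {"q1": {"d1", "d3"}, "q2": {"d4"}}
score = aqwv(returned, relevant, n_docs=102)
```

Because each false alarm is weighted by `beta`, returning a few extra non-relevant documents can cost more than missing a relevant one, which is why the score threshold that decides what a system returns has a direct effect on the metric.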

Document Details

Document Type: Technical Report
Publication Date: May 11, 2020
Accession Number: AD1149901

Entities

People

  • Damianos Karakos
  • David Akodes
  • John Makhoul
  • Le Zhang
  • Lee Tarlin
  • Lingjun Zhao
  • Manaj Srivastava
  • Numra Bathool
  • Richard Schwartz
  • Sanjay Krishna Gouda
  • William Hartmann
  • Zhuolin Jiang

Organizations

  • RTX

Tags

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computers
  • False Alarms
  • Foreign Languages
  • Information Processing
  • Information Retrieval
  • Information Systems
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Neural Networks
  • Probabilistic Models
  • Probability
  • Signal Processing

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks