System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS)

Abstract

This report describes the technical approaches and results for the System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS), funded under the IARPA MATERIAL program. SCRIPTS includes Automatic Speech Recognition (ASR) and Machine Translation (MT) components that preprocess the text and speech corpora provided as part of the program. It also includes a text processing component that performs morphological analysis. In user-facing mode, given a query, the SCRIPTS Cross-Language Information Retrieval (CLIR) component returns relevant documents, while the Summarization component generates a textual summary of each document to help an analyst confirm which documents returned by CLIR are actually relevant. Over the course of the program, the team implemented models for nine languages: Somali, Swahili, Tagalog, Bulgarian, Lithuanian, Pashto, Farsi, Kazakh, and Georgian.

Document Details

Document Type
Technical Report
Publication Date
Dec 22, 2021
Accession Number
AD1165721

Entities

People

  • David Wan
  • Dragomir Radev
  • Electra Wallington
  • Elsbeth Turcan
  • Faisal Ladhak
  • Julia Hirschberg
  • Kate Knill
  • Kathleen McKeown
  • Kenneth Heafield
  • Mark Gales
  • Neha Verma
  • Ondrej Klejch
  • Peter Bell
  • Ramy Eskander
  • Rui Zhang
  • Smaranda Muresan
  • Sukanta Sen
  • Susan McGregor
  • Svetlana Tshistiakova
  • Victor S. Martinez

Organizations

  • Columbia University

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Automata Theory
  • Bayesian Networks
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Programming
  • Computer Science
  • Computers
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Linguistics
  • Natural Language Processing
  • Neural Networks
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation