System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS)

Abstract

This report describes the technical approaches and results of the System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS), funded under the IARPA MATERIAL program. SCRIPTS includes Automatic Speech Recognition (ASR) and Machine Translation (MT) components that preprocess the text and speech corpora provided by the program. It also includes a text-processing component that performs morphological analysis. In user-facing mode, SCRIPTS Cross-Language Information Retrieval (CLIR) returns the documents relevant to a given query, and Summarization generates a textual summary of each returned document to help an analyst confirm which of them are actually relevant. Over the course of the program, the team implemented models for nine languages: Somali, Swahili, Tagalog, Bulgarian, Lithuanian, Pashto, Farsi, Kazakh, and Georgian.
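The abstract describes a two-stage flow. Foreign-language documents are first preprocessed offline (ASR for speech, MT for text). At query time, CLIR ranks the documents and Summarization produces a short per-document summary for the analyst. The Python sketch below illustrates only that data flow, and it rests on loud assumptions. The toy Swahili-English lexicon (`TOY_LEXICON`), the term-overlap ranking in `retrieve`, and the window-based `summarize` are all hypothetical stand-ins. None of them are the models SCRIPTS actually implemented.

```python
from dataclasses import dataclass

# HYPOTHETICAL toy lexicon standing in for the MT component; it is not
# SCRIPTS' translation model, only a word-by-word lookup for illustration.
TOY_LEXICON = {"maji": "water", "safi": "clean", "mvua": "rain", "nyingi": "much"}


@dataclass
class Document:
    doc_id: str
    source_text: str        # foreign-language text (or ASR output for speech)
    english_text: str = ""  # filled in by the offline preprocessing step


def translate(text: str) -> str:
    """Word-by-word lookup; words missing from the lexicon pass through unchanged."""
    return " ".join(TOY_LEXICON.get(w, w) for w in text.lower().split())


def preprocess(docs):
    """Offline step: translate every document once, before any query arrives."""
    for d in docs:
        d.english_text = translate(d.source_text)
    return docs


def retrieve(query, docs, min_overlap=1):
    """Toy CLIR: rank documents by how many distinct query terms their translation contains."""
    q = set(query.lower().split())
    scored = []
    for d in docs:
        overlap = len(q & set(d.english_text.split()))
        if overlap >= min_overlap:
            scored.append((overlap, d.doc_id, d))
    # Highest overlap first; ties broken by document id for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [d for _, _, d in scored]


def summarize(doc, query, max_words=5):
    """Toy query-focused summary: the max_words window holding the most query terms."""
    words = doc.english_text.split()
    q = set(query.lower().split())
    starts = range(max(1, len(words) - max_words + 1))
    best = max(starts, key=lambda i: sum(w in q for w in words[i:i + max_words]))
    return " ".join(words[best:best + max_words])
```

For example, after `preprocess` on three Swahili snippets ("maji safi", "mvua nyingi", "maji mvua"), the English query "clean water" retrieves the first and third documents, with the first ranked higher because it matches both query terms. Translating the collection once up front, rather than translating each query into every language, mirrors the abstract's description of MT as a preprocessing step over the corpora.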

Document Details

Document Type: Technical Report
Publication Date: Dec 22, 2021
Accession Number: AD1165721

Entities

People

David Wan
Dragomir Radev
Electra Wallington
Elsbeth Turcan
Faisal Ladhak
Julia Hirschberg
Kate Knill
Kathleen McKeown
Kenneth Heafield
Mark Gales
Neha Verma
Ondrej Klejch
Peter Bell
Ramy Eskander
Rui Zhang
Smaranda Muresan
Sukanta Sen
Susan McGregor
Svetlana Tshistiakova
Victor S. Martinez

Organizations

Columbia University