System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS)
Abstract
This report describes the technical approaches and results for the System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS), funded under the IARPA MATERIAL program. SCRIPTS consists of components for Automatic Speech Recognition (ASR) and Machine Translation (MT) that preprocess the text and speech corpora provided as part of the program. It also includes a text processing component that performs morphological analysis. In user-facing mode, given a query, SCRIPTS Cross-Language Information Retrieval (CLIR) returns relevant documents, while Summarization generates textual summaries of each document to help an analyst confirm which documents returned by CLIR are actually relevant. Over the course of the program, the team implemented models for nine languages: Somali, Swahili, Tagalog, Bulgarian, Lithuanian, Pashto, Farsi, Kazakh, and Georgian.
Document Details
- Document Type: Technical Report
- Publication Date: Dec 22, 2021
- Accession Number: AD1190088
Entities
People
- Kathleen McKeown
Organizations
- Columbia University