Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?

Abstract

We report on our experience with building a statistical MT system from scratch, including the creation of a small parallel Tamil-English corpus, and the results of a task-based pilot evaluation of statistical MT systems trained on sets of ca. 1300 and ca. 5000 parallel sentences of Tamil and English data. Our results show that even with apparently incomprehensible system output, humans without any knowledge of Tamil can achieve performance rates as high as 86% accuracy for topic identification, 93% recall for document retrieval, and 64% recall on question answering (plus an additional 14% partially correct answers).

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2001
Accession Number: ADA460337

Entities

People

Ulrich Germann

Organizations

University of Southern California
