Linguistic Resource Creation for Research and Technology Development: A Recent Experiment

Abstract

Advances in statistical machine learning encourage language-independent approaches to linguistic technology development. Experiments in porting technologies to handle new natural languages have revealed great potential for multilingual computing, but also a frustrating lack of linguistic resources for most languages. Recent efforts to address the lack of available resources have focused on either intensive resource development for a small number of languages or development of technologies for rapid porting. The Linguistic Data Consortium recently participated in an experiment falling primarily under the first approach, the surprise language exercise. This article describes linguistic resource creation within this context, including the overall methodology for surveying and collecting language resources, as well as details of the resources developed during the exercise. The article concludes with a discussion of a new approach to solving the problem of limited linguistic resources, one that has recently proven effective in identifying core linguistic resources for less commonly studied languages.

MACHINE TRANSLATION, LANGUAGE PARSING AND UNDERSTANDING, TEXT ANALYSIS, LINGUISTIC RESOURCES, HINDI, CEBUANO, TRANSLINGUAL INFORMATION ACCESS TECHNOLOGY, CROSSLANGUAGE INFORMATION RETRIEVAL, INFORMATION EXTRACTION, SUMMARIZATION

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA458886

Entities

People

  • Christopher Cieri
  • Mike Maxwell
  • Stephanie Strassel

Organizations

  • University of Pennsylvania

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Automated Text Summarization
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Detection
  • Information Retrieval
  • Intellectual Property
  • Language
  • Machine Learning
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Standards
  • Websites

Fields of Study

  • Computer science
  • Education
  • Linguistics

Readers

  • Computational Linguistics
  • Defense Technology Research and Development
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Machine Translation