Domain and Language Evaluation Results

Abstract

The Fifth Message Understanding Conference (MUC-5) focused on the task of data extraction for two distinctly different applications, one within the domain of joint ventures (JV) and the other within the domain of microelectronics (ME). For each application, the task could be performed in English, Japanese, or both, giving four combinations: English Joint Ventures, Japanese Joint Ventures, English Microelectronics, and Japanese Microelectronics. Interpreting the evaluation results across domains, and between languages within a single domain, is affected by a number of factors. Differences in task focus, complexity, and domain technicality make it impossible to apply inferential statistics between domains. In addition, even though the task and the template design were the same across languages within a single domain, differences in the types of text sources for each language, and the accompanying variations in template fills and fill rules by language, also make it impossible to apply inferential statistics between the language pairs. Moreover, there is considerable variation in the participants' level of effort and funding, and not all of the participants worked in multiple languages and/or multiple domains. In light of these factors, I will present descriptive statistics comparing error per response fill to address the following questions: (1) For both languages, what is the performance difference between domains? (2) Between domains, what are the performance differences for the single shared object and for unattempted slots? (3) For both domains, what is the performance difference between languages? (4) For a single domain, what are representative differences at the object and slot levels between English and Japanese? The discussion of domain and language differences will center upon general factors that influence performance in information extraction.
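
The abstract names error per response fill as the statistic being compared but does not define it. As a rough illustration only, the sketch below assumes the definition commonly given in the MUC-5 scoring literature: the number of wrong fills (incorrect, missing, spurious, plus half credit lost on partial matches) divided by the total number of fills scored. The `FillCounts` type, the function name, and the counts in the usage lines are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillCounts:
    """Tallies of scored template fills (hypothetical container, not from the report)."""
    correct: int
    partial: int
    incorrect: int
    missing: int
    spurious: int


def error_per_response_fill(counts: FillCounts) -> float:
    """Return wrong fills divided by total fills, under the assumed definition.

    Assumption: a partial match counts as half wrong, and the denominator
    includes every scored fill (correct, partial, incorrect, missing, spurious).
    """
    total = (counts.correct + counts.partial + counts.incorrect
             + counts.missing + counts.spurious)
    if total == 0:
        raise ValueError("no fills to score")
    wrong = (counts.incorrect + 0.5 * counts.partial
             + counts.missing + counts.spurious)
    return wrong / total


# Made-up counts for two of the four language/domain combinations,
# used only to show how a descriptive comparison could be laid out.
example = {
    "English JV": FillCounts(correct=50, partial=10, incorrect=10, missing=20, spurious=10),
    "Japanese JV": FillCounts(correct=60, partial=10, incorrect=5, missing=15, spurious=10),
}
for name, counts in example.items():
    print(f"{name}: {error_per_response_fill(counts):.2f}")
```

Because the statistic is a simple ratio, it can be tabulated for each language and domain pairing and compared descriptively, which matches the abstract's point that inferential statistics do not apply across these conditions.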

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1993
Accession Number
ADA460578

Entities

People

  • Mary E. Okurowski

Organizations

  • United States Department of Defense

Tags

Communities of Interest

  • Advanced Electronics

DTIC Thesaurus Topics

  • Availability
  • Data Management
  • Department Of Defense
  • Descriptive Analytics
  • Extraction
  • Frequency
  • Identification
  • Information Operations
  • Language
  • Lithography
  • Microelectronics
  • Packaging
  • Recognition
  • Statistics
  • Template Patterns
  • Test And Evaluation
  • Test Sets

Readers

  • Joint Military Operations and Doctrine
  • Regression Analysis
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks
  • Microelectronics