A Categorial Variation Database for English

Abstract

We describe our approach to the construction and evaluation of a large-scale database called "CatVar", which contains categorial variations of English lexemes. Because cross-language categorial variation is prevalent in multilingual applications, our categorial variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily with that of the machine-translation, lexicon-construction, and information-retrieval communities. We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of our database with respect to a human-produced gold standard. This evaluation reveals that the categorial database achieves a high degree of precision and recall. Additionally, we demonstrate that the database improves on the linkability of the Porter stemmer by over 30%.
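As a rough illustration of the evaluation described in the abstract, the sketch below computes precision and recall for one cluster of categorial variants against a gold-standard set. The function name, the `word_POS` labels, and the example word lists are hypothetical and chosen only for illustration; they are not taken from the report and do not reproduce its results.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of word forms."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted variants that the gold standard confirms.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard variants that the database contains.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical cluster and gold standard (illustrative data only).
predicted = {"hunger_N", "hunger_V", "hungry_AJ", "hungrily_AV"}
gold = {"hunger_N", "hunger_V", "hungry_AJ", "hungriness_N"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the four predicted variants appear in the gold set and three of the four gold variants are found, so both scores come out to 0.75.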

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2003
Accession Number: ADA455167

Entities

People

  • Bonnie J. Dorr
  • Nizar Habash

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Automated Text Summarization
  • Computational Linguistics
  • Computational Science
  • Construction
  • Databases
  • Demographic Cohorts
  • Information Retrieval
  • Language
  • Linguistics
  • Machine Translation
  • Markov Models
  • Models
  • Natural Language Processing
  • Natural Languages
  • Precision
  • Translations
  • Universities

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Geospatial Intelligence and Artificial Intelligence Analytics

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation