The Effects of Lexical Resource Quality on Preference Violation Detection

Abstract

Lexical resources such as WordNet and VerbNet are widely used in many NLP tasks, as are annotated corpora such as treebanks. Often, these resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques. This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises as well, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.
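To make the idea of a lexical resource-based preference violation detector concrete, the sketch below checks each argument of a verb against a required semantic class by walking up a hypernym hierarchy. This is not DAVID's implementation: the tiny `HYPERNYMS` table (standing in for a WordNet-like resource), the `RESTRICTIONS` table (standing in for VerbNet-style selectional restrictions), and the helper names `ancestors`, `violates`, and `detect` are all hypothetical, chosen only for illustration.

```python
# Hypothetical toy hypernym hierarchy standing in for a WordNet-like resource.
HYPERNYMS = {
    "idea": "abstraction",
    "abstraction": "entity",
    "dog": "animal",
    "animal": "organism",
    "organism": "physical_entity",
    "apple": "food",
    "food": "physical_entity",
    "physical_entity": "entity",
}

# Hypothetical (verb, role) -> required class table, standing in for
# VerbNet-style selectional restrictions.
RESTRICTIONS = {
    ("eat", "agent"): "organism",
    ("eat", "patient"): "food",
}


def ancestors(noun):
    """Yield the noun itself and every hypernym above it, up to the root."""
    while noun is not None:
        yield noun
        noun = HYPERNYMS.get(noun)


def violates(verb, role, noun):
    """Return True if the noun does not fall under the class required for this role."""
    required = RESTRICTIONS.get((verb, role))
    if required is None:
        return False  # no restriction recorded, so no violation can be flagged
    return required not in set(ancestors(noun))


def detect(verb, args):
    """Return the (role, noun) pairs whose fillers violate the verb's restrictions."""
    return [(role, noun) for role, noun in args.items() if violates(verb, role, noun)]


if __name__ == "__main__":
    print(detect("eat", {"agent": "dog", "patient": "idea"}))  # [('patient', 'idea')]
    print(detect("eat", {"patient": "kale"}))  # [('patient', 'kale')]
```

The second call shows why resource quality matters for a detector of this kind: "kale" is simply missing from the toy hierarchy, so it is flagged as a violation even though eating kale is perfectly ordinary. A gap or error in the resource turns directly into a false positive, no matter how sound the checking logic is.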


Document Details

Document Type
Technical Report
Publication Date
Aug 04, 2013
Accession Number
AD1144433

Entities

People

  • Jaime Carbonell
  • Jesse Dunietz
  • Lori Levin

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Linguistics
  • Computer Science
  • Concrete
  • Department Of Defense
  • Detection
  • Detectors
  • Language
  • Law
  • Linguistics
  • Military Research
  • Natural Language Processing
  • Natural Languages
  • Stress Tests
  • Test Sets
  • Text Processing

Readers

  • Computational Linguistics
  • Organizational Process Management (OPM)
  • Systems Analysis and Design