The Effects of Lexical Resource Quality on Preference Violation Detection
Abstract
Lexical resources such as WordNet and VerbNet are widely used in a multitude of NLP tasks, as are annotated corpora such as treebanks. Often, the resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques. This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 04, 2013
- Accession Number: AD1144433
Entities
People
- Jaime Carbonell
- Jesse Dunietz
- Lori Levin
Organizations
- Carnegie Mellon University