The Effects of Lexical Resource Quality on Preference Violation Detection

Abstract

Lexical resources such as WordNet and VerbNet are widely used in a multitude of NLP tasks, as are annotated corpora such as treebanks. Often, the resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques. This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.

Document Details

Document Type: Technical Report
Publication Date: Aug 04, 2013
Accession Number: AD1144433

Entities

People

Jaime Carbonell
Jesse Dunietz
Lori Levin

Organizations

Carnegie Mellon University
