Entity Retrieval by Hierarchical Relevance Model, Exploiting the Structure of Tables and Learning Homepage Classifiers

Abstract

This paper gives an overview of our work for the TREC 2009 Entity track. We propose a hierarchical relevance retrieval model for entity ranking. The model examines relevance at three levels: document, passage, and entity. The final ranking score is a linear combination of the relevance scores from the three levels. Furthermore, we exploit the structure of tables and lists to identify target entities in them by making a joint decision on all the entities that share the same attribute. To find entity homepages, we train a logistic regression model for each entity type. A set of templates and filtering rules is also used to identify target entities. The key lessons we learned from participating in this year's Entity track are: 1) our special treatment of table and list data is highly rewarding; 2) high accuracy in homepage finding is crucial in this track; 3) Wikipedia can serve as a valuable knowledge resource for different aspects of the related entity finding task.
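As an illustration of the linear combination described in the abstract, the following minimal sketch ranks candidate entities by a weighted sum of document-, passage-, and entity-level relevance scores. The names `LevelScores`, `combined_score`, and `rank_entities`, the example weights, and the example scores are all assumptions made for illustration; the abstract does not state how the individual scores are computed or how the weights are chosen.

```python
from dataclasses import dataclass


@dataclass
class LevelScores:
    """Relevance scores for one candidate entity (hypothetical container)."""
    document: float  # relevance of the supporting document
    passage: float   # relevance of the supporting passage
    entity: float    # relevance of the entity itself


def combined_score(scores, weights=(0.3, 0.3, 0.4)):
    """Linear combination of the three levels; the weights are illustrative."""
    w_doc, w_pas, w_ent = weights
    return (w_doc * scores.document
            + w_pas * scores.passage
            + w_ent * scores.entity)


def rank_entities(candidates, weights=(0.3, 0.3, 0.4)):
    """Return candidate names sorted by combined score, highest first."""
    return sorted(
        candidates,
        key=lambda name: combined_score(candidates[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up scores for two hypothetical candidates.
    candidates = {
        "entity_a": LevelScores(document=0.9, passage=0.1, entity=0.1),
        "entity_b": LevelScores(document=0.2, passage=0.2, entity=0.9),
    }
    for name in rank_entities(candidates):
        print(name, round(combined_score(candidates[name]), 3))
```

In practice the weights would presumably be tuned on held-out topics; the fixed values here only keep the sketch self-contained.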

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2009
Accession Number
ADA517741

Entities

People

  • Luo Si
  • Yangbo Xu
  • Yantuan Xian
  • Yi Fang
  • Zhengtao Yu

Organizations

  • Purdue University

Tags

Communities of Interest

  • Air Platforms
  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Abstracts
  • Accuracy
  • Classification
  • Computer Science
  • Extraction
  • Filtration
  • Language
  • Learning
  • Machine Learning
  • Named Entity Recognition
  • Natural Languages
  • Probability
  • Recognition
  • Standards
  • Template Patterns
  • Training
  • Universities

Fields of Study

  • Computer science

Readers

  • Information Retrieval
  • Regression Analysis
  • Systems Analysis and Design