Extraction of Key Words from News Stories

Abstract

In this work, we consider the task of extracting key-words such as key-players, key-locations, key-nouns and key-verbs from news stories. We cast this as a classification problem in which we assign an appropriate label to each word in a news story. We considered three statistical models: the naive Bayes model, the hidden Markov model and the maximum entropy model, and experimented with a variety of features. Our results indicate that a maximum entropy model that ignores contextual features and considers only word-based features, combined with stopping and stemming, yields the best performance. We also found that extracting key-verbs and key-nouns is a much harder problem than extracting key-players and key-locations.
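
The per-word classification setup described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: it uses a small naive Bayes classifier (one of the three model families named above, chosen here only for brevity, not the best-performing maximum entropy model), and the stop-word list, the suffix-stripping stemmer, the two features, the "other" label and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the report's list).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "was", "is"}

def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def features(word):
    """Word-based features only: neighbouring words are never consulted."""
    return {"stem": crude_stem(word), "capitalized": word[:1].isupper()}

class NaiveBayesWordLabeler:
    """Assigns one label per word using Laplace-smoothed naive Bayes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.feature_values = defaultdict(set)      # feature -> values seen in training

    def fit(self, labeled_words):
        for word, label in labeled_words:
            if word.lower() in STOPWORDS:  # stopping: skip stop words entirely
                continue
            self.label_counts[label] += 1
            for name, value in features(word).items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def predict(self, word):
        if word.lower() in STOPWORDS:
            return "other"  # hypothetical default label for stop words
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in features(word).items():
                seen = self.feature_counts[(label, name)]
                vocab = len(self.feature_values[name]) + 1  # +1 slot for unseen values
                score += math.log((seen[value] + self.alpha) / (count + self.alpha * vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for this sketch.
train = [
    ("Clinton", "key-player"), ("visited", "key-verb"), ("Baghdad", "key-location"),
    ("the", "other"), ("talks", "key-noun"), ("Annan", "key-player"),
    ("visits", "key-verb"), ("Geneva", "key-location"), ("treaty", "key-noun"),
    ("yesterday", "other"),
]
model = NaiveBayesWordLabeler().fit(train)
```

Because stemming maps "visited", "visits" and "visiting" to the same stem, `model.predict("visiting")` can be labeled from the two training verbs even though that exact form was never seen. Swapping in a maximum entropy model would keep the same `features` function and replace only the classifier.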

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2004
Accession Number: ADA477769

Entities

People

  • James Allan
  • Ramesh Nallapati
  • Sridhar Mahadevan

Organizations

  • University of Massachusetts Amherst

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Classification
  • Computer Science
  • Computers
  • Extraction
  • Frequency
  • Genetic Algorithms
  • Hidden Markov Models
  • Information Retrieval
  • Machine Learning
  • Markov Models
  • Models
  • New York
  • Probability
  • Supervised Machine Learning
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Educational Psychology
  • Statistical inference