Automated Story Capture From Internet Weblogs

Abstract

One of the most interesting ways that people share knowledge is by telling stories, i.e., first-person narratives about real-life experiences. Millions of these stories appear in Internet weblogs, offering a potentially valuable resource for future knowledge management and training applications. In this paper we describe efforts to automatically capture stories from Internet weblogs using statistical text classification techniques. We evaluate the precision and recall of competing approaches. We then describe the large-scale application of story extraction technology to Internet weblogs, producing a corpus of stories with over a billion words.
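The abstract does not name the classifier or the features used. The block below is a minimal sketch of the general technique only. It assumes a bag-of-words representation and substitutes multinomial Naive Bayes with add-one smoothing, which is not necessarily the authors' method. It labels a weblog post as story or non-story and computes precision and recall against hand-labeled posts. The names `tokenize`, `NaiveBayesStoryClassifier` and `precision_recall`, along with the toy training posts, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: whitespace split, strip surrounding punctuation, lowercase.
    tokens = (t.strip(".,!?;:\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayesStoryClassifier:
    """Hypothetical story/non-story classifier: multinomial Naive Bayes, add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, texts, labels):
        # labels: True = story, False = non-story; both classes must appear.
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label) over in-vocabulary tokens.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # out-of-vocabulary tokens are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return self.log_score(tokens, True) > self.log_score(tokens, False)


def precision_recall(predicted, actual):
    # Precision = TP / (TP + FP); recall = TP / (TP + FN), with "story" as the positive class.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, invented training posts (not data from the report).
train_texts = [
    "Yesterday I went to the beach and I lost my keys.",
    "Last week my brother and I drove to the lake.",
    "Click here to buy cheap shoes online today!",
    "Top ten tips for better search rankings.",
]
train_labels = [True, True, False, False]

clf = NaiveBayesStoryClassifier().fit(train_texts, train_labels)
test_texts = [
    "I went to the store yesterday and met my brother.",
    "Cheap shoes and tips for online rankings.",
]
test_labels = [True, False]
predictions = [clf.predict(t) for t in test_texts]
print(predictions, precision_recall(predictions, test_labels))
```

Precision and recall pull against each other here. Requiring a larger log-score margin before labeling a post as a story would typically raise precision at the cost of recall. That trade-off matters when a classifier is run over millions of weblog posts.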


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2007
Accession Number
ADA470419

Entities

People

  • Andrew S. Gordon
  • Qun Cao
  • Reid Swanson

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Classification
  • Computer Languages
  • Computers
  • Data Science
  • Dimensionality Reduction
  • Extraction
  • Governments
  • Information Science
  • Knowledge Management
  • Learning
  • Machine Learning
  • Natural Languages
  • Precision
  • Supervised Machine Learning
  • United States Government

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Educational Psychology
  • Team-Based Human-Centered Cognitive Task Decision Making and Information Performance

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks