Turning Down the Noise in the Blogosphere

Abstract

In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere. We define a simple and elegant notion of coverage and formalize it as a submodular optimization problem, for which we can efficiently compute a near-optimal solution. In addition, since people have varied interests, the ideal coverage algorithm should incorporate user preferences in order to tailor the selected posts to individual tastes. We define the problem of learning a personalized coverage function by providing an appropriate user-interaction model and formalizing an online learning framework for this task. We then provide a no-regret algorithm which can quickly learn a user's preferences from limited feedback. We evaluate our coverage and personalization algorithms extensively over real blog data. Results from a user study show that our simple coverage algorithm does as well as most popular blog aggregation sites, including Google Blog Search, Yahoo! Buzz, and Digg. Furthermore, we demonstrate empirically that our algorithm can successfully adapt to user preferences. We believe that our technique, especially with personalization, can dramatically reduce information overload.
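The coverage idea in the abstract can be illustrated with a small sketch. The abstract does not state the exact form of the coverage function, so the code below assumes a simple weighted set-cover objective: each post covers a set of features, and the score of a selection is the total weight of the features it covers. The names `coverage`, `greedy_select`, `posts`, and `weights`, and the toy data, are hypothetical. For a monotone submodular objective such as this one, greedy selection of k items is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the usual sense of "near-optimal" in this setting.

```python
from typing import Dict, List, Set


def coverage(selected: List[str],
             posts: Dict[str, Set[str]],
             weights: Dict[str, float]) -> float:
    """Total weight of the features covered by at least one selected post."""
    covered: Set[str] = set()
    for post_id in selected:
        covered |= posts[post_id]
    return sum(weights.get(feature, 0.0) for feature in covered)


def greedy_select(posts: Dict[str, Set[str]],
                  weights: Dict[str, float],
                  k: int) -> List[str]:
    """Pick up to k posts, each time adding the post with the largest marginal gain."""
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(min(k, len(posts))):
        best_id, best_gain = None, 0.0
        for post_id in sorted(posts):  # sorted for deterministic tie-breaking
            if post_id in selected:
                continue
            # Marginal gain: weight of features this post covers that are still uncovered.
            gain = sum(weights.get(f, 0.0) for f in posts[post_id] - covered)
            if gain > best_gain:
                best_id, best_gain = post_id, gain
        if best_id is None:  # no remaining post adds any coverage
            break
        selected.append(best_id)
        covered |= posts[best_id]
    return selected


# Toy data (assumed for illustration only).
posts = {
    "post_a": {"election", "economy"},
    "post_b": {"election"},
    "post_c": {"sports", "economy"},
    "post_d": {"sports"},
}
weights = {"election": 3.0, "economy": 2.0, "sports": 1.0}

picked = greedy_select(posts, weights, k=2)
print(picked, coverage(picked, posts, weights))
```

In this toy example the second pick is driven by the only feature still uncovered, which is the diminishing-returns behavior that makes the objective submodular. Personalization, as described in the abstract, could be pictured as adjusting `weights` per user from feedback, though the report's actual learning algorithm is not reproduced here.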

Document Details

Document Type: Technical Report
Publication Date: May 01, 2009
Accession Number: ADA501771

Entities

People

  • Carlos Guestrin
  • Dafna Shahaf
  • Gaurav Veda
  • Khalid El-Arini

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Commerce
  • Computational Science
  • Computer Science
  • Data Sets
  • Homosexuality
  • Information Overload
  • Information Retrieval
  • Machine Learning
  • Models
  • Monte Carlo Method
  • Network Science
  • Online Communications
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Websites

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Educational Psychology