Sentimental Spidering

Abstract

Despite the increased prevalence of sentiment-related information on the Web, there has been limited work on focused crawlers capable of effectively collecting not only topic-relevant but also sentiment-relevant content. In this article, we propose a novel focused crawler that incorporates topic and sentiment information as well as a graph-based tunneling mechanism for enhanced collection of opinion-rich Web content regarding a particular topic. The graph-based sentiment (GBS) crawler uses a text classifier that employs both topic and sentiment categorization modules to assess the relevance of candidate pages. This information is also used to label nodes in web graphs that are employed by the tunneling mechanism to improve collection recall. Experimental results on two test beds revealed that GBS was able to provide better precision and recall than seven comparison crawlers. Moreover, GBS was able to collect a large proportion of the relevant content after traversing far fewer pages than comparison methods. GBS outperformed comparison methods on various categories of Web pages in the test beds, including collection of blogs, Web forums, and social networking Web site content. Further analysis revealed that both the sentiment classification module and graph-based tunneling mechanism played an integral role in the overall effectiveness of the GBS crawler.
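As a rough illustration of the idea described in the abstract, the following minimal sketch shows a best-first focused crawl that scores pages on both topic and sentiment and tunnels through a limited number of irrelevant pages to reach relevant ones. It is not the authors' GBS implementation. Keyword matching stands in for the topic and sentiment classifiers, a simple depth-limited tunnel stands in for the graph-based tunneling mechanism, and the toy page graph, term lists, and every name (PAGES, relevance, crawl, max_tunnel) are hypothetical.

```python
import heapq

# Hypothetical toy web: page id -> (text, outgoing links). Not data from the paper.
PAGES = {
    "seed": ("crawler news", ["hub", "off"]),
    "hub": ("links directory", ["review"]),
    "off": ("weather report", []),
    "review": ("crawler review: terrible, awful recall", []),
}
TOPIC_TERMS = {"crawler"}                                # assumed topic vocabulary
SENTIMENT_TERMS = {"terrible", "awful", "great", "love"}  # assumed opinion vocabulary


def tokens(text):
    """Lowercase the words of a text and strip surrounding punctuation."""
    return {w.strip(".,:;!?").lower() for w in text.split()}


def relevance(text):
    """Return 1.0 for on-topic opinionated text, 0.5 for on-topic only, else 0.0."""
    words = tokens(text)
    on_topic = bool(words & TOPIC_TERMS)
    opinionated = bool(words & SENTIMENT_TERMS)
    if on_topic and opinionated:
        return 1.0
    return 0.5 if on_topic else 0.0


def crawl(seed, max_tunnel=1):
    """Best-first crawl; follow at most max_tunnel consecutive irrelevant pages."""
    # Heap entries: (negated parent score, page id, consecutive irrelevant pages so far)
    frontier = [(-1.0, seed, 0)]
    seen = {seed}
    collected = []
    while frontier:
        _, page, misses = heapq.heappop(frontier)
        text, links = PAGES[page]
        score = relevance(text)
        if score == 1.0:
            collected.append(page)  # both topic- and sentiment-relevant
        misses = 0 if score > 0 else misses + 1
        if misses > max_tunnel:
            continue  # tunnel budget exhausted: do not expand this page
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link, misses))
    return collected
```

In this toy graph the opinionated page "review" is reachable only through the irrelevant page "hub", so `crawl("seed", max_tunnel=1)` collects it while `crawl("seed", max_tunnel=0)` does not. This mirrors, in a much simplified way, the abstract's point that tunneling can improve recall.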

Document Details

Document Type
Pub Defense Publication
Publication Date
Nov 01, 2012
Source ID
10.1145/2382438.2382443

Entities

People

  • Ahmed Ali Abbasi
  • Daniel Zeng
  • Hsinchun Chen
  • Tianjun Fu

Organizations

  • Chinese Academy of Sciences
  • Defense Threat Reduction Agency
  • Division of Chemical, Bioengineering, Environmental, and Transport Systems
  • Division of Computer and Network Systems
  • Division of Information and Intelligent Systems
  • National Health and Family Planning Commission
  • National Natural Science Foundation of China
  • University of Arizona
  • University of Virginia

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Distributed Systems and Data Platform Development
  • Information Retrieval