ICTNET at Web Track 2010 Spam Task

Abstract

Web spamming refers to web pages that deceive search engines in order to obtain a higher rank in their search results. Working on the TrecWeb09 data set, we use a content-based spam classifier to check the two end pages of each hyperlink: if either end page is classified as content spam, or both pages are of low quality, the hyperlink is discarded. After all hyperlinks have been checked, PageRank values are recomputed on the rebuilt web graph, and the difference between a page's original and recomputed PageRank values is taken as its link-spamming score. Finally, this link-spamming score is combined with the output of the content-based classifier to produce the final spam estimate for each page.
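The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the content classifier is replaced by a precomputed per-page spam probability (`spam_prob`), "low quality" is modeled as a hypothetical per-page `quality` score below a threshold, the link-spamming score is assumed to be the relative PageRank a page loses after link filtering, and the final combination is assumed to be a linear mix weighted by a hypothetical `alpha`. All function names and thresholds are illustrative.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def pagerank(nodes: List[str], edges: List[Edge],
             damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Plain power-iteration PageRank; dangling pages spread their rank uniformly."""
    n = len(nodes)
    out: Dict[str, List[str]] = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is redistributed to every page.
        dangling = sum(rank[v] for v in nodes if not out[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


def filter_edges(edges: List[Edge], spam_prob: Dict[str, float],
                 quality: Dict[str, float], spam_threshold: float = 0.5,
                 quality_threshold: float = 0.3) -> List[Edge]:
    """Drop a link if either end is content spam or both ends are low quality.

    spam_prob and quality are hypothetical per-page inputs standing in for
    the paper's content-based classifier.
    """
    kept = []
    for src, dst in edges:
        either_spam = (spam_prob[src] >= spam_threshold
                       or spam_prob[dst] >= spam_threshold)
        both_low = (quality[src] < quality_threshold
                    and quality[dst] < quality_threshold)
        if not (either_spam or both_low):
            kept.append((src, dst))
    return kept


def spam_scores(nodes: List[str], edges: List[Edge],
                spam_prob: Dict[str, float], quality: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """Combine content spam probability with a link-spam score (assumed linear mix)."""
    before = pagerank(nodes, edges)
    after = pagerank(nodes, filter_edges(edges, spam_prob, quality))
    scores = {}
    for v in nodes:
        # Assumed link-spam score: relative PageRank lost after filtering, clipped at 0.
        link = max(0.0, (before[v] - after[v]) / before[v])
        scores[v] = alpha * spam_prob[v] + (1.0 - alpha) * link
    return scores
```

Normalizing the PageRank drop by the original value keeps the link score on the same [0, 1] scale as the classifier output, so a linear mix of the two is meaningful; pages whose rank rises after filtering simply receive a link score of 0. In a toy graph where two content-spam pages link to a target page, removing those links strips most of the target's inflated PageRank, which raises its combined score even though its own content looks clean.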

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2010
Accession Number
ADA546579

Entities

People

  • Bolong Zhu
  • Hongbo Xu
  • Jianguo Wang
  • Liang Zhu
  • Xiaoming Yu
  • Xu Chen
  • Xueqi Cheng
  • Yue Liu
  • Zeying Peng

Organizations

  • Chinese Academy of Sciences

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Applied Computer Science
  • Communication Networks
  • Computer Networks
  • Computer Science
  • Computing-Related Activities
  • Data Science
  • Data Sets
  • Detection
  • Electronic Mail
  • Machine Learning
  • Network Science
  • Networks
  • Spammers
  • Standards
  • Statistical Analysis
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Circadian Sleep-Wake Regulation and Chronobiology
  • Database Systems and Applications
  • Information Retrieval