Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata

Abstract

Blatantly unproductive edits undermine the quality of the collaboratively edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but also force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features to be effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually confirmed incidents of vandalism outside our labeled set.
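
As a rough illustration of what "metadata-only" features might look like, the sketch below derives a few temporal and origin-related values from a single revision record without touching the article or revision text. The record layout (`user`, `timestamp`), the function name `metadata_features`, and the particular features chosen are assumptions made for this example; the abstract does not enumerate the report's actual feature set.

```python
from datetime import datetime, timezone
import ipaddress


def metadata_features(rev, prev_timestamp=None):
    """Build a feature dict from revision metadata only (no article text).

    `rev` is a hypothetical record with an ISO-8601 `timestamp` (with UTC
    offset) and a `user` field that holds an IP address for anonymous edits.
    """
    ts = datetime.fromisoformat(rev["timestamp"]).astimezone(timezone.utc)

    # Assumption: an editor name that parses as an IP address is anonymous.
    try:
        ipaddress.ip_address(rev["user"])
        anonymous = True
    except ValueError:
        anonymous = False

    features = {
        "hour_utc": ts.hour,        # time of day of the edit
        "weekday": ts.weekday(),    # 0 = Monday ... 6 = Sunday
        "anonymous": anonymous,
    }

    # Optional temporal feature: gap since the previous edit to the article.
    if prev_timestamp is not None:
        prev = datetime.fromisoformat(prev_timestamp).astimezone(timezone.utc)
        features["secs_since_prev_edit"] = (ts - prev).total_seconds()

    return features


example = metadata_features(
    {"user": "192.0.2.7", "timestamp": "2010-01-01T12:30:00+00:00"},
    prev_timestamp="2010-01-01T12:00:00+00:00",
)
print(example)
```

Feature dictionaries of this kind could then be passed to any supervised classifier trained on rollback-tagged versus untagged edits; which classifier and which features the report actually uses is left to the full text.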

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2010
Accession Number: ADA514680

Entities

People

  • Andrew G. West
  • Insup Lee
  • Sampath Kannan

Organizations

  • University of Pennsylvania

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Classification
  • Computational Complexity
  • Electronic Mail
  • Geolocation
  • Information Science
  • Language
  • Machine Learning
  • Metadata
  • Natural Language Processing
  • Natural Languages
  • Network Protocols
  • Pennsylvania
  • Precision
  • Supervised Machine Learning
  • United States
