Blog Fingerprinting: Identifying Anonymous Posts Written by an Author of Interest Using Word and Character Frequency Analysis

Abstract

Internet blogs are an easily accessible means of global communication. Monitoring blogs for criminal and terrorist activity is a serious challenge because of their anonymous nature and the sheer volume of data. The intelligence community often faces more information than it can process, so methods are needed for processing the massive amounts of data this medium presents without a significant increase in manpower. An automated tool capable of identifying posts written by an individual, given a sample of that individual's writing, would allow law enforcement and intelligence agencies to gather evidence that would otherwise be overlooked because of manpower and time constraints. This research focuses on identifying blog posts written by a particular author when we do not have a model of every potential author. Previous research either builds a distinct model for every possible author or limits itself to large documents. Neither approach is appropriate for blog posts: they tend to be short, and building a distinct model of each author is unreasonable when searching for one author among millions. We address this problem by combining sample posts by other authors to create a model of an "average author."
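The "average author" idea can be illustrated with a minimal sketch. The abstract does not state which features or classifier the report uses, so everything below is an assumption made for illustration: character trigram counts as features, add-one smoothing, and a log-likelihood ratio as the score. The function names (`char_ngrams`, `build_model`, `log_prob`, `score`) and the toy posts are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_model(posts, n=3):
    """Pool n-gram counts over a collection of posts into one frequency model."""
    counts = Counter()
    for post in posts:
        counts.update(char_ngrams(post, n))
    return counts


def log_prob(text, model, vocab_size, n=3):
    """Log-probability of a text under a model, with add-one smoothing."""
    total = sum(model.values())
    return sum(
        math.log((model[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )


def score(text, author_model, average_model, n=3):
    """Log-likelihood ratio: positive means closer to the author of interest."""
    # Shared vocabulary size (+1 for unseen n-grams) so both models are comparable.
    vocab_size = len(set(author_model) | set(average_model)) + 1
    return (log_prob(text, author_model, vocab_size, n)
            - log_prob(text, average_model, vocab_size, n))


# Toy data (hypothetical): one author of interest, several other authors pooled.
author_posts = ["zzz qqq zzz qqq zzz", "qqq zzz qqq zzz"]
other_posts = ["the cat sat on the mat", "a dog ran in the park", "we like the sun"]

author_model = build_model(author_posts)
average_model = build_model(other_posts)  # the pooled "average author"

likely_author = score("zzz qqq zzz", author_model, average_model)
likely_other = score("the cat on the mat", author_model, average_model)
```

Only two models are ever built here, one for the author of interest and one pooled from everyone else, which mirrors the abstract's point that a separate model per candidate author is not required. Word frequencies could be handled the same way by swapping `char_ngrams` for a word tokenizer.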

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2009
Accession Number
ADA508981

Entities

People

  • David J. Dreier

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • California
  • Computational Linguistics
  • Computational Science
  • Information Processing
  • Information Retrieval
  • Information Science
  • Language
  • Law Enforcement
  • Linguistics
  • Machine Learning
  • Natural Language Processing
  • Network Science
  • Online Communications
  • Robotics
  • Students
  • Supervised Machine Learning
  • Terrorists

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computer Science
  • Educational Psychology