Authorship Attribution of Short Messages Using Multimodal Features

Abstract

In this thesis, we develop a multimodal classifier for authorship attribution of short messages. Standard natural language processing authorship attribution techniques are applied to a Twitter text corpus. Using character n-gram features and a Naïve Bayes classifier, we build statistical models of the set of authors. The social network of the selected Twitter users is analyzed using the screen names referenced in their messages. The timestamps of the messages are used to generate a pattern-of-life model. We analyze the physical layer of a network by measuring modulation characteristics of GSM cell phones. A statistical model of each cell phone is created using a Naïve Bayes classifier. Each phone is assigned to a Twitter user, and the probability outputs of the individual classifiers are combined. We show that the combination of natural-language and network-feature classifiers identifies a user-to-phone binding better than the individual classifiers used independently.
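
The approach described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the class and function names (NaiveBayes, char_ngrams, combine), the trigram size, the add-one smoothing, the toy messages, and the fixed posterior standing in for a second modality are all assumptions made for illustration. The fusion rule shown here, summing log posteriors under an independence assumption, is one common way to combine classifier outputs; the abstract does not state which rule the thesis uses.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def normalize(log_scores):
    """Shift log scores so the corresponding probabilities sum to 1."""
    m = max(log_scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))
    return {k: s - log_z for k, s in log_scores.items()}


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.counts = defaultdict(Counter)  # label -> feature counts
        self.docs = Counter(labels)         # label -> number of samples
        for feats, label in zip(samples, labels):
            self.counts[label].update(feats)
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def log_posteriors(self, feats):
        """Return normalized log P(label | feats) for every label."""
        total_docs = sum(self.docs.values())
        scores = {}
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            for f in feats:
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((counter[f] + 1) / denom)
            scores[label] = score
        return normalize(scores)


def combine(*posteriors):
    """Fuse classifiers by summing log posteriors (assumes independence)."""
    labels = posteriors[0].keys()
    return normalize({k: sum(p[k] for p in posteriors) for k in labels})


# Toy training data (invented for illustration, not from the thesis corpus).
train = [
    ("lol omg cant wait 4 the game", "alice"),
    ("omg lol so tired 2day", "alice"),
    ("Meeting moved to Thursday; please confirm.", "bob"),
    ("Please review the attached report.", "bob"),
]
text_model = NaiveBayes().fit(
    [char_ngrams(t) for t, _ in train], [a for _, a in train]
)
text_post = text_model.log_posteriors(char_ngrams("omg lol cant wait"))

# Hypothetical posterior from a second modality (e.g. a device or timing model).
other_post = normalize({"alice": math.log(0.6), "bob": math.log(0.4)})

fused = combine(text_post, other_post)
best = max(fused, key=fused.get)
```

Working in log space avoids numerical underflow when many n-gram probabilities are multiplied, and renormalizing after the sum keeps the fused scores interpretable as probabilities over the candidate authors.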

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 2011
Accession Number: ADA543909

Entities

People

  • Sarah R. Boutwell

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Energy and Power Technologies
  • Sensors

DTIC Thesaurus Topics

  • Computational Science
  • Computer Languages
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Information Processing
  • Information Science
  • Machine Learning
  • Mobile Communications
  • Mobile Devices
  • Mobile Phones
  • Modulation
  • Natural Language Processing
  • Network Science
  • Supervised Machine Learning
  • Text Messaging

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computational Linguistics
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation