Novel Topic Impact on Authorship Attribution

Abstract

Several authorship attribution studies have speculated that topic cues and author style features are linked. This research presents a novel experimental protocol for measuring the impact of topic features on predictive models for authorship attribution. We call the technique "novel topic cross-validation": a single topic is held out as the test set, the model is trained on the remaining topics, and the procedure is repeated for each choice of held-out topic to compute an average performance score. Using the New York Times Annotated Corpus, we apply a subsetting procedure to build a sub-corpus of 18,862 documents, 15 authors, and 23 topics, and we perform novel topic cross-validation on this sub-corpus. Our experiments differ in scope from previous attempts to model topic/author influence, which were limited to three or fewer topics or authors. A larger set of topics and authors should give researchers a greater opportunity to explore the variability of style cues across authors, as well as the confounding influence of topic. For this reason, we supply document/author/topic identifications so that researchers can build upon our work in a reproducible fashion.
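
The hold-one-topic-out protocol described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's implementation: the abstract does not name the classifier or features used, so the word-count model here (`fit_word_counts`, `predict_author`) is a hypothetical stand-in, and the six toy documents are invented for the example rather than drawn from the New York Times Annotated Corpus.

```python
from collections import Counter, defaultdict


def novel_topic_splits(docs):
    """Yield (held_out_topic, train, test), holding out one whole topic at a time."""
    for topic in sorted({d["topic"] for d in docs}):
        train = [d for d in docs if d["topic"] != topic]
        test = [d for d in docs if d["topic"] == topic]
        yield topic, train, test


def fit_word_counts(train):
    """Hypothetical stand-in model: per-author word frequencies."""
    model = defaultdict(Counter)
    for d in train:
        model[d["author"]].update(d["text"].split())
    return model


def predict_author(model, text):
    """Pick the author whose training vocabulary overlaps most with the text."""
    words = text.split()
    return max(sorted(model), key=lambda a: sum(model[a][w] for w in words))


def novel_topic_cv(docs, fit, predict):
    """Return (mean accuracy over held-out topics, per-topic accuracy)."""
    per_topic = {}
    for topic, train, test in novel_topic_splits(docs):
        model = fit(train)  # the model never sees the held-out topic
        hits = sum(predict(model, d["text"]) == d["author"] for d in test)
        per_topic[topic] = hits / len(test)
    return sum(per_topic.values()) / len(per_topic), per_topic


# Toy data (invented): two authors, three topics, one style word per author.
docs = [
    {"author": "A", "topic": "sports", "text": "indeed the match was indeed long"},
    {"author": "B", "topic": "sports", "text": "frankly the match ended frankly late"},
    {"author": "A", "topic": "politics", "text": "indeed the vote was indeed close"},
    {"author": "B", "topic": "politics", "text": "frankly the vote ended frankly early"},
    {"author": "A", "topic": "arts", "text": "indeed the play was indeed short"},
    {"author": "B", "topic": "arts", "text": "frankly the play ended frankly soon"},
]

mean_acc, per_topic = novel_topic_cv(docs, fit_word_counts, predict_author)
print(mean_acc, per_topic)
```

Because each test fold contains only a topic absent from training, the score reflects how well author cues carry over to unseen subject matter; comparing it against an ordinary random-fold score would be one way to gauge how much a model leans on topic words.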

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2009
Accession Number
ADA514246

Entities

People

  • Johnnie F. Caver

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Cognitive Science
  • Computational Science
  • Databases
  • Dimensionality Reduction
  • Identification
  • Information Science
  • Machine Learning
  • Mathematical Analysis
  • Natural Language Processing
  • Natural Languages
  • Network Science
  • New York
  • Predictive Modeling
  • Probability Distributions
  • Statistical Analysis
  • Supervised Machine Learning
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Library and Information Science
  • Neural Network Machine Learning