Novel Topic Impact on Authorship Attribution
Abstract
Several authorship attribution studies have speculated about a link between topic cues and author style features. This research presents a novel experimental protocol for measuring the impact of topic features on predictive models for authorship attribution. Our technique, which we call "novel topic crossvalidation," holds out a single topic as the test set and iterates over the choice of held-out topic to compute an average performance score. From the New York Times Annotated Corpus, we select a sub-corpus of 18,862 documents, 15 authors, and 23 topics, and perform novel topic crossvalidation on it. Our experiments differ from previous attempts to model topic/author influence in scope; previous methods were limited to three or fewer topics or authors. A larger set of topics and authors should give researchers a greater opportunity to explore the variability of style cues across authors, as well as the confounding influence of topic. For this reason, we supply document/author/topic identifications so that researchers can build upon our work in a reproducible fashion.
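The held-out-topic protocol described in the abstract can be illustrated with a minimal Python sketch. This is not the report's implementation: the record layout (`author`, `topic`, `text` keys), the function names, the majority-author baseline, and the toy data are all assumptions made for illustration, and accuracy is used as the performance score only because the abstract does not name one.

```python
from collections import Counter
from statistics import mean


def novel_topic_splits(docs):
    """Yield (held_out_topic, train, test), holding out one topic at a time."""
    topics = sorted({d["topic"] for d in docs})
    for held_out in topics:
        train = [d for d in docs if d["topic"] != held_out]
        test = [d for d in docs if d["topic"] == held_out]
        yield held_out, train, test


def novel_topic_cv(docs, fit, predict):
    """Average attribution accuracy over every choice of held-out topic."""
    scores = []
    for _, train, test in novel_topic_splits(docs):
        model = fit(train)
        correct = sum(predict(model, d["text"]) == d["author"] for d in test)
        scores.append(correct / len(test))
    return mean(scores)


# Hypothetical baseline: always predict the most frequent training author.
def fit_majority(train):
    return Counter(d["author"] for d in train).most_common(1)[0][0]


def predict_majority(model, text):
    return model


# Toy records (invented for illustration; not drawn from the corpus).
toy_docs = [
    {"author": "A", "topic": "sports", "text": "match report one"},
    {"author": "A", "topic": "sports", "text": "match report two"},
    {"author": "A", "topic": "politics", "text": "election column"},
    {"author": "B", "topic": "politics", "text": "senate analysis"},
    {"author": "B", "topic": "arts", "text": "gallery review"},
    {"author": "B", "topic": "arts", "text": "film review"},
    {"author": "B", "topic": "arts", "text": "theatre review"},
]

score = novel_topic_cv(toy_docs, fit_majority, predict_majority)
```

Because every test document comes from a topic the model never saw during training, a classifier that leans on topic cues rather than style should score lower here than under ordinary random-fold crossvalidation. Any real classifier can be substituted by passing different `fit` and `predict` callables.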
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 2009
- Accession Number: ADA514246
Entities
People
- Johnnie F. Caver
Organizations
- Naval Postgraduate School