Source-Code Stylometry Improvements in Python
Abstract
This technical note describes the rewrite of existing source-code stylometry software in Python, including improvements to performance and maintainability, as well as validation of results. Source-code stylometry attributes the authorship of source-code samples from lexical, layout, and syntactic features extracted from the code, using machine-learning techniques, specifically random forest classifiers. The original work was conducted as part of a collaboration between the US Army Research Laboratory and Drexel University.
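To make the three feature families concrete, here is a minimal sketch of feature extraction for a single code sample. It is not the report's implementation: the function name `extract_features`, the specific features chosen, and the use of Python source as input are all assumptions made for illustration, and this record does not state which languages or features the report's software handles.

```python
import ast
import io
import keyword
import tokenize


def extract_features(source: str) -> dict:
    """Return a few illustrative stylometric features (hypothetical set)."""
    lines = source.splitlines()
    non_blank = [line for line in lines if line.strip()]

    # Layout features: how the author spaces and indents code.
    blank_line_ratio = (len(lines) - len(non_blank)) / len(lines) if lines else 0.0
    tab_indent_ratio = (
        sum(line.startswith("\t") for line in non_blank) / len(non_blank)
        if non_blank
        else 0.0
    )

    # Lexical features: token-level habits such as naming and commenting.
    identifiers = []
    keyword_count = 0
    comment_count = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_count += 1
        elif tok.type == tokenize.NAME:
            if keyword.iskeyword(tok.string):
                keyword_count += 1
            else:
                identifiers.append(tok.string)
    mean_identifier_length = (
        sum(len(name) for name in identifiers) / len(identifiers)
        if identifiers
        else 0.0
    )

    # Syntactic feature: a count taken from the abstract syntax tree.
    function_defs = sum(
        isinstance(node, ast.FunctionDef) for node in ast.walk(ast.parse(source))
    )

    return {
        "blank_line_ratio": blank_line_ratio,
        "tab_indent_ratio": tab_indent_ratio,
        "keyword_count": keyword_count,
        "comment_count": comment_count,
        "mean_identifier_length": mean_identifier_length,
        "function_defs": function_defs,
    }
```

In a full pipeline, one such feature vector per sample, labeled with its author, could be used to train a random forest classifier (for example, scikit-learn's `RandomForestClassifier`); that step is omitted here.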
Document Details
- Document Type: Technical Report
- Publication Date: Dec 14, 2017
- Accession Number: AD1043714
Entities
People
- Frederica Nelson
- Gregory Shearer
Organizations
- United States Army Research Laboratory