Source-Code Stylometry Improvements in Python

Abstract

This technical note describes the rewrite of existing source-code stylometry software in Python, covering improvements to performance and maintainability as well as the validation of results. Source-code stylometry attributes the authorship of source-code samples from lexical, layout, and syntactic features extracted from the code, using machine-learning techniques, specifically random forest classifiers. The original work was conducted as a collaboration between the US Army Research Laboratory and Drexel University.
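
To make the three feature families named in the abstract concrete, the following is a minimal sketch of feature extraction for a single source sample. It is an illustration only, not the report's implementation: the function name extract_features, the specific features chosen, and the use of Python samples as input are all assumptions made here. It relies solely on the Python standard library (tokenize, ast, keyword).

```python
import ast
import io
import keyword
import tokenize


def extract_features(source: str) -> dict:
    """Return a small, illustrative feature dictionary for one Python sample.

    Hypothetical helper: the feature set is an assumption chosen for
    illustration, not the feature set used in the report.
    """
    lines = source.splitlines()
    n_lines = max(len(lines), 1)
    n_chars = max(len(source), 1)

    # Layout features: how the author uses whitespace and line breaks.
    layout = {
        "layout_blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / n_lines,
        "layout_tab_indent_ratio": sum(1 for ln in lines if ln.startswith("\t")) / n_lines,
        "layout_whitespace_ratio": sum(1 for ch in source if ch.isspace()) / n_chars,
        "layout_mean_line_length": sum(len(ln) for ln in lines) / n_lines,
    }

    # Lexical features: counts over the token stream.
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    names = [t.string for t in tokens if t.type == tokenize.NAME]
    lexical = {
        "lex_keyword_count": sum(1 for n in names if keyword.iskeyword(n)),
        "lex_unique_identifiers": len({n for n in names if not keyword.iskeyword(n)}),
        "lex_comment_count": sum(1 for t in tokens if t.type == tokenize.COMMENT),
    }

    # Syntactic features: properties of the abstract syntax tree.
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        # Depth of the subtree rooted at node, counting the node itself.
        return 1 + max((depth(c) for c in ast.iter_child_nodes(node)), default=0)

    syntactic = {
        "syn_max_ast_depth": depth(tree),
        "syn_function_defs": sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree)),
        "syn_node_count": sum(1 for _ in ast.walk(tree)),
    }

    return {**layout, **lexical, **syntactic}
```

One feature dictionary per sample, converted to a fixed-order numeric vector and paired with an author label, could then be used to train a random forest classifier (for example, scikit-learn's RandomForestClassifier). That training step is omitted here because the record does not describe the data or parameters used.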

Document Details

Document Type: Technical Report
Publication Date: Dec 14, 2017
Accession Number: AD1043714

Entities

People

Frederica Nelson
Gregory Shearer

Organizations

United States Army Research Laboratory
