Source-Code Stylometry Improvements in Python

Abstract

This technical note describes rewriting existing source-code stylometry software in Python, the resulting improvements in performance and maintainability, and the validation of results. Source-code stylometry attributes the authorship of source-code samples by extracting lexical, layout, and syntactic features from the code and classifying them with machine-learning techniques, specifically random forest classifiers. The original work was conducted as part of a collaboration between the US Army Research Laboratory and Drexel University.
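The abstract names three feature families (lexical, layout, and syntactic) without listing them. The sketch below shows what one feature vector per code sample could look like. It assumes Python input only because the standard-library `tokenize` and `ast` modules can then stand in for real feature extractors; the note does not say which languages its software analyzes. Every feature name here (`avg_line_length`, `max_ast_depth`, and so on) and the helpers `extract_features` and `to_vector` are hypothetical illustrations, not the report's feature set or code.

```python
import ast
import io
import keyword
import tokenize

# Hypothetical feature names; the technical note does not list its feature set.
FEATURE_NAMES = (
    "avg_line_length",
    "empty_line_ratio",
    "tab_indent_ratio",
    "space_indent_ratio",
    "comments_per_line",
    "keywords_per_line",
    "avg_identifier_length",
    "unique_identifier_ratio",
    "max_ast_depth",
)


def max_ast_depth(tree: ast.AST) -> int:
    """Deepest node in the syntax tree, counting the root as depth 1."""
    best = 0
    stack = [(tree, 1)]  # iterative walk avoids recursion limits on deep trees
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        stack.extend((child, depth + 1) for child in ast.iter_child_nodes(node))
    return best


def extract_features(source: str) -> dict[str, float]:
    """Illustrative lexical, layout, and syntactic features for one Python sample."""
    lines = source.splitlines()
    n_lines = max(len(lines), 1)  # avoid division by zero on empty input

    # Layout features: line shape and indentation habits.
    empty = sum(1 for ln in lines if not ln.strip())
    tabs = sum(1 for ln in lines if ln.startswith("\t"))
    spaces = sum(1 for ln in lines if ln.startswith(" "))

    # Lexical features from the standard-library tokenizer.
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    names = [t.string for t in tokens if t.type == tokenize.NAME]
    keywords = [n for n in names if keyword.iskeyword(n)]
    identifiers = [n for n in names if not keyword.iskeyword(n)]
    comments = [t for t in tokens if t.type == tokenize.COMMENT]
    n_ids = max(len(identifiers), 1)

    return {
        "avg_line_length": sum(len(ln) for ln in lines) / n_lines,
        "empty_line_ratio": empty / n_lines,
        "tab_indent_ratio": tabs / n_lines,
        "space_indent_ratio": spaces / n_lines,
        "comments_per_line": len(comments) / n_lines,
        "keywords_per_line": len(keywords) / n_lines,
        "avg_identifier_length": sum(map(len, identifiers)) / n_ids,
        "unique_identifier_ratio": len(set(identifiers)) / n_ids,
        # Syntactic feature: nesting depth of the abstract syntax tree.
        "max_ast_depth": float(max_ast_depth(ast.parse(source))),
    }


def to_vector(features: dict[str, float]) -> list[float]:
    """Fixed-order feature vector, one row of a classifier's training matrix."""
    return [features[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    sample = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
    print(dict(zip(FEATURE_NAMES, to_vector(extract_features(sample)))))
```

To train an attribution model, rows produced by `to_vector` (one per sample) and one author label per row could be passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier` through `fit(X, y)` and `predict(X_new)`. The note does not say which library its rewrite uses, so that pairing is an assumption.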


Document Details

Document Type: Technical Report
Publication Date: Dec 14, 2017
Accession Number: AD1043714

Entities

People

  • Frederica Nelson
  • Gregory Shearer

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Engineered Resilient Systems
  • Human Systems

DTIC Thesaurus Topics

  • Abstracts
  • Application Software
  • Classification
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Data Processing
  • Information Science
  • Learning
  • Machine Learning
  • Military Research
  • Programming Languages
  • Python Programming Language
  • Shell Scripts
  • Training
  • Validation

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Criminal Law
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation