A Machine Learning Approach for Classifying JavaScript Using Static Code Analysis

Abstract

This thesis develops a machine learning approach to classify normal and anomalous JavaScript based on a static analysis of select features derived from the top 30 000 webpages on the internet. A dataset of 136 features was extracted from 100 000 raw JavaScript files. Nine test groups were created and tested using 10 subsets of features. K-means clustering was used to group the data, and the resulting clusters were manually translated into a binary classification. The results from the K-means clustering show moderate performance, with distortions less than 1.0 from elbow plot analysis and average silhouette scores between 0.3 and 0.8 from silhouette analysis of the clustering. The classification of each JavaScript file was then examined using the naive Bayes algorithm to re-create the highest performing classifiers with a less processing-intensive method and to examine their performance. Naive Bayes was not a good model for re-creating the K-means classifier. The best performing classifiers had a Matthews correlation coefficient of 0.75 when examining small JavaScript, and less than 0.38 when examining medium or large JavaScript. The results show that most JavaScript files were small in file size, and file size was the only defining feature. No feature tested, other than file size, effectively categorizes the vast majority of JavaScript. Further research is needed to find features that more accurately encompass the majority of JavaScript in order to define normal JavaScript.
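
The abstract reports classifier quality as a Matthews correlation coefficient (MCC). As a reading aid, the sketch below shows how that metric is computed for a binary normal/anomalous labelling. It is a minimal illustration and not the thesis's own code: the function name, the encoding (1 = anomalous, 0 = normal), and the sample labels are assumptions made up for this example and do not come from the thesis data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Return the MCC for two equal-length lists of binary labels.

    Assumed encoding (hypothetical, for illustration): 1 = anomalous, 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up labels, not results from the thesis.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = matthews_corrcoef(y_true, y_pred)
print(round(score, 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect agreement), which is the scale on which the reported values of 0.75 and less than 0.38 should be read.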

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2022
Accession Number
AD1173448

Entities

People

  • Michael D. Miller

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Cyber

DTIC Thesaurus Topics

  • Algorithms
  • Computer Languages
  • Computer Programming
  • Computers
  • Data Mining
  • Data Science
  • Data Sets
  • Detection
  • Dimensionality Reduction
  • Dynamic Programming Languages
  • Electrical Engineering
  • Information Science
  • Information Theory
  • Language
  • Machine Learning
  • Mobile Application Software
  • Neural Networks
  • Programming Languages
  • Python Programming Language
  • Statistics

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Neural Network Machine Learning.
  • Parallel and Distributed Computing.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks