Analyzing Feature Relevance for Social Media Traffic Classification with Machine Learning

Abstract

Prior research in the classification of Social Media (SM) network traffic demonstrated the feasibility of classifying such traffic on a packet-by-packet basis using traditional Machine Learning (ML) techniques. This paper builds on those results and evaluates the feature analysis methods explored during the ML experiments. The earlier work used an exhaustive search that evaluated nearly all possible combinations of input features to find the best subset for implementing Support Vector Machine (SVM) models. Exhaustive feature searches tend to be computationally expensive and can be prohibitive for large feature sets. In one case, up to sixteen features were explored, which required 65,535 combinatorial executions. While the project had access to High Performance Computing (HPC) resources, such resources may not be available to every project. Applying enhanced feature analysis and feature selection techniques could yield an optimum subset of features for developing ML models, thereby reducing computation time and avoiding overfitting. This ML project provides a unique opportunity to evaluate such feature reduction techniques and compare their results to those of the exhaustive search process. For classification problems, the underlying goal of a machine learning model is to find information within the input data that produces the best prediction. The selected approach for investigating the results of the original SM ML project is to analyze the input data features in order to identify unique characteristics that may have contributed to the ML model's success; this is covered in Sections A and B of the investigation results. Sections C and D then explore automated feature selection techniques on the larger feature set.
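
The figure of 65,535 executions quoted in the abstract matches the number of non-empty subsets of sixteen features, 2^16 - 1. The short Python sketch below illustrates only that arithmetic. It is not the report's search code: the feature names and both helper functions are hypothetical placeholders.

```python
from itertools import combinations


def count_nonempty_subsets(n_features: int) -> int:
    """Number of non-empty subsets of n_features items: 2^n - 1."""
    return 2 ** n_features - 1


def iter_feature_subsets(features):
    """Yield every non-empty subset of features, smallest subsets first."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


# Hypothetical placeholder names; the report's actual features are not listed here.
features = [f"f{i}" for i in range(16)]

# Enumerate the subsets and count them to confirm the closed-form value.
n_subsets = sum(1 for _ in iter_feature_subsets(features))
print(n_subsets, count_nonempty_subsets(16))
```

Each additional feature doubles the number of candidate subsets, which is why an exhaustive search becomes impractical as the feature set grows.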

Document Details

Document Type
Technical Report
Publication Date
Aug 13, 2020
Accession Number
AD1110423

Entities

People

  • Bela Erdelyi
  • Johnson John
  • Metin Ahiskali

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Computer Networks
  • Data Analysis
  • Data Processing
  • Feature Extraction
  • Feature Selection
  • High Performance Computing
  • Identification
  • Information Science
  • Machine Learning
  • Network Protocols
  • Networks
  • Pattern Recognition
  • Predictive Modeling
  • Social Media
  • Statistics
  • Supervised Machine Learning
  • Transport Protocols

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks