Cluster Computing For Automated Network Analysis At Scale

Abstract

Conventional single-node packet analyzers cannot monitor network traffic at scale. This thesis employs elements of the Apache Hadoop ecosystem, including HBase, Spark, and MapReduce, to analyze a large collection of network traffic. Limited analysis is first conducted with MapReduce directly on packet capture next generation (pcapng) files stored on the Hadoop Distributed File System (HDFS). Next, to allow repeated analysis of the same dataset without reading every source file in its entirety for each calculation, the pcapng files are parsed and relevant metadata is bulk loaded into HBase, a Not Only Structured Query Language (NoSQL) database that uses HDFS for parallelization. This database is then accessed via Apache Spark, where pertinent data is loaded into DataFrames and additional analysis of the network traffic takes place. This research demonstrates the viability of custom, modular, automated analytics that employ open-source software for parallelization to conduct traffic analysis at scale.
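The first stage described above, MapReduce run directly over pcapng files, can be illustrated with a minimal in-process sketch. This is not the thesis's code and does not use Hadoop: plain Python functions stand in for the map phase (per-file statistics) and the reduce phase (merging partial results). All function names (`iter_blocks`, `map_file`, `reduce_stats`, `make_block`, `make_epb`) are hypothetical, and the parser assumes only the basic pcapng block layout (block type, total length, body, repeated total length), ignoring options and per-interface details.

```python
import struct
from collections import Counter

SHB_TYPE = 0x0A0D0D0A          # Section Header Block type
EPB_TYPE = 0x00000006          # Enhanced Packet Block type
BYTE_ORDER_MAGIC = 0x1A2B3C4D  # identifies the section's endianness


def iter_blocks(data):
    """Yield (block_type, body, endian) for each complete pcapng block."""
    offset, endian = 0, "<"
    while offset + 12 <= len(data):
        (block_type,) = struct.unpack_from(endian + "I", data, offset)
        if block_type == SHB_TYPE:
            # The SHB type reads the same in either byte order; the magic
            # number that follows tells us which order the section uses.
            (magic,) = struct.unpack_from("<I", data, offset + 8)
            endian = "<" if magic == BYTE_ORDER_MAGIC else ">"
        (total_len,) = struct.unpack_from(endian + "I", data, offset + 4)
        if total_len < 12 or offset + total_len > len(data):
            break  # truncated or malformed block: stop rather than guess
        yield block_type, data[offset + 8 : offset + total_len - 4], endian
        offset += total_len


def map_file(data):
    """Map phase (stand-in): per-file packet count and captured bytes."""
    stats = Counter()
    for block_type, body, endian in iter_blocks(data):
        if block_type == EPB_TYPE and len(body) >= 20:
            # EPB body: interface id, timestamp high, timestamp low,
            # captured length, original length, then packet data.
            (captured_len,) = struct.unpack_from(endian + "I", body, 12)
            stats["packets"] += 1
            stats["bytes"] += captured_len
    return stats


def reduce_stats(partials):
    """Reduce phase (stand-in): merge the per-file partial counters."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical helpers that build tiny little-endian pcapng data for a demo.
def make_block(block_type, body):
    pad = (-len(body)) % 4  # block bodies are padded to 32-bit boundaries
    total = 12 + len(body) + pad
    return (struct.pack("<II", block_type, total) + body + b"\x00" * pad
            + struct.pack("<I", total))


def make_epb(payload):
    header = struct.pack("<IIIII", 0, 0, 0, len(payload), len(payload))
    return make_block(EPB_TYPE, header + payload)


SHB = make_block(SHB_TYPE, struct.pack("<IHHq", BYTE_ORDER_MAGIC, 1, 0, -1))

if __name__ == "__main__":
    files = [SHB + make_epb(b"a" * 60) + make_epb(b"b" * 40),
             SHB + make_epb(b"c" * 100)]
    print(reduce_stats(map_file(f) for f in files))
```

In a real cluster each `map_file` call would run on a separate node against a file in HDFS, and the merged counters would be one candidate for the kind of metadata that is later bulk loaded into a store such as HBase; the record layout used for that step is not specified in the abstract and is not assumed here.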

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2018
Accession Number: AD1059771

Entities

People

Benjamin J. Brida

Organizations

Naval Postgraduate School
