Log Analysis Using Splunk Hadoop Connect

Abstract

The purpose of this research is to use Splunk and Hadoop to perform timestamp analysis on computer logs. Splunk is a commercial data analytics tool. Hadoop is a system for large-scale distributed storage and processing. This research ingested computer logs from two kinds of forensic data in the Real Data Corpus to establish a baseline and find anomalies. We analyzed timestamps and Event IDs in more than two thousand logs across hundreds of drives. Additionally, we used packet captures from the Center for Applied Internet Data Analysis to test Hadoop's ability to store data and transfer it between the Hadoop Distributed File System and Splunk. We used Splunk Hadoop Connect to transfer data between a Splunk server and a Hadoop cluster. Splunk effectively identified and visualized statistical anomalies in the log files. These anomalies could reveal misconfigurations, security concerns, or unusual but harmless traffic. Using Splunk Hadoop Connect, Splunk could also easily transfer data to relatively inexpensive commodity servers.

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2017
Accession Number
AD1046314

Entities

People

  • Boulat Chainourov

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Cyber
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Commerce
  • Computer Network Security
  • Computer Networks
  • Computer Programs
  • Computer Science
  • Computers
  • Cybersecurity
  • Data Analysis
  • Data Transmission
  • Information Systems
  • Internet
  • Intrusion Detection Systems
  • Intrusion Detectors
  • Network Computing
  • Network Protocols
  • Operating Systems
  • Web Browsers

Readers

  • Computer Networking
  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Mathematical Modeling and Probability Theory