Big Data and Big Data Analytics

Abstract

Big data and big data analytics are among the fastest growing and most strategically important technologies for businesses and organizations in both the public and private sectors. They are also among the most active and rapidly evolving areas of research, development, and application deployment. The primary objectives of this research project are to: 1) understand the fundamentals of big data and big data analytics; 2) gain hands-on knowledge and skills in working with these technologies; and 3) apply these techniques to representative security analysis tasks. This project uses the open source Apache Hadoop platform and related tools, which are the most widely used building blocks for big data and big data analytics. Two different Hadoop distributions, Cloudera QuickStart and Hortonworks Sandbox, have been used. Both run as VirtualBox virtual machines on a Mac OS X notebook. The security analysis tasks practiced for this research cover the major stages of the workflow: 1) identify the data sources; 2) extract, transform, and load the data into Hive, which provides a SQL-like language called HiveQL for querying and managing large datasets residing in Hadoop's distributed file system; and 3) use HiveQL to query the data and attempt to answer the research questions. Two types of data sources have been used for this project. The first type is static text files downloaded from websites. The second type is dynamic streaming data, such as log files, collected using Apache Flume.
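
The three workflow stages named in the abstract (identify a source, extract/transform/load it into a table, then query it) can be illustrated with a minimal Python sketch. This is not the report's implementation: Python's built-in sqlite3 module stands in for Hive so the example runs without a Hadoop cluster, and the log format, the sample records, the table name auth_events, and all function names are hypothetical. The SELECT statement uses only basic SQL constructs (COUNT, WHERE, GROUP BY, ORDER BY) of the kind HiveQL also provides, but it has not been run against Hive here.

```python
import sqlite3

# Hypothetical sample records standing in for a collected log file.
# The report's actual data sources and formats are not shown here.
RAW_LOG_LINES = [
    "2014-04-01T10:00:00 10.0.0.5 LOGIN_FAILED",
    "2014-04-01T10:00:03 10.0.0.5 LOGIN_FAILED",
    "2014-04-01T10:00:07 10.0.0.9 LOGIN_OK",
    "2014-04-01T10:01:12 10.0.0.5 LOGIN_FAILED",
    "malformed line",
]


def extract_transform(lines):
    """Split each line into (timestamp, source_ip, event); skip malformed lines."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def load(conn, rows):
    """Create the (hypothetical) auth_events table and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS auth_events "
        "(ts TEXT, source_ip TEXT, event TEXT)"
    )
    conn.executemany("INSERT INTO auth_events VALUES (?, ?, ?)", rows)
    conn.commit()


def failed_logins_by_ip(conn):
    """Count failed logins per source address, most failures first."""
    cursor = conn.execute(
        "SELECT source_ip, COUNT(*) AS failures "
        "FROM auth_events "
        "WHERE event = 'LOGIN_FAILED' "
        "GROUP BY source_ip "
        "ORDER BY failures DESC"
    )
    return cursor.fetchall()


# sqlite3 in-memory database used as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
load(conn, extract_transform(RAW_LOG_LINES))
result = failed_logins_by_ip(conn)
print(result)
```

Keeping parsing, loading, and querying in separate functions mirrors the staged workflow the abstract describes, so each stage can be swapped out independently, for example replacing the in-memory list with a file read.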

Document Details

Document Type: Technical Report
Publication Date: Apr 16, 2014
Accession Number: AD1091009

Entities

People

  • Kaila Perry

Organizations

  • Norfolk State University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Big Data
  • Computer Programming
  • Computer Science
  • Computers
  • Data Analysis
  • Data Sets
  • Data Storage Systems
  • Domain Specific Programming Languages
  • Language
  • Operating Systems
  • Security
  • Social Media
  • Social Networking Services
  • Social Networks
  • Software Development
  • User Friendly
  • Virtual Machines

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Distributed Systems and Data Platform Development
  • Economics