Big Data and Big Data Analytics
Abstract
Big data and big data analytics are among the fastest-growing and most strategically important technologies for businesses and organizations in both the public and private sectors. They are also among the most active and rapidly evolving areas of research, development, and application deployment. The primary objectives of this research project are to: 1) understand the fundamentals of big data and big data analytics; 2) gain hands-on knowledge and skills in working with these technologies; and 3) apply these techniques to representative security analysis tasks. This project uses the open source Apache Hadoop platform and related tools, which are the most widely used building blocks for big data and big data analytics. Two Hadoop distributions, Cloudera QuickStart and Hortonworks Sandbox, were used; both run as VirtualBox virtual machines on a Mac OS X notebook. The security analysis tasks practiced in this research cover the major steps of the workflow: 1) identify the data sources; 2) extract, transform, and load the data into Hive, which provides a SQL-like language called HiveQL for querying and managing large datasets residing in Hadoop's distributed file system; and 3) use HiveQL to query the data and attempt to answer the research questions. Two types of data sources were used for this project. The first consists of static text files downloaded from websites. The second consists of dynamic streaming data, such as log files, collected using Apache Flume.
Document Details
- Document Type: Technical Report
- Publication Date: Apr 16, 2014
- Accession Number: AD1091009
Entities
People
- Kaila Perry
Organizations
- Norfolk State University