Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs (Proposal)

Abstract

Big Data describes a new era in the digital age in which the volume, velocity, and variety of data are rapidly increasing across a wide range of fields, such as internet search, healthcare, finance, social media, wireless devices, and cybersecurity. These data are growing at a rate well beyond our ability to analyze them. Tools such as spreadsheets, databases, matrices, and graphs have been developed to address these challenges. The common theme among these tools is the need to store and operate on data as whole sets instead of as individual data elements. This book describes the common mathematical foundations of these data sets (associative arrays) that apply across many applications and technologies. Associative arrays unify and simplify data, leading to rapid solutions to volume, velocity, and variety problems. Understanding the mathematical underpinnings of data will allow the reader to see past the surface differences between these tools and to leverage their mathematical similarities to solve the hardest big data challenges. Specifically, understanding associative arrays reduces the effort required to pass data between steps in a data processing system, allows steps to be interchanged with full confidence that the results will be unchanged, and makes it possible to recognize when steps can be simplified or eliminated.
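
The claim that processing steps can be interchanged without changing the result can be illustrated with a minimal sketch. It assumes an associative array can be modeled as a Python dict keyed by (row, column) pairs; that representation, the helper names assoc_add and assoc_transpose, and the sample data are all hypothetical and are not taken from the book or from any particular library.

```python
from collections import defaultdict


def assoc_add(a, b):
    """Element-wise sum of two associative arrays (dicts keyed by (row, col))."""
    out = defaultdict(int)
    for key, value in a.items():
        out[key] += value
    for key, value in b.items():
        out[key] += value
    return dict(out)


def assoc_transpose(a):
    """Swap the row and column keys of an associative array."""
    return {(col, row): value for (row, col), value in a.items()}


# Hypothetical sample data: counts of (user, topic) events on two days.
events_day1 = {("alice", "search"): 2, ("bob", "finance"): 1}
events_day2 = {("alice", "search"): 1, ("carol", "health"): 4}

# Pipeline A: combine the two days, then transpose.
sum_then_transpose = assoc_transpose(assoc_add(events_day1, events_day2))

# Pipeline B: transpose each day, then combine.
transpose_then_sum = assoc_add(
    assoc_transpose(events_day1), assoc_transpose(events_day2)
)

# The two orderings of the steps give the same associative array.
steps_interchangeable = sum_then_transpose == transpose_then_sum
```

In this sketch the two pipelines agree because transposition only relabels keys while addition only combines values under matching keys, so either step can run first. The same dict could equally be read as a spreadsheet range, a database table of (row, column, value) triples, a sparse matrix, or a weighted edge list of a graph.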

Document Details

Document Type: Technical Report
Publication Date: Jul 13, 2018
Accession Number: AD1084831

Entities

People

  • Hayden Jananthan
  • Jeremy Kepner

Organizations

  • MIT Lincoln Laboratory
  • Vanderbilt University

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Energy and Power Technologies
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Bayesian Networks
  • Big Data
  • Computational Science
  • Computer Languages
  • Computer Networks
  • Computer Programming
  • Computer Programs
  • Computers
  • Data Analysis
  • Data Mining
  • Databases
  • Information Science
  • Information Systems
  • Machine Learning
  • Network Science
  • Neural Networks
  • Social Media

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Database Systems and Applications
  • Economics

Technology Areas

  • Cyber
  • Cyber - Cryptography