Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs (Proposal)
Abstract
Big Data describes a new era in the digital age in which the volume, velocity, and variety of data are rapidly increasing across a wide range of fields, such as internet search, healthcare, finance, social media, wireless devices, and cybersecurity. These data are growing at a rate well beyond our ability to analyze them. Tools such as spreadsheets, databases, matrices, and graphs have been developed to address these challenges. The common theme among these tools is the need to store and operate on data as whole sets instead of as individual data elements. This book describes the common mathematical foundations of these data sets (associative arrays) that apply across many applications and technologies. Associative arrays unify and simplify data, leading to rapid solutions to volume, velocity, and variety problems. Understanding the mathematical underpinnings of data will allow the reader to see past the surface differences between these tools and to leverage their mathematical similarities to solve the hardest big data challenges. Specifically, understanding associative arrays reduces the effort required to pass data between steps in a data processing system, allows steps to be interchanged with full confidence that the results will be unchanged, and makes it possible to recognize when steps can be simplified or eliminated.
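To make the idea of interchangeable steps concrete, the following is a minimal sketch, not the authors' implementation or notation. It assumes an associative array can be modeled as a Python dict from (row key, column key) pairs to numeric values, with missing entries treated as zero. The helper names `assoc_add` and `assoc_transpose` and the sample edge data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assoc_add(a, b):
    """Element-wise sum over the union of keys; missing entries count as 0."""
    out = defaultdict(int)
    for src in (a, b):
        for key, value in src.items():
            out[key] += value
    # Drop zero entries so the result stays sparse.
    return {key: value for key, value in out.items() if value != 0}


def assoc_transpose(a):
    """Swap row and column keys."""
    return {(col, row): value for (row, col), value in a.items()}


# The same structure can be read as a sparse matrix, a table of
# (row, column, value) triples, or a weighted graph edge list.
edges_day1 = {("alice", "bob"): 1, ("bob", "carol"): 2}
edges_day2 = {("alice", "bob"): 3, ("carol", "alice"): 1}

total = assoc_add(edges_day1, edges_day2)

# Interchanging steps: the order of addition does not matter, and
# transposing before or after adding gives the same result.
same_order = assoc_add(edges_day2, edges_day1) == total
same_steps = assoc_transpose(total) == assoc_add(
    assoc_transpose(edges_day1), assoc_transpose(edges_day2)
)
```

In this sketch both `same_order` and `same_steps` evaluate to `True`, which is the kind of guarantee the abstract refers to: when operations obey known algebraic properties, pipeline steps can be reordered without changing the result.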
Document Details
- Document Type: Technical Report
- Publication Date: Jul 13, 2018
- Accession Number: AD1084831
Entities
People
- Hayden Jananthan
- Jeremy Kepner
Organizations
- MIT Lincoln Laboratory
- Vanderbilt University