LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

Abstract

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming capability in one line of code. LLMapReduce supports all programming languages and many schedulers, and it can work with any application without the need to modify the application. Furthermore, LLMapReduce can overcome scaling limits in the map-reduce parallel programming model via options that allow the user to switch to the more efficient single-program-multiple-data (SPMD) parallel programming model. These features allow users to reduce the computational overhead by 10x compared to standard map-reduce. LLMapReduce is used by hundreds of users at MIT. LLMapReduce currently works with several schedulers, including SLURM, Grid Engine, and LSF.
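The contrast the abstract draws between map-reduce and SPMD can be illustrated with a minimal sketch. This is not LLMapReduce's interface or implementation: the function names (mapper, reducer, run_map_reduce, run_spmd), the word-count task, and the cyclic assignment of inputs to workers are all assumptions chosen for illustration, and the "launches" are simulated serially rather than submitted to a scheduler. The sketch only shows why SPMD needs fewer application launches: map-reduce starts the application once per input, while SPMD starts it once per worker and lets each worker loop over its share of the inputs.

```python
from collections import Counter


def mapper(text):
    """Hypothetical map step: count the words in one input."""
    return Counter(text.split())


def reducer(partials):
    """Hypothetical reduce step: merge the per-launch word counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_map_reduce(inputs):
    """Map-reduce style: one (simulated) application launch per input."""
    launches = 0
    partials = []
    for text in inputs:
        launches += 1  # the application starts up once for every input
        partials.append(mapper(text))
    return reducer(partials), launches


def run_spmd(inputs, num_workers):
    """SPMD style: one (simulated) launch per worker; each worker loops."""
    launches = 0
    partials = []
    for rank in range(num_workers):
        launches += 1  # the application starts up once for every worker
        local = Counter()
        # Assumed cyclic distribution: worker `rank` takes every
        # num_workers-th input, starting at index `rank`.
        for text in inputs[rank::num_workers]:
            local.update(mapper(text))
        partials.append(local)
    return reducer(partials), launches


if __name__ == "__main__":
    inputs = ["a b", "b c", "c d", "a a"]
    mr_counts, mr_launches = run_map_reduce(inputs)
    spmd_counts, spmd_launches = run_spmd(inputs, num_workers=2)
    print(mr_counts == spmd_counts, mr_launches, spmd_launches)
```

Both styles produce the same result; only the number of startups differs. The launch counts here are illustrative and are not related to the 10x overhead figure reported in the abstract.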

Document Details

Document Type
Technical Report
Publication Date
May 23, 2016
Accession Number
AD1033661

Entities

People

  • Albert I. Reuther
  • Andrew J. Prout
  • Antonio Rosa
  • Chansup Byun
  • Charles Yee
  • David Bestor
  • Jeremy Kepner
  • Julie Mullen
  • Matthew Hubbell
  • Peter W. Michaleas
  • Vijay N. Gadepally
  • William Arcand
  • William Bergeron

Organizations

  • MIT Lincoln Laboratory

Tags

DTIC Thesaurus Topics

  • Big Data
  • Computer Programming
  • Data Analysis
  • Data Mining
  • Data Processing
  • Directories
  • Domain Specific Programming Languages
  • Ecosystems
  • Gray Scale
  • Image Processing
  • Language
  • Lisp Programming Language
  • Multiple Input Multiple Output
  • Programming Languages
  • Shell Scripts
  • Standards
  • United States

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Parallel and Distributed Computing