Working with and Visualizing Big Data Efficiently with Python for the DARPA XDATA Program

Abstract

Research performed under the XDATA program focused on computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g. tabular, relational, categorical, metadata) and unstructured (e.g. text, documents, message traffic). Several open source projects that have seen community and industry adoption grew out of this effort:

  • Blaze: A collection of packages for describing, accessing, and manipulating disparate data sources and types.
  • Numba: A just-in-time function compiler for Python, based on the LLVM compiler project, allowing researchers to run their Python code at near-native speeds on CPUs and GPUs.
  • Dask: Parallelizes generic Python and extends NumPy, Pandas, and Scikit-learn with parallel variants.
  • Bokeh: Creates interactive web applications from Python without requiring knowledge of JavaScript, CSS, or HTML.

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2017
Accession Number
AD1038470

Entities

People

  • Bryan Van De Ven
  • Hunt Sparra
  • Matthew Rocklin
  • Peter Wang
  • Stan Seibert
  • Travis Oliphant

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Energy and Power Technologies
  • Space

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Big Data
  • Compilers
  • Computer Program Documentation
  • Computer Programming
  • Computer Programs
  • Computers
  • Data Analysis
  • Data Science
  • Data Sets
  • Data Storage Systems
  • Information Systems
  • Language
  • Lessons Learned
  • Web Applications
  • Web Browsers

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Distributed Systems and Data Platform Development
  • Parallel and Distributed Computing