Data Lake Ecosystem Workflow

Abstract

The Engineer Research and Development Center, Information Technology Laboratory's (ERDC-ITL's) Big Data Analytics team specializes in the analysis of large-scale datasets, with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process that requires coordination among multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed significant demand on ERDC-ITL researchers and have revealed that few individuals fully understand how to transfer data between government organizations successfully; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for transferring large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. The researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the resulting Data Lake Ecosystem Workflow, focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2021
Accession Number
AD1126785

Entities

People

  • Alicia I. Ruvinsky
  • Cody A. Coleman
  • Lakenya K. Walker
  • Maria Seale
  • Quyen T. Dong
  • R. C. Salter
  • W. G. Bond

Organizations

  • Engineer Research and Development Center

Tags

Communities of Interest

  • Biomedical
  • Engineered Resilient Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Army
  • Artificial Intelligence
  • Big Data
  • Computational Science
  • Data Analysis
  • Data Curation
  • Data Lakes
  • Data Management
  • Data Science
  • Data Storage Systems
  • Data Transmission
  • Database Management Systems
  • Deep Learning
  • Department Of Defense
  • Ecology
  • Ecosystems
  • Engineering
  • Engineers
  • Governments
  • High Performance Computing
  • Information Systems
  • Machine Learning
  • Reliability
  • Situational Awareness

Fields of Study

  • Computer science

Readers

  • Defense Technology Research and Development
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy