The Seven C's of Data Curation for the Two C's - Command and Control

Abstract

Many important and complex C2 activities require the use of disparate data sources (structured and unstructured) that are time varying, at various levels of quality (completeness, accuracy, etc.), and of ambiguous origins. Currently, dealing with such disparate data is manually intensive and expensive, in large part because of problems with the quality of the data and the difficulty of processing it quickly. Data curation can enable automated data discovery, advanced search and retrieval, improvement in the overall data quality, and increased data reuse. The process can be described using what we call the Seven Cs of data curation: (1) Collect: interface to the data sources and accept the inputs; (2) Characterize: capture available metadata; (3) Clean: identify and correct data quality issues; (4) Contextualize: provide context and provenance; (5) Categorize: fit within a framework that defines the problem domain; (6) Correlate: find relationships among the various data; and (7) Catalog: store and make data and metadata accessible with application program interfaces (APIs) for search and analysis. The benefits of the data curation process are a reduction in problem-solving time, improved data quality, increased confidence in solutions, reduced time and manual effort to perform the curation itself, and the ability to solve problems that were previously too complex or time-consuming to solve because of data problems.
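
The seven stages named in the abstract can be read as a sequential pipeline. The following is a minimal illustrative sketch in Python, not the report's implementation: the Record type, the field names (unit_id, lat, lon, text), the categorization rule, and the curate function are all hypothetical and chosen only to make each stage concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical container for one curated item (not from the report)."""
    source: str
    payload: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)
    category: str | None = None
    related: list[int] = field(default_factory=list)


def collect(sources: dict[str, list[dict[str, Any]]]) -> list[Record]:
    # (1) Collect: accept inputs from each named source.
    return [Record(name, dict(row)) for name, rows in sources.items() for row in rows]


def characterize(records: list[Record]) -> list[Record]:
    # (2) Characterize: capture whatever metadata is available (here, field names).
    for r in records:
        r.metadata["fields"] = sorted(r.payload)
    return records


def clean(records: list[Record]) -> list[Record]:
    # (3) Clean: trim strings and drop empty values (a stand-in for real quality checks).
    for r in records:
        cleaned = {}
        for key, value in r.payload.items():
            if isinstance(value, str):
                value = value.strip()
            if value is None or value == "":
                continue
            cleaned[key] = value
        r.payload = cleaned
    return records


def contextualize(records: list[Record]) -> list[Record]:
    # (4) Contextualize: record provenance alongside the data.
    for r in records:
        r.metadata["provenance"] = {"source": r.source}
    return records


def categorize(records: list[Record]) -> list[Record]:
    # (5) Categorize: assumed toy rule; a real system would use a domain framework.
    for r in records:
        has_position = "lat" in r.payload and "lon" in r.payload
        r.category = "track" if has_position else "report"
    return records


def correlate(records: list[Record], key: str = "unit_id") -> list[Record]:
    # (6) Correlate: link records that share the same value for an assumed key.
    for i, r in enumerate(records):
        r.related = [
            j for j, other in enumerate(records)
            if j != i and key in r.payload and other.payload.get(key) == r.payload[key]
        ]
    return records


def catalog(records: list[Record]) -> dict[str, Any]:
    # (7) Catalog: store the records with a simple index that a search API could use.
    by_category: dict[str, list[int]] = {}
    for i, r in enumerate(records):
        by_category.setdefault(r.category or "unknown", []).append(i)
    return {"records": records, "by_category": by_category}


def curate(sources: dict[str, list[dict[str, Any]]]) -> dict[str, Any]:
    # Run the seven stages in the order the abstract lists them.
    records = collect(sources)
    for stage in (characterize, clean, contextualize, categorize, correlate):
        records = stage(records)
    return catalog(records)
```

In this sketch each stage takes and returns the same list of records, so a stage can be replaced (for example, swapping the toy categorization rule for a domain ontology lookup) without changing the others.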

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2015
Accession Number
AD1123699

Entities

People

  • Jonathan R. Agre
  • Karen D. Gordon
  • Marius S. Vassiliou

Organizations

  • Institute for Defense Analyses

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Cyber
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Abstracts
  • Accuracy
  • Application Software
  • Big Data
  • Command And Control
  • Computational Science
  • Computers
  • Data Analysis
  • Data Curation
  • Data Management
  • Data Sets
  • Data Storage Systems
  • Databases
  • Department Of Defense
  • Digital Data
  • Domain Specific Programming Languages
  • Information Systems
  • Metadata
  • Natural Language Processing
  • Security
  • Standards
  • Storage
  • Web Service
  • Xml

Readers

  • Geospatial Intelligence and Artificial Intelligence Analytics
  • Software Engineering
  • Team-Based Human-Centered Cognitive Task Decision Making and Information Performance

Technology Areas

  • Fully Networked C3
  • Fully Networked C3 - Command and Control