The Seven C's of Data Curation for the Two C's - Command and Control

Abstract

Many important and complex C2 activities require the use of disparate data sources (structured and unstructured) that are time-varying, of various levels of quality (completeness, accuracy, etc.), and of ambiguous origin. Currently, dealing with such disparate data is manually intensive and expensive, in large part because of problems with the quality of the data and with the ability to process it quickly. Data curation can enable automated data discovery, advanced search and retrieval, improvement in overall data quality, and increased data reuse. The process can be described using what we call the Seven C's of data curation: (1) Collect: interface to the data sources and accept the inputs; (2) Characterize: capture available metadata; (3) Clean: identify and correct data quality issues; (4) Contextualize: provide context and provenance; (5) Categorize: fit the data within a framework that defines the problem domain; (6) Correlate: find relationships among the various data; and (7) Catalog: store the data and metadata and make them accessible through application program interfaces (APIs) for search and analysis. The benefits of the data curation process are a reduction in problem-solving time, improved data quality, increased confidence in solutions, reduced time and manual effort to perform the curation itself, and the ability to solve problems that were previously too complex or time-consuming to solve because of data problems.
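The seven steps can be read as a pipeline in which each stage enriches the records produced by the previous one. The Python sketch below is a minimal, hypothetical illustration of that flow, not the authors' implementation. The record layout, the keyword `TAXONOMY`, the `unit` field used for correlation, and the in-memory `Catalog` class standing in for search and analysis APIs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any

# Hypothetical domain taxonomy used by the Categorize step (illustrative only).
TAXONOMY = {
    "logistics": {"fuel", "supply", "convoy"},
    "intelligence": {"sighting", "report", "intercept"},
}


@dataclass
class Record:
    data: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


def collect(sources: dict[str, list[dict[str, Any]]]) -> list[Record]:
    """(1) Collect: accept raw rows from each named source, noting where each came from."""
    return [Record(dict(row), {"source": name, "row": i})
            for name, rows in sources.items() for i, row in enumerate(rows)]


def characterize(records: list[Record]) -> list[Record]:
    """(2) Characterize: capture the metadata that is available (here, just field names)."""
    for r in records:
        r.metadata["fields"] = sorted(r.data)
    return records


def clean(records: list[Record], required: str = "text") -> list[Record]:
    """(3) Clean: trim and lowercase strings, then drop records missing a required field."""
    kept = []
    for r in records:
        r.data = {k: v.strip().lower() if isinstance(v, str) else v for k, v in r.data.items()}
        if r.data.get(required):
            kept.append(r)
    return kept


def contextualize(records: list[Record]) -> list[Record]:
    """(4) Contextualize: attach a provenance tag of the form 'source#row'."""
    for r in records:
        r.metadata["provenance"] = f"{r.metadata['source']}#{r.metadata['row']}"
    return records


def categorize(records: list[Record]) -> list[Record]:
    """(5) Categorize: place each record in the first taxonomy category whose keywords match."""
    for r in records:
        words = set(r.data["text"].split())
        r.metadata["category"] = next(
            (cat for cat, keys in TAXONOMY.items() if words & keys), "uncategorized")
    return records


def correlate(records: list[Record], key: str = "unit") -> list[Record]:
    """(6) Correlate: link records that share the same value of a key field."""
    groups: dict[Any, list[str]] = defaultdict(list)
    for r in records:
        if key in r.data:
            groups[r.data[key]].append(r.metadata["provenance"])
    for r in records:
        peers = groups.get(r.data.get(key), [])
        r.metadata["related"] = [p for p in peers if p != r.metadata["provenance"]]
    return records


class Catalog:
    """(7) Catalog: an in-memory index with a search method, standing in for real APIs."""

    def __init__(self, records: list[Record]) -> None:
        self._by_category: dict[str, list[Record]] = defaultdict(list)
        for r in records:
            self._by_category[r.metadata["category"]].append(r)

    def search(self, category: str, term: str = "") -> list[Record]:
        return [r for r in self._by_category.get(category, []) if term.lower() in r.data["text"]]


def curate(sources: dict[str, list[dict[str, Any]]]) -> Catalog:
    """Run the seven stages in order and return a searchable catalog."""
    records = collect(sources)
    for step in (characterize, clean, contextualize, categorize, correlate):
        records = step(records)
    return Catalog(records)


if __name__ == "__main__":
    catalog = curate({
        "field_reports": [
            {"text": "  Convoy delayed by FUEL shortage ", "unit": "A1"},
            {"text": "", "unit": "B2"},  # dropped by clean(): empty text
        ],
        "sigint": [{"text": "Intercept reports enemy movement", "unit": "A1"}],
    })
    for r in catalog.search("logistics"):
        print(r.metadata["provenance"], r.data["text"], r.metadata["related"])
```

Because every stage takes and returns the same list of records, stages can be replaced or extended independently. In this toy run, the cleaned field report lands in the "logistics" category and is correlated with the intercept through their shared unit. A real C2 system would presumably back the catalog with a persistent datastore and a proper search service rather than an in-memory dictionary.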


Document Details

Document Type: Technical Report
Publication Date: Feb 01, 2015
Accession Number: AD1123699

Entities

People

Jonathan R. Agre
Karen D. Gordon
Marius S. Vassiliou

Organizations

Institute for Defense Analyses
