The Seven C's of Data Curation for the Two C's - Command and Control
Abstract
Many important and complex C2 activities require the use of disparate data sources (structured and unstructured) that are time varying, at various levels of quality (completeness, accuracy, etc.), and of ambiguous origins. Currently, dealing with such disparate data is manually intensive and expensive, in large part because of problems with the quality of the data and with how quickly it can be processed. Data curation can enable automated data discovery, advanced search and retrieval, improvement in the overall data quality, and increased data reuse. The process can be described using what we call the Seven Cs of data curation: (1) Collect: Interface to the data sources and accept the inputs; (2) Characterize: Capture available metadata; (3) Clean: Identify and correct data quality issues; (4) Contextualize: Provide context and provenance; (5) Categorize: Fit within a framework that defines the problem domain; (6) Correlate: Find relationships among the various data; and (7) Catalog: Store and make data and metadata accessible with application program interfaces (APIs) for search and analysis. The benefits of the data curation process are a reduction in problem-solving time, improved data quality, increased confidence in solutions, reduced time and manual effort to perform the curation itself, and the ability to solve problems that were previously too complex or time-consuming to solve because of data problems.
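The seven stages named in the abstract can be read as a simple pipeline. The following is a minimal Python sketch of that reading, not the report's implementation: the `Record` type, the `FRAMEWORK` mapping, the function names, and the sample data are all hypothetical and were chosen only to make each stage concrete.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical problem-domain framework: payload key -> category.
FRAMEWORK = {"lat": "geospatial", "unit": "force-status"}


@dataclass
class Record:
    source: str                                   # name of the originating source
    payload: dict                                 # the data item itself
    metadata: dict = field(default_factory=dict)
    related: set = field(default_factory=set)     # indices of correlated records


def collect(sources):
    """(1) Collect: accept the inputs from every source."""
    return [Record(name, dict(item)) for name, items in sources.items() for item in items]


def characterize(rec):
    """(2) Characterize: capture metadata (fields present, fields left empty)."""
    rec.metadata["fields"] = sorted(rec.payload)
    rec.metadata["missing"] = sorted(k for k, v in rec.payload.items() if v in ("", None))
    return rec


def clean(rec):
    """(3) Clean: trim stray whitespace and drop empty values."""
    cleaned = {}
    for key, value in rec.payload.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    rec.payload = cleaned
    return rec


def contextualize(rec):
    """(4) Contextualize: attach provenance."""
    rec.metadata["provenance"] = {
        "source": rec.source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return rec


def categorize(rec):
    """(5) Categorize: place the record within the domain framework."""
    categories = {FRAMEWORK[k] for k in rec.payload if k in FRAMEWORK}
    rec.metadata["categories"] = sorted(categories) or ["uncategorized"]
    return rec


def correlate(records, key="unit"):
    """(6) Correlate: link records that share the same value for `key`."""
    by_value = defaultdict(list)
    for i, rec in enumerate(records):
        if key in rec.payload:
            by_value[rec.payload[key]].append(i)
    for group in by_value.values():
        for i in group:
            records[i].related.update(j for j in group if j != i)
    return records


def catalog(records):
    """(7) Catalog: index records by category so they can be searched."""
    index = defaultdict(list)
    for rec in records:
        for category in rec.metadata["categories"]:
            index[category].append(rec)
    return dict(index)


def curate(sources):
    """Run all seven stages in the order the abstract lists them."""
    records = collect(sources)
    for stage in (characterize, clean, contextualize, categorize):
        records = [stage(rec) for rec in records]
    return catalog(correlate(records))


# Made-up sample inputs from two hypothetical sources.
sources = {
    "sensor_feed": [{"unit": " Alpha ", "lat": 38.8, "note": ""}],
    "field_report": [{"unit": "Alpha", "status": "ready"}],
}
index = curate(sources)
print(sorted(index))
```

In this sketch the per-record stages run before correlation, so that records are compared on cleaned values (here `" Alpha "` and `"Alpha"` only match after trimming). A real system would likely need richer quality checks and a persistent, API-backed catalog in place of the in-memory dictionary.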
Document Details
- Document Type: Technical Report
- Publication Date: Feb 01, 2015
- Accession Number: AD1123699
Entities
People
- Jonathan R. Agre
- Karen D. Gordon
- Marius S. Vassiliou
Organizations
- Institute for Defense Analyses