Developing a Common Interchange Model and Format for Representing Knowledge Synthesized from HLT Analytic Results

Abstract

In the Human Language Technology (HLT) domain, analytic results extracted from raw document sources are captured in varied models and formats because of the depth of what can be revealed and the diversity of interpretation. However, a common model and format is needed so that multiple analytics can operate together in workflows, communicate with one another, and fuse parallel or complementary results. This data integration problem is exacerbated when the emphasis is on extracting knowledge from text, because the data model must be both adaptable and extensible to handle current and emerging content extraction capabilities and technologies. This paper describes a common interchange format and model designed to coordinate the information extracted from raw document sources in order to generate knowledge. The approach adheres to the principles of adaptability and extensibility. It also provides the means to represent the annotation data that serve as the reference for the knowledge and to maintain provenance about these analytic results. While the data model and format described were designed for the HLT domain, the process used to develop them can be applied to other domains as well (e.g., image processing, signal processing).

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2015
Accession Number
AD1107782

Entities

People

  • Joseph Jubinski
  • Michael J. Smith
  • Ransom Winder

Organizations

  • MITRE Corporation

Tags

Communities of Interest

  • Cyber
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Ambiguity
  • Artifacts
  • Extraction
  • Hierarchies
  • Image Processing
  • Language
  • Metadata
  • Models
  • Natural Language Processing
  • Natural Languages
  • Ontologies
  • Personality
  • Philosophy
  • Signal Processing
  • Standards
  • Taxonomy
  • Urban Areas

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Database Systems and Applications
  • Distributed Systems and Data Platform Development