Intelligence Analyst Associate (IAA) - CYC Knowledge Extraction

Abstract

The objective of this project was to assess the feasibility of combining the strengths of the Intelligence Analyst Associate (IAA) and the Cyc Knowledge Base (KB) to help alleviate the textual data overload that intelligence analysts experience. IAA can process large volumes of unstructured text; extract information relevant to intelligence analysts, such as entities (people, organizations, locations, dates, and times) and simple events (subjects, verbs, and objects); store the extracted information in a structured database; and enable the use of visualization and analysis tools. The Cyc KB is a formalized representation of a vast quantity of fundamental human knowledge (facts, rules of thumb, and heuristics) and consists of terms and assertions that relate those terms. Leveraging the Cyc KB provided significant capabilities that greatly benefited the IAA and its end users. These included the ability to represent domain-dependent facts in the Cyc KB to identify, classify, and specify knowledge concerning relevant entities, as well as the ability to represent rules in the KB and use the Cyc KB inference engine to derive information from identified entities and entity classifications. A 'plug-in, plug-out' system framework, similar to the framework used for the IAA system, was developed to serve as the processing framework and testbed for the information extraction components. One of the main goals was to reduce the time it takes to run a document through the IAA-Cyc system. The prototype system developed under a previous effort processed documents at a rate of 1.5 minutes per sentence. The new system processes documents at a rate of four seconds per document, where a document typically comprises 30-50 sentences.
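
The two processing rates quoted in the abstract use different units (per sentence versus per document), so the size of the improvement is easy to miss. The short sketch below puts them on a common per-document basis. It assumes the earlier prototype's per-sentence rate applied uniformly across a document, which the abstract does not state; the constant and function names are illustrative only.

```python
# Figures taken from the abstract. The per-document estimate for the
# earlier prototype ASSUMES its per-sentence rate applies uniformly.
OLD_SECONDS_PER_SENTENCE = 1.5 * 60   # 1.5 minutes per sentence
NEW_SECONDS_PER_DOCUMENT = 4.0        # four seconds per document
SENTENCES_PER_DOCUMENT = (30, 50)     # typical document length range


def speedup(sentences: int) -> float:
    """Return the estimated old/new processing-time ratio for one document."""
    old_seconds_per_document = OLD_SECONDS_PER_SENTENCE * sentences
    return old_seconds_per_document / NEW_SECONDS_PER_DOCUMENT


for n in SENTENCES_PER_DOCUMENT:
    old_minutes = OLD_SECONDS_PER_SENTENCE * n / 60
    print(f"{n} sentences: about {old_minutes:.0f} min before, "
          f"{NEW_SECONDS_PER_DOCUMENT:.0f} s now, "
          f"roughly {speedup(n):.0f}x faster")
```

Under that assumption, a typical document would have taken roughly 45 to 75 minutes in the earlier prototype, so the reported four seconds per document corresponds to an estimated speed-up of several hundred to about a thousand times.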

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA412179

Entities

People

  • Benjamin Rode
  • Chris Crowner
  • Dave Gunning
  • Jeannette Neal

Tags

Communities of Interest

  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Change Detection
  • Classification
  • Databases
  • Graphical User Interface
  • Inference Engines
  • Information Systems
  • Intelligence Analysts
  • Language
  • Lessons Learned
  • Military Organizations
  • Natural Language Processing
  • Natural Languages
  • Reasoning
  • Software Development
  • Time Intervals

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development
  • Molecular Genetics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval