Approaches to Generate Keywords

Abstract

Insider threat analysts require effective and reliable methods for generating lists of keywords that support the detection of a given topic of interest. These keywords may be used to create keyword-based detection policies or may serve analysts as a reference guide. Direct keyword detection is a generalizable approach: it is much less complex than context-based detection and can therefore be used in a wider variety of tools. This document presents two approaches for generating lists of keywords, describing the intuition and science behind each approach and discussing the accompanying software code that performs the automatic keyword extraction. Experimentally, we found that both approaches generated lists of keywords that are reasonably indicative of hate or extremism. We recommend considering the incorporation of these approaches into a keyword development process. However, we also recommend that each generated list of keywords be manually reviewed before its terms are included in any automated detection capability. We discuss this recommendation in the Results and Implementation section.

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2021
Accession Number: AD1137194

Entities

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Data Processing
  • Detection
  • Dictionaries
  • Directories
  • Disasters
  • Engineering
  • Filters
  • Filtration
  • Governments
  • Insider Threats
  • Internet
  • Language
  • Law
  • Machine Learning
  • Social Media
  • Supervised Machine Learning
  • United States

Fields of Study

  • Education
  • Engineering

Readers

  • Computational Linguistics
  • Cybersecurity
  • Systems Analysis and Design