Security Classification Using Automated Learning (SCALE): Optimizing Statistical Natural Language Processing Techniques to Assign Security Labels to Unstructured Text

Abstract

Automating the process of assigning security classifications to unstructured text would facilitate a transition to a data-centric architecture: one that promotes information sharing, and in which all data in an organization are electronically labelled. In this document, we report the results of a series of experiments conducted to investigate the effectiveness of using statistical natural language processing and machine learning techniques to automatically assign security classifications to documents. We present guidelines for selecting parameters to maximize the accuracy of a machine learning algorithm's classification decisions for several well-defined collections of documents. We examine the significance of a document's topic and the effect of security policy changes on the ability of our system to automate classification; we include design recommendations to address both topic and policy considerations. Our classification techniques prove effective at assessing a document's sensitivity, achieving accuracies upwards of 80%.
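As a rough illustration of the kind of supervised text classification the abstract describes, the sketch below trains a bag-of-words multinomial Naive Bayes model with add-one smoothing on a handful of labelled documents. The abstract does not name the report's algorithms, features, or label set, so the choice of Naive Bayes, the class name `NaiveBayesLabeler`, the labels `SENSITIVE` and `UNCLASSIFIED`, and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real system would likely
    # add stemming, stop-word removal, or feature selection.
    return text.lower().split()


class NaiveBayesLabeler:
    """Hypothetical multinomial Naive Bayes labeler (not the report's method)."""

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # all words seen in training

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # Log prior plus log likelihood of each known word,
            # with add-one (Laplace) smoothing over the vocabulary.
            score = math.log(count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; real training sets would be far larger.
train_docs = [
    "troop deployment schedule and patrol routes",
    "cafeteria menu for the week",
    "patrol routes for the northern sector",
    "parking lot closed for the week",
]
train_labels = ["SENSITIVE", "UNCLASSIFIED", "SENSITIVE", "UNCLASSIFIED"]

model = NaiveBayesLabeler().fit(train_docs, train_labels)
print(model.predict("updated patrol routes"))
print(model.predict("cafeteria closed for the week"))
```

Accuracy of the kind quoted in the abstract would be estimated by comparing such predictions against known labels on documents held out from training.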

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2010
Accession Number
ADA551452

Entities

People

  • Daniel Charlebois
  • J. D. Brown

Organizations

  • Defence Research and Development Canada

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Human Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Computational Complexity
  • Control Systems
  • Cross Domain
  • Data Mining
  • Dimensionality Reduction
  • Electronic Mail
  • Information Exchange
  • Information Science
  • Information Transfer
  • Machine Learning
  • National Security
  • Natural Language Processing
  • Natural Languages
  • Reliability
  • Security
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks
  • Microelectronics