Improving Text Classification with Semantic Information

Abstract

The Air Force contracts a variety of positions, from Information Technology to maintenance services. There is currently no automated way to verify that quotes for services are reasonably priced. Small training data sets and word sense ambiguity are challenges that such a tool would encounter, and additional semantic information could help. This thesis hypothesizes that leveraging a semantic network could improve text-based classification. This thesis uses information from ConceptNet to augment a Naive Bayes Classifier. The leveraged semantic information would add relevant words from the category domain to the model that did not appear in the training data. The experiment compares variations of a Naive Bayes Classifier leveraging semantic information, including an Ensemble Model, against classifiers that do not. Results show a significant performance increase in a smaller data set but not a larger one. Out of all models tested, an Ensemble Based Classifier performs the best on both data sets. The results show that ConceptNet does not add enough new or relevant information to affect classifier performance on large data sets.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2021
Accession Number
AD1135189

Entities

People

  • Joshua H. White

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Artificial Intelligence
  • Computer Science
  • Computers
  • Contracts
  • Data Mining
  • Data Science
  • Data Sets
  • Information Science
  • Information Systems
  • Law
  • Machine Learning
  • Network Science
  • Neural Networks
  • Supervised Machine Learning
  • Training
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Regression Analysis.
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval