Evaluating Semantic Matching Techniques for Technical Documents

Abstract

Machine learning models that employ natural language processing (NLP) techniques have become more widely accessible, making them an attractive solution for text and document classification tasks traditionally performed by humans. Two such use cases are matching the specialized experience required for a job to statements in applicant resumes, and finding and labelling clauses in legal contracts. The Air Force Materiel Command (AFMC) has an immediate need for solutions to civilian hiring. However, there is currently no ground-truth data to validate against. A similar task is contract understanding, for which there is the Contract Understanding Atticus Dataset (CUAD), a recently published repository of 510 contracts manually labelled by legal experts. The presented semantic matching approach first extracts, preprocesses, and embeds contract clauses into 512-dimension TF-IDF feature vectors. Four logistic models are trained on a subset of these vectors. Then, the models are tuned to accept the contracts as text documents split into sliding windows of words. Finally, model performance is measured on a previously isolated test set and compared against the transformer models employed in the original CUAD research.
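
The two preprocessing steps named in the abstract, splitting a document into sliding windows of words and embedding text as a fixed-width TF-IDF vector, can be sketched as follows. This is a minimal illustration, not the report's implementation: the abstract does not state the window size, stride, tokenization, or TF-IDF weighting, so the function names `sliding_windows` and `tfidf_vectors`, the whitespace tokenizer, the default sizes, and the smoothed IDF formula are all assumptions. Only the 512-feature cap comes from the abstract. The logistic models themselves are omitted.

```python
import math
from collections import Counter


def sliding_windows(text, size=8, stride=4):
    """Split text into overlapping windows of `size` words.

    `size` and `stride` are illustrative values, not taken from the report.
    """
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)] if words else []
    windows = []
    # The upper bound guarantees the final window reaches the last word.
    for start in range(0, len(words) - size + stride, stride):
        windows.append(" ".join(words[start:start + size]))
    return windows


def tfidf_vectors(docs, max_features=512):
    """Embed each document as an L2-normalized TF-IDF vector.

    The vocabulary is capped at `max_features` terms, chosen by document
    frequency (ties broken alphabetically). The smoothed IDF used here is
    one common variant and is an assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df, key=lambda t: (-df[t], t))[:max_features]
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)

    vectors = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in index)
        vec = [0.0] * len(vocab)
        for term, count in counts.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[index[term]] = count * idf
        norm = math.sqrt(sum(v * v for v in vec))
        if norm:
            vec = [v / norm for v in vec]
        vectors.append(vec)
    return vocab, vectors


if __name__ == "__main__":
    contract = "a b c d e f g h i j"  # stand-in for contract text
    windows = sliding_windows(contract, size=4, stride=2)
    vocab, vectors = tfidf_vectors(windows, max_features=3)
    print(windows)
    print(vocab, vectors)
```

In a pipeline of this shape, each window vector would be passed to a trained classifier, so that a clause can be located within a full contract rather than only classified in isolation.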

Document Details

Document Type: Technical Report
Publication Date: Mar 24, 2022
Accession Number: AD1166894

Entities

People

  • Rain F. Dartt

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Autonomy
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computational Science
  • Computer Languages
  • Computer Programs
  • Dimensionality Reduction
  • Electrical Engineering
  • Engineering
  • Governments
  • Information Processing
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Natural Language Processing
  • Natural Languages
  • Network Science
  • Neural Networks
  • Supervised Machine Learning
  • Test Sets
  • United States Government

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Defense Acquisition Program Management

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval