Using Custom NER Models to Extract DOD Specific Entities from Contracts

Abstract

The Air Force Sustainment Center collected 3.7 million contracts on the Air Force Research Laboratory's high-power computers. The contracts are stored as PDF files or scanned documents, which makes them unstructured data. The Data Analytics Resource Team converted the documents to text for further analysis. This thesis aims to extract four DOD-specific entities (NSN, Part Number, CAGE Code, and Supplier Name) from the contracts using custom named entity recognition (NER) models. The extracted information will allow the Air Force to identify which parts are supplied by which vendors. Combined with historical CLIN pricing for each vendor-specific part number, it can help decision makers negotiate prices based on historical data and competitor pricing. Beyond pricing, part numbers can be aligned with maintenance data so that vendor selection can account for a part's life-cycle cost.
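To make the target entities concrete, the following is a minimal rule-based sketch in Python that flags candidate NSN and CAGE Code strings in contract text. It is not the thesis's method, which relies on custom NER models; regular-expression matching of this kind might serve only as a baseline or as a way to pre-label training data. The patterns assume that an NSN appears as 13 digits grouped 4-2-3-4 with hyphens and that a CAGE Code is five uppercase alphanumeric characters following a "CAGE" label. The function name `find_candidates` and the sample sentence (including its identifier values) are invented for illustration.

```python
import re

# Assumed surface forms (illustrative, not taken from the thesis):
#   NSN  - 13 digits written as 4-2-3-4 groups separated by hyphens
#   CAGE - 5 uppercase alphanumeric characters after a "CAGE" or "CAGE Code" label
NSN_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{3}-\d{4}\b")
# The label is matched case-insensitively; the code itself must be uppercase/digits.
CAGE_PATTERN = re.compile(r"\b(?i:CAGE(?:\s+CODE)?)[:\s]+([A-Z0-9]{5})\b")


def find_candidates(text):
    """Return (label, value, start, end) tuples sorted by position in text."""
    found = []
    for m in NSN_PATTERN.finditer(text):
        found.append(("NSN", m.group(0), m.start(), m.end()))
    for m in CAGE_PATTERN.finditer(text):
        # Report only the code itself, not the preceding label.
        found.append(("CAGE", m.group(1), m.start(1), m.end(1)))
    return sorted(found, key=lambda item: item[2])


# Hypothetical sentence with made-up identifiers.
sample = "Item NSN 1560-01-123-4567 supplied under CAGE Code: 1ABC2."
for label, value, start, end in find_candidates(sample):
    print(label, value, start, end)
```

Part numbers and supplier names have no fixed format, which is one reason a learned NER model is a more natural fit for them than fixed patterns like these.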

Document Details

Document Type: Technical Report
Publication Date: Dec 23, 2021
Accession Number: AD1157022

Entities

People

  • Kayla P Haberstich

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computers
  • Data Analysis
  • Data Mining
  • Data Science
  • Information Science
  • Language
  • Machine Learning
  • Named Entity Recognition
  • Natural Language Processing
  • Network Science
  • Supervised Machine Learning

Readers

  • Distributed Systems and Data Platform Development
  • Government Contracting/Procurement
  • Logistics and Supply Chain Management