Developing Natural Language Processing Algorithms to Medically Code the Clinical Notes in the Theater Data Medical Store

Abstract

Outside the Department of Defense, natural language processing (NLP) strategies have been applied to electronic health records (EHRs) to increase information extraction from free-text notes and structured fields, allowing access to much larger cohorts than previously possible. Current operational medical data is held in the Theater Medical Data Store (TMDS), and most of the medical information in TMDS is contained in unstructured text fields. The objective is to automate the coding of this data into injury diagnostic code groups, which are derived from International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes. The TMDS holds over 8 million records, and as many as 50% of the ICD-9-CM codes may be incompletely or inaccurately coded. The accuracy of the data in the TMDS has never been quantified, largely because most of it was captured without any medical billing concerns. The study has developed a set of programming rules using NLP and machine learning (ML) (i.e., algorithms generated by automated learning from manually coded data), with the eventual output intended to represent human interpretation as closely as possible. The coding algorithm models have been developed using previously coded medical records from the Expeditionary Medical Encounter Dataset (EMED) housed at the Naval Health Research Center (NHRC). Experienced nurse staff are responsible for coding and validating all EMED medical encounter records. The model will be trained on a subset of the EMED data and then tested on TMDS data that has been matched to the remaining EMED data.
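
The train-then-test workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the model used, so the bag-of-words naive Bayes classifier below is an assumption chosen only because it is small enough to show in full. The class name NaiveBayesCoder, the three code-group labels, and every clinical note in TRAIN and HELD_OUT are hypothetical and invented for illustration; none of them come from EMED or TMDS.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(note):
    """Lowercase a free-text note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes coder with add-one smoothing."""

    def fit(self, notes, labels):
        # Count how often each label and each (label, token) pair occurs.
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for note, label in zip(notes, labels):
            tokens = tokenize(note)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, note):
        # Return the label with the highest log posterior for this note.
        tokens = tokenize(note)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, notes, labels):
    """Fraction of notes whose predicted code group matches the manual code."""
    hits = sum(model.predict(n) == y for n, y in zip(notes, labels))
    return hits / len(labels)


# Invented stand-in for the manually coded training subset (assumption).
TRAIN = [
    ("gunshot wound to left thigh with femur fracture", "fracture"),
    ("closed fracture of right tibia after fall", "fracture"),
    ("blast injury with partial thickness burns to both hands", "burn"),
    ("second degree burn to forearm from vehicle fire", "burn"),
    ("brief loss of consciousness after blast, headache, concussion", "head_injury"),
    ("concussion with dizziness and headache following rollover", "head_injury"),
]

# Invented stand-in for held-out notes matched to manual codes (assumption).
HELD_OUT = [
    ("open fracture of left femur", "fracture"),
    ("burns to hands and forearm", "burn"),
    ("headache and dizziness after blast exposure", "head_injury"),
]

model = NaiveBayesCoder().fit(*zip(*TRAIN))
held_out_accuracy = accuracy(model, *zip(*HELD_OUT))
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

The point of the sketch is the split: the classifier only ever sees the manually coded training notes, and agreement with the manual codes is measured on notes it has not seen. A real evaluation would need far more data, per-code-group metrics, and a model suited to clinical text.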

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2021
Accession Number
AD1162371

Entities

People

  • Andrew Olson
  • Edwin D'souza
  • James Zouris
  • Trevor Elkins

Organizations

  • Naval Health Research Center

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Biomedical Research
  • Classification
  • Computer Programming
  • Contractors
  • Deep Learning
  • Department Of Defense
  • Electronic Mail
  • Health Services
  • Internet
  • Language
  • Learning
  • Machine Learning
  • Maryland
  • Medical Personnel
  • Natural Language Processing
  • Natural Languages
  • Patent Applications
  • Professional Development
  • Project Management

Readers

  • Database Systems and Applications
  • Medical or Health Care Field
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks
  • Microelectronics