Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports
Abstract
Longitudinal data on key cancer outcomes for clinical research, such as response to treatment and disease progression, are not captured in standard cancer registry reporting. Manual extraction of such outcomes from unstructured electronic health records is slow and resource-intensive. Natural language processing (NLP) methods can accelerate outcome annotation, but they require substantial labeled data. Transfer learning based on language modeling, particularly using the Transformer architecture, has improved NLP performance. However, there has been no systematic evaluation of NLP model training strategies for extracting cancer outcomes from unstructured text.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Sep 02, 2023
- Source ID: 10.1186/s12859-023-05439-1
Entities
People
- Deborah Schrag
- Eliezer M. Van Allen
- Haitham A. Elmarakeby
- Irbaz Bin Riaz
- Kenneth L. Kehl
- Pavel S. Trukhanov
- Vidal M. Arroyo
Organizations
- Doris Duke Charitable Foundation
- National Cancer Institute
- Prostate Cancer Foundation
- United States Department of Defense