Fine-Tuning A Multilingual Language Model to Prune Automated Event Data
Abstract
Every day, an enormous volume of written and transcribed media is produced, making it impossible for intelligence analysts to sift through it all without a large human workforce. However, multilingual language models can help analysts select media articles relevant to their problem set, even when those articles are written in a foreign or low-resource language, by filtering out irrelevant articles. The Global Database of Events, Language, and Tone (GDELT) is a near-real-time media database that releases new collections of open-source articles every 15 minutes, but its automated event coding often produces a high number of false positives.
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 2023
- Accession Number: AD1213533
Entities
People
- Seth W. Kyler
Organizations
- Naval Postgraduate School