Predicting the Authenticity of Code-Switched Text Generated by A Large Language Model
Abstract
Japan is a crucial partner in the U.S. Navy's effort to remain the premier naval force in an increasingly contested Indo-Pacific region. However, in the current era of generative technologies, such as the large language model (LLM) Chat Generative Pre-trained Transformer (ChatGPT), malevolent actors worldwide now possess an unprecedented capability to generate text-based synthetic media able to sow disarray among allies. Consequently, alliances between the United States and its non-English-speaking allies, like Japan, can be tested by text-based deep fakes that seek to reinforce their credibility by using the native languages of both countries; fabricated bilingual diplomatic statements, military communiques, or news articles all possess the potential to upend U.S. global partnerships. Employing the tools of natural language processing (NLP), our research examines whether we can detect if bilingual text that may be created to undermine the relationship between the U.S. and Japan is authentic (that is, human-made) or inauthentic (that is, generated by an LLM, namely ChatGPT). We achieved 96% accuracy in our limited trials using logistic regression, with similar results for support vector machine (SVM), k-nearest neighbor (KNN), and naive Bayes models; each model produced slightly different misclassifications.
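To make the classification task concrete, the following is a minimal sketch of one of the model families named above, a multinomial naive Bayes classifier, written with only the Python standard library. It is an illustration under stated assumptions, not the report's implementation: the character-bigram features, the Laplace smoothing value, the class and function names, and the example strings and their labels are all hypothetical choices made here, since the abstract does not describe the features or data used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams.

    Assumption: character n-grams are used here only because they need no
    tokenizer for mixed Japanese/English text; the report's features are
    not stated in the abstract.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over character n-grams (hypothetical helper)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        # Count documents per class and n-gram occurrences per class.
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text, self.n))
        # Vocabulary is the union of n-grams seen in any class.
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                if gram in self.vocab:
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


if __name__ == "__main__":
    # Made-up placeholder strings with arbitrary labels, NOT data from the study.
    texts = [
        "明日の meeting は10時からです",
        "この report は今週中に submit します",
        "The schedule については後で連絡します",
        "We will confirm the 予定 by email",
    ]
    labels = ["authentic", "authentic", "inauthentic", "inauthentic"]
    model = NaiveBayesTextClassifier().fit(texts, labels)
    print(accuracy(model, texts, labels))
```

Running several such classifiers on the same held-out texts and comparing which items each one gets wrong is one way to observe the kind of model-specific misclassifications the abstract mentions.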
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2023
- Accession Number: AD1224699
Entities
People
- Lucas J Horan
Organizations
- Naval Postgraduate School