Predicting the Authenticity of Code-Switched Text Generated by A Large Language Model

Abstract

Japan is a crucial partner in the U.S. Navy's effort to remain the premier naval force in an increasingly contested Indo-Pacific region. However, in the current era of generative technologies, like the large language model (LLM) Chat Generative Pre-trained Transformer (ChatGPT), malevolent actors worldwide now possess an unprecedented capability to generate text-based synthetic media able to sow disarray among allies. Consequently, alliances between the United States and its non-English speaking allies, like Japan, can be tested by text-based deep fakes seeking to reinforce their credibility by using the native languages of both countries; fabricated bilingual diplomatic statements, military communiques, or news articles all possess the potential to upend U.S. global partnerships. Employing the tools of natural language processing (NLP), our research seeks to examine whether we can detect if bilingual text that may be created to undermine the relationship between the U.S. and Japan is authentic (that is, human-made) or inauthentic (that is, generated by an LLM, namely ChatGPT). We achieved 96% accuracy in our limited trials using logistic regression, with similar results for support vector machine (SVM), k-nearest neighbor (KNN), and naive Bayes models, although each model produced slightly different misclassifications.

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2023
Accession Number: AD1224699

Entities

People

Lucas J Horan

Organizations

Naval Postgraduate School
