Predicting the Authenticity of Code-Switched Text Generated by A Large Language Model

Abstract

Japan is a crucial partner in the U.S. Navy's effort to remain the premier naval force in an increasingly contested Indo-Pacific region. However, in the current era of generative technologies, such as the large language model (LLM) Chat Generative Pre-trained Transformer (ChatGPT), malevolent actors worldwide possess an unprecedented capability to generate text-based synthetic media that can sow disarray among allies. Consequently, alliances between the United States and its non-English-speaking allies, like Japan, can be tested by text-based deep fakes that seek to reinforce their credibility by using the native languages of both countries; fabricated bilingual diplomatic statements, military communiques, or news articles all have the potential to upend U.S. global partnerships. Employing the tools of natural language processing (NLP), our research examines whether we can detect if bilingual text that may have been created to undermine the relationship between the U.S. and Japan is authentic (that is, human-made) or inauthentic (that is, generated by an LLM, namely ChatGPT). We achieved 96% accuracy in our limited trials using logistic regression, with similar results for support vector machine (SVM), k-nearest neighbor (KNN), and naive Bayes models; each model produced slightly different misclassifications.

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2023
Accession Number
AD1224699

Entities

People

  • Lucas J Horan

Organizations

  • Naval Postgraduate School

Tags

Readers

  • Computational Linguistics
  • East Asian Political and Security Studies within the Soviet Union
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks