A Community-Based Code-Switch Discussion Pipeline

Abstract

Social media (SM) facilitates discussions within communities across the globe. To communicate effectively, multilingual users often alternate between languages, a phenomenon known as code-switching (CW). Analysis of discussions that exhibit CW can reveal a community's diversity and provide insight into evolving trends and opinions, and the widespread use of SM makes it possible to track and characterize these discussions for cultural and linguistic analysis. Advanced community detection algorithms, which rely on network structures of followers and friends, interactions such as retweets and mentions, and patterns of hashtag occurrence, largely ignore linguistic cues in the body of posts. As a result, the applicability of these state-of-the-art approaches to CW analysis has been limited: the resulting communities depend on the attribute types used for detection rather than on the attributes that characterize the significance of the CW, namely the connections among posters, the topics under discussion, and the social context in which the CW occurs. Here we develop a new framework to facilitate understanding and CW processing of high volumes of SM information by 1) detecting community-based multilingual SM discussions, 2) defining evaluation metrics and heuristics to obtain CW discussions, 3) developing word-level language ID algorithms, 4) visualizing user-discussion graphs where component types are extracted based on defined rankings, and 5) representing discussions as trees with first-order nodes as posts, and nonterminal and leaf nodes as responses.
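
To make steps 3 and 5 of the abstract concrete, the following is a minimal Python sketch, not the report's implementation. It assumes a toy lexicon lookup in place of a trained word-level language ID model, and every name in it (LEXICONS, tag_words, is_code_switched, Node, cw_ratio) is hypothetical. A discussion is modeled as a tree whose root is a post and whose descendants are responses, and a message counts as code-switched when its tokens are labeled with at least two languages.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicons (assumption): a real pipeline would use a
# trained word-level language ID model instead of a dictionary lookup.
LEXICONS = {
    "en": {"i", "am", "going", "to", "the", "party", "see", "you", "there"},
    "es": {"voy", "a", "la", "fiesta", "nos", "vemos", "pero", "mañana"},
}


def tag_words(text):
    """Label each whitespace-separated token with a language code or 'unk'."""
    tags = []
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        label = "unk"
        for lang, vocab in LEXICONS.items():
            if word in vocab:
                label = lang
                break
        tags.append((word, label))
    return tags


def is_code_switched(text):
    """True when the tokens of one message span at least two known languages."""
    langs = {label for _, label in tag_words(text) if label != "unk"}
    return len(langs) >= 2


@dataclass
class Node:
    """One message in a discussion tree: the root is a post, children are responses."""
    author: str
    text: str
    replies: list = field(default_factory=list)


def cw_ratio(node):
    """Fraction of messages in the tree rooted at `node` that are code-switched."""
    stack, total, switched = [node], 0, 0
    while stack:
        current = stack.pop()
        total += 1
        switched += is_code_switched(current.text)
        stack.extend(current.replies)
    return switched / total


# Build a small discussion: one post, one response, one nested response.
nested = Node("u3", "Nos vemos there!")
reply = Node("u2", "See you there", replies=[nested])
post = Node("u1", "I am going to la fiesta", replies=[reply])
print(cw_ratio(post))
```

A per-discussion ratio like this is only one possible heuristic for selecting CW discussions; the report defines its own evaluation metrics, which are not reproduced here.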

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 2021
Accession Number: AD1126224

Entities

People

  • Aaron Harwood
  • Lucia Falzon
  • Michelle Vanni
  • Prarthana Padia
  • Shanika Karunasekera
  • Sue Kase

Organizations

  • United States Army Research Laboratory

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence Software
  • Communities
  • Computer Languages
  • Data Mining
  • Department Of Defense
  • Detection
  • Information Operations
  • Information Science
  • Language
  • Machine Learning
  • Media
  • Military Research
  • Named Entity Recognition
  • Natural Language Processing
  • Natural Languages
  • Network Architecture
  • Neural Networks
  • Online Communications
  • Operating Systems
  • Social Media
  • Social Networking Services
  • Social Networks
  • Standards
  • Switches
  • Switching
