A Community-Based Code-Switch Discussion Pipeline

Abstract

Social media (SM) facilitates discussions within communities across the globe. To communicate effectively, multilingual users often alternate between languages, a phenomenon known as code-switching (CW). Analysis of discussions that exhibit CW can reveal a community's diversity and provide insight into evolving trends and opinions, and the widespread use of SM makes it possible to track and characterize these discussions for cultural and linguistic analysis. Advanced algorithms for community detection, based on network structures of followers and friends, interactions of retweets and mentions, and patterns of hashtag occurrence, largely ignore linguistic cues in the body of posts. For this reason, the applicability of these state-of-the-art approaches to problems involving CW analysis has been limited: the resulting communities depend on the attribute types used in the detection rather than on the attributes that characterize the significance of the CW, that is, the connections among posters, the topics under discussion, and the social context in which the CW occurs. Here we develop a new framework to facilitate understanding and CW processing of high volumes of SM information by 1) detecting community-based multilingual SM discussions, 2) defining evaluation metrics and heuristics to obtain CW discussions, 3) developing word-level language ID algorithms, 4) visualizing user-discussion graphs where component types are extracted based on defined rankings, and 5) representing discussions as trees with first-order nodes as posts, and nonterminal and leaf nodes as responses.
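As a rough illustration of steps 2, 3, and 5 above, the following minimal sketch tags each word of a post with a language, flags a post as code-switched when at least two languages appear, and stores a discussion as a tree of posts and responses. It is not the report's implementation: the toy lexicons, the two-language heuristic, and the names `LEXICONS`, `tag_words`, `is_code_switched`, `Node`, and `count_cw` are all assumptions made for this example. A real system would replace the lexicon lookup with a trained word-level language identifier.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicons (assumption); stand-ins for a trained
# word-level language ID model.
LEXICONS = {
    "en": {"the", "is", "very", "good", "i", "love", "this"},
    "es": {"el", "es", "muy", "bueno", "yo", "amo", "esto", "pero"},
}


def tag_words(text):
    """Label each whitespace-separated token with a language code or 'unk'."""
    tags = []
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")
        lang = next(
            (code for code, words in LEXICONS.items() if word in words),
            "unk",
        )
        tags.append((word, lang))
    return tags


def is_code_switched(text):
    """Assumed heuristic: a post is CW if it contains words from 2+ languages."""
    langs = {lang for _, lang in tag_words(text) if lang != "unk"}
    return len(langs) >= 2


@dataclass
class Node:
    """A post (root of the tree) or a response (nonterminal or leaf node)."""
    author: str
    text: str
    replies: list = field(default_factory=list)


def count_cw(node):
    """Count code-switched messages in the discussion rooted at `node`."""
    return int(is_code_switched(node.text)) + sum(
        count_cw(reply) for reply in node.replies
    )


# Usage: one post with two responses, one of which has its own response.
discussion = Node("a", "This is muy bueno!", [
    Node("b", "I love this"),
    Node("c", "pero es very good", [Node("d", "ok")]),
])
cw_messages = count_cw(discussion)
```

A count like `cw_messages`, divided by the number of messages in the tree, is one simple candidate for a discussion-level CW score; the report's actual evaluation metrics are not given in this abstract.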

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 2021
Accession Number: AD1126224

Entities

People

Aaron Harwood
Lucia Falzon
Michelle Vanni
Prarthana Padia
Shanika Karunasekera
Sue Kase

Organizations

United States Army Research Laboratory
