Cross-Lingual Question Answering to Identifying Information Differences between English and Russian Wikipedia Articles

Abstract

In this project, we developed technologies for uncovering the root cause of information discrepancies across cultures and languages. The work rests on the assumption that different historical accounts of major events and entities can lead to friction and misunderstanding between cultures. The main inquiry was into developing a system that reads Russian and English Wikipedia articles on the same topic and automatically identifies cultural differences between them, particularly persuasive language. We used large language models (LLMs), such as GPT-4, and decomposed the larger problem of persuasion detection into subtasks, each of which uses a carefully designed prompt. We quantify the amount of persuasion in each article, which allows for various analyses and experiments. Notably, we generate two rankings, one per language, of articles by amount of persuasive content. These rankings match our intuitions on which subjects are meaningful to which cultures. In our development process, we identified limitations in prior datasets for multilingual persuasion detection. Using insights from our system development, we further released a large-scale, broadly scoped synthetic dataset.

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2024
Accession Number: AD1230759

Entities

People

Chris Callison-Burch
Marianna Apidianaki

Organizations

University of Pennsylvania
