Cross-Lingual Question Answering to Identifying Information Differences between English and Russian Wikipedia Articles
Abstract
In this project, we developed technologies for uncovering the root cause of information discrepancies across cultures and languages. The work rests on the assumption that differing historical accounts of major events and entities can lead to friction and misunderstanding between cultures. Our main goal was a system that reads Russian and English Wikipedia articles on the same topic and automatically identifies cultural differences between them, particularly persuasive language. We used large language models (LLMs), such as GPT-4, and decomposed the larger problem of persuasion detection into subtasks, each handled by a carefully designed prompt. We quantify the amount of persuasion in each article, which allows for various analyses and experiments. Notably, we generate two rankings of articles by amount of persuasive content, one per language. These rankings match our intuitions about which subjects are meaningful to which cultures. During development, we identified limitations in prior datasets for multilingual persuasion detection. Using insights from building the system, we also released a large-scale, broadly scoped synthetic dataset.
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 2024
- Accession Number: AD1230759
Entities
People
- Chris Callison-Burch
- Marianna Apidianaki
Organizations
- University of Pennsylvania