Learning User Latent Attributes on Social Media

Abstract

In recent years, there has been growing interest in using social media to understand social phenomena. Researchers have demonstrated many important applications of online social media for understanding real-world events, such as presidential election prediction, early earthquake detection, and disaster management. A social media site hosts many different types of users, varying in gender, location, ideology, and so on. Different types of users may have different motivations, different opinions on certain topics, different resources at their disposal, and different behaviors during events. To understand what is happening on a social media site, researchers need to know where a post comes from, who wrote it, and which party its author belongs to. However, users do not explicitly provide this information. The goal of this thesis is to predict users' latent attributes, such as their locations, social identities, and political orientations.

The massive amount of text on social media lets us learn rich knowledge for predicting user attributes. Social media text also often comes with a significant amount of metadata, and social network data contains rich connection information, e.g., mentions and follows. Combining text, metadata, and the user network for user attribute prediction remains a challenging task.

In this thesis, I approach user attribute prediction at three levels: single-post, user-timeline, and graph-level classification. I start with a global location prediction system that takes a single tweet as input to predict its author's location. It uses location-related features of the tweet, such as its text and the user's profile metadata. I then extend the tweet-level prediction system to the user level by combining multiple posts from one user's timeline. I demonstrate the effectiveness of this model on the task of user social identity classification.
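The abstract describes moving from a single tweet (its text plus profile metadata) to a whole user timeline. The sketch below illustrates that progression only in outline. It is not the method used in the thesis: it swaps in a multinomial naive Bayes classifier with add-one smoothing as a simple stand-in, and the helper `featurize`, the class `NaiveBayesLocator`, the `meta:` feature prefix, and the toy training data are all hypothetical, introduced purely for illustration.

```python
import math
from collections import Counter, defaultdict


def featurize(text, profile_location=""):
    """Hypothetical helper: bag of features for one tweet.

    Tweet words are lowercased; profile-location words get a 'meta:'
    prefix so the classifier counts metadata separately from tweet text.
    """
    feats = [tok.lower() for tok in text.split()]
    feats += ["meta:" + tok.lower() for tok in profile_location.split()]
    return feats


class NaiveBayesLocator:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    A stand-in classifier for illustration, not the thesis's model.
    """

    def __init__(self):
        self.class_counts = Counter()            # training tweets per label
        self.feat_counts = defaultdict(Counter)  # feature counts per label
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (features, label) pairs."""
        for feats, label in examples:
            self.class_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        """log P(label) + sum of log P(feature | label), for every label."""
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + v
            score = math.log(n / total)
            for f in feats:
                # Counter returns 0 for unseen features; +1 smooths them.
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            scores[label] = score
        return scores

    def predict_tweet(self, text, profile_location=""):
        """Tweet-level prediction from one post plus profile metadata."""
        scores = self.log_scores(featurize(text, profile_location))
        return max(scores, key=scores.get)

    def predict_user(self, timeline, profile_location=""):
        """User-level prediction: pool all posts in the timeline into one
        bag of features (profile metadata counted once), then score it."""
        feats = []
        for text in timeline:
            feats += featurize(text)
        feats += featurize("", profile_location)
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy data invented for illustration only.
train = [
    (featurize("steelers game tonight", "Pittsburgh PA"), "Pittsburgh"),
    (featurize("primanti sandwich downtown", "Pittsburgh"), "Pittsburgh"),
    (featurize("subway delayed again", "New York"), "New York"),
    (featurize("bagels in brooklyn", "NYC"), "New York"),
]
model = NaiveBayesLocator().fit(train)
print(model.predict_tweet("steelers win"))                         # Pittsburgh
print(model.predict_user(["stuck on the subway", "best bagels"]))  # New York
```

Pooling a timeline into a single bag of features is just one way to go from tweet-level to user-level prediction; averaging per-tweet probabilities or learning a weighted combination of posts are common alternatives, and the sketch makes no claim about which aggregation the thesis uses.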


Document Details

Document Type: Technical Report
Publication Date: May 01, 2020
Accession Number: AD1157340

Entities

People

  • Binxuan Huang

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Energy and Power Technologies
  • Weapons Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Automata Theory
  • Computational Science
  • Computer Languages
  • Covid-19
  • Data Mining
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Natural Language Processing
  • Network Science
  • Neural Networks
  • Ontologies
  • Social Media
  • Social Networking Services
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Database Systems and Applications
  • Neural Network Machine Learning