A year in Madrid as described through the analysis of geotagged Twitter data
Abstract
Gaining a complete picture of the activity in a city using vast data sources is challenging yet potentially very valuable. One such source of data is Twitter, which generates millions of short spatio-temporally localized messages that, as a collection, have information on city regions and many forms of city activity. The quantity of data, however, necessitates summarization in a way that makes consumption by an observer efficient, accurate, and comprehensive. We present a two-step process for analyzing geotagged Twitter data within a localized urban environment. The first step involves an efficient form of latent Dirichlet allocation, using expectation maximization, for topic content summarization of the text information in the tweets. The second step involves spatial and temporal analysis of information within each topic using two complementary metrics. These proposed metrics characterize the distributional properties of tweets in time and space for all topics. We integrate the second step into a graphical user interface that enables the user to adeptly navigate through the space of hundreds of topics. We present results of a case study of the city of Madrid, Spain, for the year 2011, in which both large-scale protests and elections occurred. Our data analysis methods identify these important events, as well as other classes of more mundane routine activity and their associated locations in Madrid.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Mar 29, 2018
- Source ID: 10.1177/2399808318764123
Entities
People
- Andrea Bertozzi
- Daniel Balagué
- Hao Li
- Katie Khuu
- Miguel Camacho-Collados
- P. Jeffrey Brantingham
- Travis R. Meyer
Organizations
- National Science Foundation Division of Mathematical Sciences
- Office of Naval Research
- University of California
- University of California, Los Angeles