Analyzing Migration Patterns from Central America Using Natural Language Processing and Machine Learning
Abstract
What explains variation in migration flows from Central America to the U.S.? This question is currently central to U.S. domestic and national security policy. Given that climate change projections point to an additional 17 million climate-induced migrants in Latin America over the next three decades (Rigaud, de Sherbinin, et al., 2018), few questions are more deserving of attention and empirical investigation. This study uses quantitative and computer science methods to conduct basic research on the determinants of migration patterns from Honduras, El Salvador, and Guatemala to the U.S. The empirical assessment will cover all municipalities in these three countries from 2000 to 2019. The theoretical foundations will be rooted in a formal model of the expected utility of migrating. The benefits of migration include physical safety from increasing gang violence, natural disasters, and political crises in origin countries, as well as access to U.S. economic opportunities. The costs of migration include long and dangerous journeys to the U.S. and stricter immigration policies in transit and destination countries. These costs may be reduced by social networks, religious organizations, and the bandwagon effects of migrant caravans. To assess the observable implications of the model, the proposed project will generate several empirical contributions. The project will rely on Natural Language Processing (NLP) to analyze more than 14 million anonymized records of migrant apprehensions made in both the U.S. and Mexico from 2000 to 2019. This task will generate fine-grained data showing the temporal and spatial variation in migration trends. The data will make it possible to disentangle the migration patterns of single adult males, single adult females, unaccompanied minors, and family units.
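The abstract does not give the functional form of the formal model, so the following is only an illustrative sketch of how an expected-utility decision rule of this kind is commonly written. Every symbol here is an assumption introduced for illustration: $B_{imt}$ stands for the benefits of migrating (safety and economic opportunity) for individual $i$ in municipality $m$ at time $t$, $p_{mt}$ for the probability of completing the journey, $C_{imt}$ for the costs of the journey and of immigration enforcement, $N_{imt}$ for cost-reducing support (social networks, religious organizations, caravans), and $U^{\text{stay}}_{imt}$ for the utility of remaining.

```latex
% Illustrative only: the symbols and functional form are assumptions,
% not the model specified by the project.
\[
  EU_{imt}(\text{migrate}) = p_{mt}\, B_{imt} - C_{imt}(N_{imt}),
  \qquad \frac{\partial C_{imt}}{\partial N_{imt}} < 0
\]
\[
  \text{migrate if}\quad EU_{imt}(\text{migrate}) > U^{\text{stay}}_{imt}
\]
```

Under this sketch, worsening conditions at origin lower $U^{\text{stay}}_{imt}$, while stronger support networks lower $C_{imt}$; either change makes migration more likely.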
The empirical strategy will also rely on Machine Learning (ML) and computerized event coding of Spanish-language newspapers to generate unprecedented data on the territorial presence of street gangs in Central America. The same approach will be used to generate data on political instability and to track migrant caravans throughout the region. In addition, the empirical strategy will rely on Geographic Information Systems (GIS) to generate data on natural disasters (droughts, rain, hurricanes, earthquakes, and volcanic eruptions). GIS tools will also serve to calculate the most efficient migration routes and to generate models of congestion that are likely to induce bandwagon effects on migration. The anticipated results of this project should advance understanding of the determinants of regular and forced migration flows from Central America to the U.S. by relying on solid theoretical foundations and substantial empirical innovations. The project will contribute by developing reliable and valid measures of migration and by analyzing how changes in the economic, political, environmental, and security sectors influence migration flows over time and across space. The work will enable collaboration between political scientists and computer scientists, and will engage undergraduate and graduate students of Hispanic origin in research tasks that bridge STEM tools and social science concerns. By engaging faculty and students in Computational Social Science, the proposed project will enhance the capacity of the University of Arizona, as a Hispanic-Serving Institution, to conduct state-of-the-art research in areas of interest to the Department of Defense.
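The abstract does not say which GIS tooling or algorithm will be used to calculate the most efficient migration routes. As a minimal sketch of the general idea, a least-cost route over a weighted network can be found with Dijkstra's algorithm. The function name, the toy graph, its node labels, and its edge costs below are all hypothetical and do not come from the project.

```python
import heapq


def least_cost_route(graph, origin, destination):
    """Return (path, cost) for the cheapest route, using Dijkstra's algorithm.

    `graph` maps each node to a dict of {neighbor: non-negative edge cost}.
    Returns (None, inf) when the destination cannot be reached.
    """
    best = {origin: 0.0}   # cheapest known cost to reach each node
    previous = {}          # predecessor of each node on its cheapest path
    queue = [(0.0, origin)]
    visited = set()

    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry; node was already finalized
        visited.add(node)
        if node == destination:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if destination not in best:
        return None, float("inf")

    # Walk the predecessor links back from the destination to the origin.
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]


# Hypothetical toy network: labels and costs are placeholders, not real data.
TOY_GRAPH = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}

route, total_cost = least_cost_route(TOY_GRAPH, "A", "D")
print(route, total_cost)
```

In an actual GIS workflow the edge costs would presumably be derived from spatial data such as distance or terrain; here they are fixed numbers chosen only to make the example run.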
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 31, 2020
- Source ID: W911NF2010303
Entities
People
- Javier Osorio
Organizations
- Army Contracting Command
- Office of the Secretary of Defense
- University of Arizona