Uncovering and Managing the Impact of Methodological Choices for the Computational Construction of Socio-Technical Networks from Texts

Abstract

This thesis is motivated by the need for scalable and reliable methods and technologies that support the construction of network data from information contained in text data. The resulting network data can ultimately be used to answer substantive questions about socio-technical networks. One main limitation of this approach is that validating the resulting network data can be difficult or even infeasible, e.g., for covert, historical, and large-scale networks. This thesis addresses this problem by identifying how the coding choices that must be made when extracting network data from text data affect the structure of the resulting networks and the results of network analysis. The findings suggest that conducting reference resolution on the text data can alter the identity and weight of 76% of the nodes and 23% of the links, and can cause major changes in the values of commonly used network metrics. Moreover, completely different sets of key nodes are found when reference resolution is applied to the text data prior to relation extraction. Based on the outcome of these experiments, I recommend strategies for avoiding or mitigating these issues in practical applications.
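The mechanism behind these effects can be illustrated with a small, hypothetical sketch. It assumes that entity mentions have already been extracted per sentence and that two entities are linked whenever they co-occur in a sentence. This co-occurrence rule, the toy sentences, the names, and the alias table are all illustrative assumptions, not the thesis's data or its relation-extraction method. Resolving the surface forms "Smith" and "Mr. Smith" to "John Smith" merges nodes, raises node and link weights, and changes which node scores highest on weighted degree (strength).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: entity mentions already extracted from each sentence.
SENTENCES = [
    ["John Smith", "Jane Doe"],
    ["Smith", "Ministry"],
    ["Mr. Smith", "Ministry", "Jane Doe"],
]

# Hypothetical reference-resolution table: surface form -> canonical entity.
ALIASES = {"Smith": "John Smith", "Mr. Smith": "John Smith"}


def build_network(sentences, aliases=None):
    """Link every pair of distinct entities that co-occur in a sentence.

    Node weight = number of sentences mentioning the entity;
    link weight = number of sentences in which the pair co-occurs.
    """
    aliases = aliases or {}
    nodes, links = Counter(), Counter()
    for mentions in sentences:
        # Apply reference resolution (if any), then deduplicate per sentence.
        resolved = sorted({aliases.get(m, m) for m in mentions})
        nodes.update(resolved)
        for a, b in combinations(resolved, 2):
            links[(a, b)] += 1
    return nodes, links


def strength(links):
    """Weighted degree: sum of the weights of all links incident to a node."""
    s = Counter()
    for (a, b), w in links.items():
        s[a] += w
        s[b] += w
    return s


def top_nodes(scores):
    """Return all nodes that share the highest score, sorted by name."""
    best = max(scores.values())
    return sorted(n for n, v in scores.items() if v == best)


raw_nodes, raw_links = build_network(SENTENCES)
res_nodes, res_links = build_network(SENTENCES, ALIASES)

print(len(raw_nodes), len(raw_links))    # 5 5
print(len(res_nodes), len(res_links))    # 3 3
print(top_nodes(strength(raw_links)))    # ['Jane Doe', 'Ministry']
print(top_nodes(strength(res_links)))    # ['John Smith']
```

Even on three sentences, reference resolution shrinks the network from five nodes and five links to three of each and replaces the set of key nodes entirely, which mirrors, on a toy scale, the kind of effect the abstract reports.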

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2012
Accession Number
ADA558970

Entities

People

  • Jana Diesner

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Energy and Power Technologies
  • Engineered Resilient Systems
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Automata Theory
  • Cognitive Science
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Programming
  • Data Mining
  • Geography
  • Information Processing
  • Information Science
  • Named Entity Recognition
  • Natural Language Processing
  • Network Science
  • Ontologies
  • Self Organizing Systems
  • Social Networking Services

Fields of Study

  • Computer science

Readers

  • Business Analytics
  • Operations Research
  • Systems Analysis and Design