Exploring and Making Sense of Large Graphs

Abstract

Graphs naturally represent information ranging from links between webpages, to friendships in social networks, to connections between neurons in our brains. These graphs often span billions of nodes and the interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect anomalies that indicate critical events, such as an attack on a computer system, disease formation in the human brain, or the fall of a company? To gain insight into these problems, this thesis focuses on developing scalable, principled discovery algorithms that combine globality with locality to make sense of one or more graphs. In addition to these fast algorithmic methodologies, we contribute graph-theoretical ideas and models, as well as real-world applications, in two main areas.

Single-Graph Exploration: We show how to summarize a single graph interpretably by identifying its important graph structures. We complement summarization with inference, which leverages information about a few entities (obtained via summarization or other methods) and the network structure to efficiently and effectively learn information about the unknown entities.

Multiple-Graph Exploration: We extend the idea of single-graph summarization to time-evolving graphs and show how to discover temporal patterns scalably. Apart from summarization, we claim that graph similarity is often the underlying problem in a host of applications where multiple graphs occur (e.g., temporal anomaly detection, discovery of behavioral patterns), and we present principled, scalable algorithms for aligning networks and measuring their similarity.

We leverage techniques from diverse areas, such as matrix algebra, graph theory, optimization, information theory, machine learning, finance, and social science, to solve real-world problems.
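The inference idea in the abstract, which uses what is known about a few entities together with the network structure to estimate the rest, can be illustrated with a generic guilt-by-association scheme. What follows is a minimal sketch of harmonic label propagation, not the thesis's own inference algorithm. It assumes an undirected edge list, seed scores in [-1, 1] (for example +1 for a known-bad entity and -1 for a known-good one), and a hypothetical function name `propagate_labels`. Each unlabeled node repeatedly takes the average score of its neighbors while seed nodes stay fixed.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=50):
    """Hypothetical helper: spread known seed scores to unlabeled nodes.

    edges: iterable of (u, v) pairs, treated as undirected.
    seeds: dict mapping known nodes to a score in [-1, 1].
    Returns a dict of scores for every node seen in edges or seeds.
    """
    # Build an undirected adjacency list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    nodes = set(neighbors) | set(seeds)
    # Unknown nodes start neutral (0.0); seeds start at their known score.
    scores = {n: seeds.get(n, 0.0) for n in nodes}

    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            if n in seeds:
                # Known entities are clamped to their given score.
                new_scores[n] = seeds[n]
            elif neighbors[n]:
                # Unknown entities take the mean of their neighbors' scores.
                new_scores[n] = sum(scores[m] for m in neighbors[n]) / len(neighbors[n])
            else:
                new_scores[n] = 0.0
        scores = new_scores
    return scores


# Usage: a chain a-b-c-d where only the two endpoints are labeled.
result = propagate_labels([("a", "b"), ("b", "c"), ("c", "d")], {"a": 1.0, "d": -1.0})
print(round(result["b"], 3), round(result["c"], 3))  # prints 0.333 -0.333
```

Nodes closer to a positively labeled seed end up with higher scores, which is the intuition behind guilt-by-association. Scalable methods for real graphs use more careful formulations (for example, with convergence guarantees and edge weights), so this sketch is only meant to make the idea concrete.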

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2015
Accession Number
ADA624303

Entities

People

  • Danai Koutra

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Anomaly Detection
  • Artificial Intelligence
  • Change Detection
  • Computational Science
  • Computer Languages
  • Computer Networks
  • Computers
  • Data Mining
  • Electronic Mail
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Network Science
  • Pattern Recognition
  • Social Media
  • Supervised Machine Learning

Fields of Study

  • Computer science

Technology Areas

  • AI & ML