A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox

Abstract

We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox . When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O (√ n ) space ( n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 40,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph by storing a minuscule fraction of edges.

Document Details

Document Type
Pub Defense Publication
Publication Date
Feb 17, 2015
Source ID
10.1145/2700395

Entities

People

  • Ali Pinar
  • C. Seshadhri
  • Madhav Jha

Organizations

  • Defense Advanced Research Projects Agency
  • Sandia National Laboratories
  • United States Department of Energy

Tags

Fields of Study

  • Computer science

Readers

  • Aviation Safety Risk Assessment.
  • Computational Modeling and Simulation
  • Graph Algorithms and Convex Optimization.

Technology Areas

  • Space