A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox
Abstract
We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox . When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O (√ n ) space ( n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 40,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph by storing a minuscule fraction of edges.
Document Details
- Document Type
- Pub Defense Publication
- Publication Date
- Feb 17, 2015
- Source ID
- 10.1145/2700395
Entities
People
- Ali Pinar
- C. Seshadhri
- Madhav Jha
Organizations
- Defense Advanced Research Projects Agency
- Sandia National Laboratories
- United States Department of Energy