RESEARCH AREA 10: NETWORK SCIENCE: Local Algorithms for Large Informatics Graphs
Abstract
A serious problem with many existing machine learning and data analysis tools in the complex networks area is that they are often very brittle and/or do not scale well to larger networks. As a consequence, analysts often develop intuition on small networks, with $10^2$ or $10^3$ nodes, and then try to apply these methods on larger networks, with $10^5$ or $10^7$ or more nodes. Larger networks, however, often have very different static and dynamic properties than smaller networks. These properties are often deeply counterintuitive and surprisingly subtle to identify and test, but they often have consequences that are very practically relevant. At root, the problem is that existing algorithmic/statistical methods fail to capture properly the ÒcouplingÓ between very local structure and very global structure in these networks. For example, local structure might be the properties of ego networks in large social graphs, and global structure might be the diameter and other large-scale properties of the entire network. These failings clearly indicate that the metrics that are intuitive to people, based on their experience with smallscale networks, and that are used to control inference or obtain domain-specific insight, are often simply inappropriate for very large-scale graphs. Relatedly, existing algorithmic/statistical tools make local-global assumptions that are strongly violated by realistic networks. This has very detrimental consequences for our ability to extract domain insight from algorithmic tools applied to these networks. The proposed research will develop and implement existing and novel local and locally-biased algorithms on large networks to evaluate how those algorithms perform for downstream tasks of interest on realistic social/information networks to develop insights for use in downstream network analysis. In particular, primary research directions that will be considered include the following. Local algorithms for other network problems: these algorithms can be applied to a small part of a very large graph, and they can come with improved algorithmic and/or statistical guarantees. Implementations on large-scale graphs: the behavior of these local algorithms depends strongly on implementation details, for graphs consisting of hundreds of thousands up to hundreds of millions of~nodes. Relationship with meaningful metrics of domain interest: these algorithms often have implicit embedding properties that can meaningfully be interpreted in terms of metrics of interest to the domain scientist. Intermediate-scale structure: as the size of the local region grows, these algorithms can be used to characterize intermediate-scale structure, e.g., whether a small contagion will propagate to the entire~graph. Algorithmic-statistical tradeoffs: the improved algorithmic/statistical guarantees of these algorithms are sometimes synergistic and sometimes not, and characterizing these tradeoffs is essential for their use. The research work will focus on these methodological developments with an eye toward solving important practical problems such as scalable inference on graphs, understanding and controlling contagion and viral propagation on networks, and developing tools that are useful to the domain scientist at extracting interpretable and actionable insight from noisy and complex networked data.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Sep 11, 2018
- Source ID
- W911NF1610285
Entities
People
- Michael Mahoney
Organizations
- Army Contracting Command
- International Computer Science Institute
- United States Army