THIS IS A CONTINUATION OF N00014-14-1-0819 Understanding the Behavior of Large Networks

Abstract

Understanding the behavior of large networksÑparticularly social networksÑis critical to enabling rigorous statistical modeling and inference procedures that come with provable guarantees. Modern technology means that network datasets are pervasive and fundamental in science and society. Whenever we observe entities and relations between them, either directly or induced from other data as a means of summarizing sparse dependency structure, we draw inferences from network data. These datasets are growing so rapidly in complexity and dimensionality that our analysis methods must undergo a paradigm shift to keep pace. Current Limitations The current state of the art in network analysis provides us with two kinds of models; those that we fully understand but which are too simple to model data accurately, versus those which are more realistic but come without mathematical guarantees. As an example of the latter case, the structure of many networks (again with a particular emphasis on social networks) is often analyzed by resorting to clustering based on detected network Òcommunities.Ó However, since a clustering algorithm will return clusters even when its input is purely noise, an important shortcoming is that we cannot distinguish genuine community structure from spurious structure. Furthermore, our lack of understanding of network structure goes well beyond the question of partitioning network nodes into communities. Current models fail in the amount of heterogeneity they can produce within a single network, meaning that they do not adequately describe the typical network structures we see in the real world. Finally, the algorithmic methods that could potentially address these two shortcomings have the further problem that they are typically NP-complete. This is because exact inference problems in large networks often involve brute-force combinatorial enumeration. Thus, even when statistical properties are well understood, approximations are often needed to yield computational feasibility. As a consequence, statistical guarantees can be lost. Proposed Approach We will address each of these three shortcomings by deriving new statistical asymptotics for networks in large-sample regimes, thereby yielding unique insights that lead to better understanding of their behavior. The advances we propose are also timely: they are based on technical results very recently derived by ourselves and others, as detailed in Section ?? below. In the first of three tasks, we will develop statistical hypothesis testing methods for the largesample problem of network clustering: segmenting a population of network nodes into communities of people whose connectivity profiles are ÒsimilarÓ in some way. In the second task, we will go beyond simple cluster-based models, and explore ways to increase the amount of heterogeneity that statistical models can produce within a single network. This will give rise to networks that are highly heterogeneous even in the large-sample limit. Finally, in the third task we will develop approximate inference procedures for determining network model parameters from large samples of network data, based on the results from tasks 1 and 2. Expected Outcomes The proposed research will lead to significant advances in our technical (mathematical) understanding of the behavior of large networks. First, we expect to achieve unique statistical guarantees for community detection for modularity and other algorithms related to it, based on hypothesis testing. Second, we will derive new statistical models that allow for heterogeneous networks which areÑlike real networksÑsimultaneously dense in some regions and sparse in others. Finally, we will develop novel supporting inference methodologies, with a focus on scalable algorithms and approximate inference schemes with provable properties.

Document Details

Document Type
DoD Grant Award
Publication Date
Jun 10, 2016
Source ID
N000141612180

Entities

People

  • Patrick Wolfe

Organizations

  • Office of Naval Research
  • United States Navy
  • University College London

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning.
  • Statistical inference.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms