Link prediction and community inference in networks using latent geometry
Abstract
Link prediction and community inference are thriving areas of active research with a host of high-impact interdisciplinary applications including the identification of disease modules in protein networks, functional areas in the brain, predicting or recommending future friendships or collaborations in social networks, recommending products to users, predicting decision making under stress, or reconstructing networks from partial data. A long and quickly growing list of approaches to link prediction and community detection have been developed over recent years, none without problems or limitations. Most link prediction methods cannot reliably predict links in incomplete networks, or certain classes of links, or weights in weighted networks, while resolution limits, overlapping communities, uneven community sizes, and unknown numbers of communities are classic problems in community inference. In our past research we have developed a unique latent-geometric approach to the analysis of network structure and dynamics. In this approach, network nodes are assumed to lie in a latent space, and connections are formed probabilistically, with probability depending on the distance between nodes in the latent space. We have shown that if the latent space is not Euclidean as has been routinely assumed, but hyperbolic, and if the connection probability has a specific entropy-maximizing shape, then this approach is indeed unique in two ways: 1) It is the only existing approach to large-scale network analysis capable of modeling, in a unified framework, all the common large- scale structural and dynamical properties of many real networks. These properties are sparsity, power-law degree distribution, strong clustering, and community structure. Yet besides modeling these basic properties, a long list of other large-scale structural and dynamical properties of many real networks are reproduced as well. 2) The approach is statistically unbiased, meaning that it satisfies the maximum-entropy principle, subject to the constraints imposed by the large-scale properties of a given network. The main goal of the project is to make this approach to large-scale network analysis, practically applicable to network analysis at small (link prediction) and medium (community inference) scales. The combination of the above two key properties of the approach is attractive, because they guarantee that a given real network that has the large-scale properties mentioned above is a typical network in the model, so that the predictive power of the model is expected to be close to maximum possible. Furthermore, both link prediction and community inference are put on the same methodological grounds within the approach. Having a given real network mapped to its latent space, both link prediction and community inference are based solely on pairwise latent distances between nodes: the closer the two unlinked nodes are in the space, the more likely they will be linked in the future, or are already linked and their link is missing in the data, whereas soft or generalized communities are defined as groups of nodes located close to each other in the latent space. This soft latent-geometric approach to link prediction and community inference avoids many problems that traditional methods have to deal with. Since the approach relies only on latent distances, it can predict equally well all classes of links in different types of networks, including bipartite and weighted networks, while the concept of soft communities, besides avoiding most problems plaguing traditional sharp communities, opens the door to a new class of practically viable applications, methodologically similar to link prediction...
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Oct 11, 2018
- Source ID
- W911NF1710491
Entities
People
- Dmitri Krioukov
Organizations
- Army Contracting Command
- Northeastern University
- United States Army