Geometric Factorization Tools for Community Mining
Abstract
Clustering the nodes (users, entities) of a social network into communities is an instrumental task in network analytics. Community mining has attracted tremendous attention, but many challenges remain and fundamental aspects are still poorly understood. First, modeling the underlying structure of social networks is a complex task. Existing models are either oversimplified or lack interpretability. Second, a sound problem formulation is needed, one that guarantees the result in some sense; e.g., for community mining, identifiability of the underlying communities is desired, yet existing approaches rarely touch upon this issue. Third, effective and highly scalable algorithms are needed, since modern-day networks can involve a huge number of entities. Challenging scenarios such as networks with many hidden links, small-footprint outliers, and outlying communities are also of great interest to homeland security and the Army's mission. To the best of our knowledge, no existing work addresses the aforementioned desiderata under a unified framework. Our first goal is to design provably sound community identification criteria which ensure that the ground-truth membership matrix is the unique optimal solution under realistic conditions. We will also design custom high-performance and scalable algorithms to handle the proposed criteria. We will pay particular attention to challenging scenarios of critical importance to national security and the Department of Defense. Finally, we will design and implement a targeted data collection campaign, and use the resulting data to validate our methods and claims. The proposed research will evolve along the following thrusts:
(T1) Thrust 1 - Identification Criterion Design will focus on novel identifiability-guaranteed community mining criteria.
(T2) Thrust 2 - High-performance Algorithms will devise companion optimization-based algorithms.
(T3) Thrust 3 - Advanced Topics in Community Mining will focus on challenging and realistic problems such as small-footprint clandestine group identification and community identification in the presence of a large number of missing links.
(T4) Thrust 4 - Validation, Data Collection, and Exploration will validate the proposed approaches on network data relevant to national security and defense. Our activities under this thrust will include a data collection campaign designed to create the kinds of datasets that will enable us to verify our capability to single out unusual small-footprint groups.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 06, 2019
- Source ID: W911NF1910407
Entities
People
- Nikolaos Sidiropoulos
Organizations
- Army Contracting Command
- United States Army
- University of Virginia