The KOJAK Group Finder: Connecting the Dots via Integrated Knowledge-Based and Statistical Reasoning

Abstract

Link discovery (LD) is a new challenge in data mining whose primary concerns are to identify strong links and discover hidden relationships among entities and organizations based on low-level, incomplete, and noisy evidence data. To address this challenge, we are developing a hybrid link discovery system called KOJAK that combines state-of-the-art knowledge representation and reasoning (KR&R) technology with statistical clustering and analysis techniques from the area of data mining. In this paper we report on the architecture and technology of its first fully completed module, the KOJAK Group Finder. The Group Finder is capable of finding hidden groups and group members in large evidence databases. Our group finding approach addresses a variety of important LD challenges, such as exploiting heterogeneous and structurally rich evidence, handling the connectivity curse, noise, and corruption, and scaling up to very large, realistic data sets. The first version of the KOJAK Group Finder has been successfully tested and evaluated on a variety of synthetic datasets.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA459397

Entities

People

  • Andre Valente
  • Eric Melz
  • Hans Chalupsky
  • Jafar Adibi

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Air Force Research Laboratories
  • Artificial Intelligence
  • Data Mining
  • Data Sets
  • Databases
  • Detection
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Models
  • National Security
  • Probabilistic Models
  • Random Variables
  • Reasoning
  • Relational Database Management Systems
  • Reliability

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML