Classification in Networked Data: A Toolkit and a Univariate Case Study

Abstract

This paper presents NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. We consider within-network classification: entities whose classes are to be estimated are linked to entities for which the class is known. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no previous work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, i.e., Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The results also show that a small number of component combinations excel. In particular, there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
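
The three-component framework described in the abstract can be illustrated with a small sketch. This is not NetKit's code or API. It is a hypothetical Python illustration that assumes a class-prior local classifier, a weighted-vote relational-neighbor classifier, and a simple synchronous iterative update standing in for the collective inference step. The function name `wvrn_classify` and its parameters are invented for this example, and edge weights are assumed to be positive.

```python
from collections import defaultdict


def wvrn_classify(edges, known, classes, n_iter=50):
    """Univariate within-network classification sketch (hypothetical helper).

    edges:   iterable of (u, v, weight) tuples, treated as undirected,
             with weights assumed positive
    known:   dict mapping node -> class label for the labeled nodes
    classes: list of all class labels
    Returns a dict mapping node -> {class: estimated probability}.
    """
    # Build an undirected weighted adjacency map.
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w

    # Local classifier (assumption): the class prior of the known labels.
    prior = {
        c: sum(1 for y in known.values() if y == c) / len(known)
        for c in classes
    }

    # Known nodes keep a fixed one-hot estimate; unknown nodes start at the prior.
    probs = {}
    for n in nbrs:
        if n in known:
            probs[n] = {c: 1.0 if known[n] == c else 0.0 for c in classes}
        else:
            probs[n] = dict(prior)

    unknown = [n for n in nbrs if n not in known]

    # Collective inference (assumption): repeat the relational estimate
    # for all unknown nodes at once, using the previous round's estimates.
    for _ in range(n_iter):
        new = {}
        for n in unknown:
            total = sum(nbrs[n].values())
            # Relational classifier: weighted average of neighbor estimates.
            new[n] = {
                c: sum(w * probs[m][c] for m, w in nbrs[n].items()) / total
                for c in classes
            }
        probs.update(new)
    return probs


if __name__ == "__main__":
    # Toy network: "b" is unlabeled and linked more strongly to "a" than to "c".
    toy_edges = [("a", "b", 3.0), ("b", "c", 1.0)]
    toy_known = {"a": "x", "c": "y"}
    print(wvrn_classify(toy_edges, toy_known, ["x", "y"])["b"])
```

In the toy network, the unlabeled node takes the weighted average of its two labeled neighbors, so the stronger link dominates its estimate. Only links and known labels are used, which is the univariate setting the abstract describes.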

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2006
Accession Number
ADA455651

Entities

People

  • Foster Provost
  • Sofus A. Macskassy

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Bayesian Networks
  • Computational Science
  • Computer Programming
  • Data Mining
  • Databases
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Monte Carlo Method
  • Network Science
  • Neural Networks
  • Probabilistic Models
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks