Cautious Collective Classification

Abstract

Many collective classification (CC) algorithms have been shown to increase accuracy when instances are interrelated. However, CC algorithms must be carefully applied because their use of estimated labels can in some cases decrease accuracy. In this article, we show that managing this label uncertainty through cautious algorithmic behavior is essential to achieving maximal, robust performance. First, we describe cautious inference and explain how four well-known families of CC algorithms can be parameterized to use varying degrees of such caution. Second, we introduce cautious learning and show how it can be used to improve the performance of almost any CC algorithm, with or without cautious inference. We then evaluate cautious inference and learning for the four collective inference families, with three local classifiers and a range of both synthetic and real-world data. We find that cautious learning and cautious inference typically outperform less cautious approaches. In addition, we identify the data characteristics that predict more substantial performance differences. Our results reveal that the degree of caution used usually has a larger impact on performance than the choice of the underlying inference algorithm. Together, these results identify the most appropriate CC algorithms to use for particular task characteristics and explain multiple conflicting findings from prior CC research.

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2009
Accession Number
ADA512981

Entities

People

  • David W. Aha
  • Kalyan M. Gupta
  • Luke K. McDowell

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computer Languages
  • Computer Vision
  • Data Mining
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Linear Accelerators
  • Machine Learning
  • Monte Carlo Method
  • Natural Language Processing
  • Network Science
  • Regression Analysis
  • Statistical Analysis

Fields of Study

  • Computer science

Readers

  • Mathematics or Statistics
  • Regression Analysis
  • Team-Based Human-Centered Cognitive Task Decision Making and Information Performance

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks