Nonextensive Entropic Kernels

Abstract

Positive definite kernels on probability measures have recently been applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information-theoretic quantities, such as (Shannon's) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon's information theory. This paper bridges these two trends by introducing nonextensive information-theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon's entropy. The classical notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce Jensen-Tsallis (JT) q-differences, a nonextensive generalization of the JS divergence, and define a k-th order JT q-difference between stochastic processes. We then define a new family of nonextensive mutual information kernels, which allow weights to be assigned to their arguments, and which includes the Boolean, JS, and linear kernels as particular cases. We also define nonextensive string kernels that subsume the p-spectrum kernel. We illustrate the performance of these kernels on text categorization tasks, in which documents are modeled both as bags-of-words and as sequences of characters.
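To make the central quantity concrete, the following is a minimal sketch of a two-argument Jensen-Tsallis q-difference for discrete distributions. It assumes the standard Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1) and the commonly cited form T_q(p1, p2) = S_q(w1 p1 + w2 p2) - (w1^q S_q(p1) + w2^q S_q(p2)); neither formula is stated in this record, and the function names and the default equal weights are hypothetical choices made for illustration only.

```python
import numpy as np


def tsallis_entropy(p, q):
    """Tsallis entropy of a discrete distribution p (assumed form).

    At q = 1 this falls back to Shannon entropy in nats.
    """
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # zero-probability outcomes contribute nothing
    if np.isclose(q, 1.0):
        return float(-np.sum(nz * np.log(nz)))
    return float((1.0 - np.sum(nz ** q)) / (q - 1.0))


def jensen_tsallis_q_difference(p1, p2, q, weights=(0.5, 0.5)):
    """Two-argument JT q-difference under the assumed definition.

    Entropy of the weighted mixture minus the q-weighted sum of the
    individual entropies. With q = 1 this is the (weighted) JS divergence.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    w1, w2 = weights
    mixture = w1 * p1 + w2 * p2
    return tsallis_entropy(mixture, q) - (
        w1 ** q * tsallis_entropy(p1, q) + w2 ** q * tsallis_entropy(p2, q)
    )


if __name__ == "__main__":
    a = [1.0, 0.0]
    b = [0.0, 1.0]
    # q = 1 recovers the JS divergence between two disjoint distributions.
    print(jensen_tsallis_q_difference(a, b, q=1.0))
    # q = 2 gives a different, nonextensive value for the same pair.
    print(jensen_tsallis_q_difference(a, b, q=2.0))
```

Under these assumptions, setting q = 1 reproduces the JS divergence, which matches the abstract's description of the JT q-difference as a generalization of it.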

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2008
Accession Number
ADA488505

Entities

People

  • André F. Martins
  • Eric P. Xing
  • Mario A. Figueiredo
  • Noah Smith
  • Pedro M. Aguiar

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Computational Science
  • Computer Vision
  • Ergodic Processes
  • Image Processing
  • Information Processing
  • Information Science
  • Information Theory
  • Kernel Functions
  • Machine Learning
  • Probability
  • Probability Distributions
  • Random Variables
  • Signal Processing
  • Stochastic Processes
  • Supervised Machine Learning
  • Theorems

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Graph Algorithms and Convex Optimization
  • Statistical Inference