Nearest Neighbor Classification Using a Density Sensitive Distance Measurement

Abstract

This work proposes a density-sensitive distance measurement that accounts for the density of the underlying dataset, so that measured distances better reflect the shape of the data. The density of the underlying dataset is approximated by kernel density estimation, using kernel bandwidths determined by k-nearest neighbor distances. A scale is applied to the resulting kernel density estimate, and a line integral taken along the surface of the scaled estimate yields the density-sensitive distance. The utility of the proposed measurement is tested in a supervised learning setting: k-Nearest Neighbor classifiers using the proposed density-sensitive distance and using Euclidean distance are compared on the Wisconsin Diagnostic Breast Cancer dataset and the MNIST Database of Handwritten Digits. For perspective, these classifiers are also compared with Support Vector Machine and Random Forests classifiers. Stratified 10-fold cross-validation is used to estimate the generalization error of each classifier. In all comparisons, k-Nearest Neighbor classification using the proposed density-sensitive distance had lower generalization error than k-Nearest Neighbor classification using Euclidean distance. On the MNIST dataset, k-Nearest Neighbor classification using the density-sensitive distance also had lower generalization error than both the Support Vector Machine and Random Forests classifiers.
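The abstract does not specify the kernel, the integration path, or the exact integrand, so the following Python sketch is only one plausible reading under stated assumptions, not the report's implementation. It assumes a Gaussian kernel whose bandwidth at each training point is the distance to that point's k-th nearest neighbor, a straight-line path between the two points being compared, and a "line integral along the surface" interpreted as the arc length of that path lifted onto the surface z = scale * f(p), approximated by a polyline. All function names (`knn_bandwidths`, `kde`, `density_distance`, `knn_classify`), the parameters `scale` and `steps`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_bandwidths(data, k):
    """Assumption: bandwidth h_i is the distance from data[i] to its k-th nearest other point."""
    bandwidths = []
    for i, xi in enumerate(data):
        dists = sorted(euclidean(xi, xj) for j, xj in enumerate(data) if j != i)
        # Guard against a zero bandwidth when duplicate points exist.
        bandwidths.append(max(dists[k - 1], 1e-12))
    return bandwidths


def kde(x, data, bandwidths):
    """Gaussian kernel density estimate at x, with one bandwidth per data point."""
    d = len(x)
    total = 0.0
    for xi, h in zip(data, bandwidths):
        sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-sq / (2 * h * h)) / ((2 * math.pi) ** (d / 2) * h ** d)
    return total / len(data)


def density_distance(x, y, data, bandwidths, scale=1.0, steps=50):
    """Assumption: arc length of the segment x->y lifted onto z = scale * kde(p),
    approximated by a polyline through steps + 1 evenly spaced points."""
    points = [[a + (b - a) * t / steps for a, b in zip(x, y)] for t in range(steps + 1)]
    heights = [scale * kde(p, data, bandwidths) for p in points]
    length = 0.0
    for j in range(steps):
        dx = euclidean(points[j], points[j + 1])  # horizontal run of this piece
        dz = heights[j + 1] - heights[j]          # climb or drop on the density surface
        length += math.sqrt(dx * dx + dz * dz)
    return length


def knn_classify(query, train_x, train_y, bandwidths, k=3, scale=1.0):
    """Majority vote among the k training points nearest to query under density_distance."""
    scored = sorted(
        ((density_distance(query, xi, train_x, bandwidths, scale), yi)
         for xi, yi in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated clusters (illustrative only).
    train_x = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
    train_y = ["a", "a", "a", "b", "b", "b"]
    bw = knn_bandwidths(train_x, k=2)
    print(knn_classify([0.15, 0.15], train_x, train_y, bw, k=3))
    print(knn_classify([3.05, 3.0], train_x, train_y, bw, k=3))
```

Because each polyline piece contributes sqrt(dx² + dz²) ≥ dx, this sketch never returns less than the Euclidean distance and reduces to it when `scale` is 0; a path that crosses a low-density gap between clusters is lengthened by the descent and ascent on the density surface. Each pairwise distance costs `steps + 1` density evaluations over the whole training set, so a practical version for data the size of MNIST would need a far more efficient evaluation strategy.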


Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2009
Accession Number: ADA509153

Entities

People

  • Joshua J. Burkholder

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy
  • Ground and Sea Platforms
  • Sensors
  • Weapons Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Breast Cancer
  • Computational Science
  • Data Mining
  • Data Science
  • Data Sets
  • Databases
  • Information Processing
  • Information Science
  • Kernel Functions
  • Machine Learning
  • Measurement
  • Neoplasms
  • Probability Density Functions
  • Supervised Machine Learning
  • Three Dimensional
  • United States

Fields of Study

  • Computer science

Readers

  • Computer Programming and Software Development
  • Fluid Dynamics
  • Statistical inference

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference