Binning in Gaussian Kernel Regularization

Abstract

Gaussian kernel regularization is widely used in the machine learning literature and proven successful in many empirical experiments. The periodic version of the Gaussian kernel regularization has been shown to be minimax rate optimal in estimating functions in any finite order Sobolev spaces. However, for a data set with n points, the computation complexity of the Gaussian kernel regularization method is of order O"n3". In this paper we propose to use binning to reduce the computation of Gaussian kernel regularization in both regression and classification. For the periodic Gaussian kernel regression, we show that the binned estimator achieves the same minimax rates of the unbinned estimator, but the computation is reduced to O"m3" with m as the number of bins. To achieve the minimax rate in the k-th order Sobolev space, m needs to be in the order of O"kn1="2k+1"", which makes the binned estimator computation of order O"n" for k = 1 and even less for larger k. Our simulations show that the binned estimator "binning 120 data points into 20 bins in our simulation" provides almost the same accuracy with only 0.4% of computation time. For classification, binning with the L2-loss Gaussian kernel regularization and the Gaussian kernel Support Vector Machines is tested in a polar cloud detection problem. With basically the same computation time, the L2-loss Gaussian kernel regularization on 966 bins achieves better test classification rate "79.22%" than that "71.40%" on 966 randomly sampled data. Using the OSU-SVM Matlab package, the SVM trained on 966 bins has a comparable test classification rate as the SVM trained on 27,179 samples, but reduces the training time from 5.99 hours to 2.56 minutes. The SVM trained on 966 randomly selected samples has a similar training time as and a slightly worse test classification rate than the SVM on 966 bins, but has 67% more su

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2005
Accession Number
ADA473041

Entities

People

  • Bin Yu
  • Tao Shi

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • Autonomy
  • Space

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Analytic Functions
  • Clustering
  • Computational Complexity
  • Data Science
  • Data Sets
  • Detection
  • Estimators
  • Hilbert Space
  • Information Science
  • Machine Learning
  • Periodic Functions
  • Simulations
  • Supervised Machine Learning
  • Training
  • White Noise

Fields of Study

  • Computer science

Readers

  • Calculus or Mathematical Analysis
  • Image Processing and Computer Vision.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks
  • Space