Interactive Algorithms for Unsupervised Machine Learning

Abstract

This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to reduce data acquisition costs and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a variety of domains, and interactive procedures promise to broaden the scope of statistical analysis. We develop interactive learning algorithms for three unsupervised problems: subspace learning, clustering, and tree metric learning. Our theoretical and empirical analysis shows that interactivity can bring both statistical and computational improvements over non-interactive approaches. An over-arching thread of this thesis is that interactive learning is particularly powerful for non-uniform datasets, where non-uniformity is quantified differently in each setting. We first study the subspace learning problem, where the goal is to recover or approximate the principal subspace of a collection of partially observed data points. We propose statistically and computationally appealing interactive algorithms for both the matrix completion problem, where the data points lie on a low dimensional subspace, and the matrix approximation problem, where one must approximate the principal components of a collection of points. We measure uniformity with the notion of incoherence, and we show that our feedback-driven algorithms perform well under much milder incoherence assumptions. We next consider clustering a dataset represented by a partially observed similarity matrix. We propose an interactive procedure for recovering a clustering from a small number of carefully selected similarity measurements. The algorithm exploits non-uniformity of cluster size, using few measurements to recover larger clusters and focusing measurements on the smaller structures.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2015
Accession Number
ADA624286

Entities

People

  • Akshay Krishnamurthy

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Computational Science
  • Computer Science
  • Data Acquisition
  • Data Mining
  • Data Science
  • Experimental Design
  • Information Processing
  • Information Science
  • Machine Learning
  • Network Science
  • Signal Processing
  • Statistical Algorithms
  • Statistical Analysis
  • Supervised Machine Learning
  • Unsupervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.
  • Theoretical Analysis.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms