Scalable Automated Model Search

Abstract

Model search, the laborious process of choosing an appropriate learning algorithm and tuning its parameters, is a crucial component of data analytics pipelines and remains a major obstacle to the widespread adoption of machine learning techniques. Recent efforts to automate this process have treated model training itself as a black box, which limits their effectiveness on large-scale problems. In this work, we build upon these recent efforts. By inspecting the inner workings of model training and framing model search as a bandit-like resource allocation problem, we present an integrated distributed system for model search that targets large-scale learning applications. We study the impact of our approach on a variety of datasets and demonstrate that our system, named GHOSTFACE, solves the model search problem with accuracy comparable to that of basic strategies but an order of magnitude faster. We further demonstrate that GHOSTFACE can scale to models trained on terabytes of data across hundreds of machines.
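
To make the "bandit-like resource allocation" framing concrete, the following is a minimal sketch and not the report's actual algorithm. It assumes a simple elimination rule (keep the best-scoring fraction of candidates after each training round), and the names bandit_search, train_step, evaluate, and keep_fraction are hypothetical, as is the toy one-dimensional objective.

```python
def bandit_search(configs, train_step, evaluate, rounds=3, keep_fraction=0.5):
    """Spend training rounds on candidates, dropping the weakest each round.

    Illustrative elimination rule only; the report's allocation policy
    is not specified in the abstract.
    """
    # One partially trained state per candidate; None means "not yet trained".
    active = {i: None for i in range(len(configs))}
    budget_used = 0
    scores = {}
    for _ in range(rounds):
        scores = {}
        for i in active:
            # Inspect training incrementally instead of treating it as a black box.
            active[i] = train_step(configs[i], active[i])
            budget_used += 1
            scores[i] = evaluate(configs[i], active[i])
        # Keep the best-scoring fraction, with at least one survivor.
        n_keep = max(1, int(len(active) * keep_fraction))
        survivors = sorted(scores, key=scores.get, reverse=True)[:n_keep]
        active = {i: active[i] for i in survivors}
    best = max(active, key=lambda i: scores[i])
    return configs[best], budget_used


# Toy problem (assumption): each config is a learning rate for gradient
# descent on f(w) = (w - 3)^2, and quality is the negative loss.
def train_step(lr, w):
    w = 0.0 if w is None else w
    return w - lr * 2.0 * (w - 3.0)


def evaluate(lr, w):
    return -((w - 3.0) ** 2)


best, used = bandit_search([0.001, 0.01, 0.1, 0.5], train_step, evaluate)
```

In this toy run the search spends 7 training steps, where training all four candidates for all three rounds would take 12. The saving comes from stopping unpromising candidates early, which is only possible when partially trained models can be evaluated.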

Document Details

Document Type: Technical Report
Publication Date: May 20, 2014
Accession Number: ADA605116

Entities

People

  • Evan Sparks

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Computer Science
  • Computers
  • Construction
  • Data Sets
  • Dimensionality Reduction
  • Electrical Engineering
  • Information Science
  • Learning
  • Machine Learning
  • Models
  • Pipelines
  • Supervised Machine Learning
  • Training
  • Web Service

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks