A Submodularity Framework for Data Subset Selection

Abstract

This report describes the outcome of the project "A Submodularity Framework for Data Subset Selection." The goal of the project was to develop and evaluate novel submodular functions for selecting subsets of large acoustic and text data sets. The selected subsets were used to train acoustic models for automatic speech recognition and translation models for machine translation, respectively. The submodular selection techniques were evaluated against random data selection and against the best comparable data selection technique previously reported in the literature. The results show that submodular data selection outperforms all baseline techniques: for a fixed data subset size, submodular selection yielded systems with better performance. Additionally, submodular selection was applied to the problem of feature selection, where it outperformed standard modular feature selection techniques.

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2013
Accession Number
ADA595011

Entities

People

  • Arindam Mandal
  • Chris Bartels
  • Jeff Bilmes
  • Kai Wei
  • Katrin Kirchhoff
  • Yuzong Liu

Organizations

  • University of Washington

Tags

Communities of Interest

  • Autonomy
  • C4I

DTIC Thesaurus Topics

  • Air Force
  • Artificial Intelligence Software
  • Automata Theory
  • Automated Speech Recognition
  • Automated Text Summarization
  • Computational Science
  • Computer Languages
  • Data Sets
  • Electrical Engineering
  • Feature Selection
  • Information Science
  • Machine Learning
  • Machine Translation
  • Natural Language Processing
  • Neural Networks
  • Recognition
  • Signal Processing

Fields of Study

  • Computer science

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation