A Submodularity Framework for Data Subset Selection
Abstract
This report describes the outcome of the project A Submodularity Framework for Data Subset Selection. The goal of the project was to develop and evaluate novel submodular functions for the purpose of subselecting large sets of acoustic and text data. The subselected data sets were used to train acoustic models for automatic speech recognition or translation models for machine translation, respectively. The submodular selection techniques were evaluated against random data selection and the best comparable data selection technique previously reported in the literature. Our results demonstrate that submodular data selection outperforms all baseline techniques, i.e. for a fixed data subset size, submodular selection resulted in systems with better performance. Additionally, submodular selection was applied to the problem of feature selection, where it outperformed standard modular feature selection techniques.
Document Details
- Document Type
- Technical Report
- Publication Date
- Sep 01, 2013
- Accession Number
- ADA595011
Entities
People
- Arindam Mandal
- Chris Bartels
- Jeff Bilmes
- Kai Wei
- Katrin Kirchhoff
- Yuzong Liu
Organizations
- University of Washington