Leveraging Limited Human Input: An Optimization and Bandits Approach

Abstract

Major Goals: This project focuses on bandit and online learning algorithms for efficient resource usage. In our setting, a small group of human agents performs various tasks (e.g., inspecting, labeling, and analyzing data samples, including social network data, images collected from surveillance, and text documents). The data samples have associated metadata, including location, timestamp, data type (e.g., image, text), how the sample was collected, and type of content (e.g., political situation, public event). The human agents who analyze these samples have differing expertise; given the limited availability of human effort, it is beneficial, in terms of both labeling time and accuracy, to assign unlabeled samples to agents with matching expertise. Our primary approach is through novel bandit (online learning) models, in which the algorithm iteratively determines good matchings between unlabeled samples and human agents. We address several open questions as part of this proposal:

1. Metrics: Traditionally, bandit algorithms focus on regret, namely, the difference between the algorithm's cumulative reward/cost and that of an omniscient algorithm that knows all statistics. In our context, we are interested in queueing and holding costs, that is, costs that penalize the backlog of incomplete tasks. These new metrics require new algorithms and analysis.
2. Dimensionality reduction of contexts: The metadata provides contexts, and the expertise of human agents depends on these contexts. If the contexts have a latent (hidden) low dimension (i.e., there are thousands of metadata attributes, but a much lower dimensionality suffices to determine good matchings between samples and agents), we need to develop new algorithms that can learn these hidden relationships to improve the sample complexity of contextual bandit algorithms, especially under the new metrics.
3. Cross-learning: If several allocation/matching algorithms are already available (e.g., methods used in the past and/or a pool of experts who can suggest good matchings), we need to develop improved algorithms that combine and cross-learn across experts (e.g., through online importance sampling methods).
4. Models with dynamics: Finally, if the set of agents changes over time, we need new algorithms that adaptively learn the changes in expertise.
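
To make the backlog-based metric concrete, the following is a minimal sketch of a queueing bandit under simplifying assumptions that are not taken from the award text: a single first-come-first-served queue, at most one Bernoulli arrival per round, one agent attempting the head-of-line task per round, and a standard UCB1 index standing in for whatever algorithms the project develops. The names `ucb_index`, `run_queue_bandit`, `success_prob`, and `arrival_prob` are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_index(wins, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # force at least one trial of every (agent, type) pair
    return wins / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def run_queue_bandit(success_prob, arrival_prob, horizon, seed=0):
    """Simulate assigning queued tasks to agents with unknown expertise.

    success_prob[a][c] is the (hidden) chance that agent a completes a task
    of type c in one round. Returns the cumulative holding cost (sum of the
    backlog over all rounds) and the per-(agent, type) assignment counts.
    """
    rng = random.Random(seed)
    n_agents = len(success_prob)
    n_types = len(success_prob[0])
    pulls = [[0] * n_types for _ in range(n_agents)]
    wins = [[0] * n_types for _ in range(n_agents)]
    queue = []          # task types waiting, oldest first
    holding_cost = 0

    for t in range(1, horizon + 1):
        # Assumption: at most one task arrives per round, with a uniform random type.
        if rng.random() < arrival_prob:
            queue.append(rng.randrange(n_types))

        if queue:
            task_type = queue[0]
            # Pick the agent with the highest optimistic estimate for this type.
            agent = max(
                range(n_agents),
                key=lambda a: ucb_index(wins[a][task_type], pulls[a][task_type], t),
            )
            pulls[agent][task_type] += 1
            if rng.random() < success_prob[agent][task_type]:
                wins[agent][task_type] += 1
                queue.pop(0)  # task completed, leaves the backlog

        # Holding cost: every task still waiting is charged one unit per round.
        holding_cost += len(queue)

    return holding_cost, pulls
```

Calling `run_queue_bandit([[0.9, 0.1], [0.1, 0.9]], arrival_prob=0.5, horizon=5000)` simulates two agents with complementary expertise. The returned holding cost is the quantity a backlog-aware analysis would study in place of classical regret. This sketch treats the task type as the entire context and does not attempt the dimensionality reduction, cross-learning, or time-varying agent pool described above.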

Document Details

Document Type
DoD Grant Award
Publication Date
Oct 11, 2018
Source ID
W911NF1710359

Entities

People

  • Sanjay Shakkottai

Organizations

  • Army Contracting Command
  • United States Army
  • University of Texas at Austin

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Neural Network Machine Learning.