A Deep Reinforcement Learning based Long-term Tracker for Salient Event Detection

Abstract

Visual object tracking (VOT) is a critical task in many defense-related applications. VOT is a basic building block of many time-critical systems such as video surveillance and autonomous vehicle guidance. The wealth, ubiquity and volume of video data collected by the U.S. DoD demand a fast and robust tracking algorithm for searching and analyzing video databases. Furthermore, in challenging mobile scenarios, such as those involving unmanned aerial systems (UASs) and warfighter-mounted systems, the solution quality needed for object tracking in complex scenes cannot be achieved in real time with limited, portable computing architectures. In this project, we focus on the long-term tracking problem in which the target object may undergo significant appearance changes. We posit that neither tracking nor detection can solve the long-term tracking task independently. Our approach tackles this problem by decomposing long-term tracking into three subtasks: short-term tracking, learning, and detection. The short-term tracker follows the target from frame to frame, and online learning is implemented to train a robust detector. Detection is performed when the target object cannot be identified using the short-term tracker. A scheme for salient event detection is proposed based on the results of our long-term tracker. Recent advances in target tracking have allowed re-acquisition of targets after occlusion and successful tracking in cluttered environments. However, the computational needs of these elegant methods preclude portable, low-cost implementation in real time. The proposed project takes an adaptive approach to improving speed without sacrificing accuracy, in which the tracker is able to switch between inexpensive feature extraction and expensive feature extraction as the tracking scenario demands.
In this project, using the discriminative correlation filter (DCF) as the baseline tracker, the adaptive tracking problem is formulated as a decision-action process by training a feature switching agent (FSA) to decide whether to locate the target using inexpensive hand-crafted features or to switch to expensive deep features. This switching agent will significantly reduce the computational cost for uncomplicated frames with a distinct or slow-moving target object. The FSA will be trained offline via reinforcement learning to select the optimal actions to track the target according to its current state. For the detection component, a detection activation agent (DAA) will be trained to decide when to activate the detector, so that the detector runs only when needed and computational cost is saved. Finally, the tracking results of the proposed long-term tracker will be utilized to perform salient event detection. The FSA will be trained to generate a fast and robust tracker in Year One, resulting in software that performs short-term tracking. In Year Two, the DAA and the detector will be incorporated in the short-term tracker framework to perform the long-term tracking task. In Year Three, the focus will be directed toward salient event detection using the long-term tracker. The proposed approach will be validated using real data provided by U.S. Army collaborators.
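
The control flow described above can be illustrated with a minimal Python sketch. This is not the project's implementation: every name in it (track_sequence, StepResult, the tracker, detector, and policy callables) is hypothetical, the trackers are toy stand-ins rather than a DCF, and the two agents are replaced by fixed confidence thresholds, whereas the abstract proposes agents trained offline with reinforcement learning. Online learning of the detector is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height); an assumed convention
Frame = Dict[str, object]         # toy frame: precomputed scores plus a true box


@dataclass
class StepResult:
    box: Optional[Box]    # estimate for this frame, None if the target was not found
    used_deep: bool       # True if expensive deep features were computed
    used_detector: bool   # True if the detector was activated


def track_sequence(
    frames: List[Frame],
    init_box: Box,
    cheap_tracker: Callable[[Frame, Box], Tuple[Box, float]],
    deep_tracker: Callable[[Frame, Box], Tuple[Box, float]],
    detector: Callable[[Frame], Optional[Box]],
    fsa: Callable[[float], str],
    daa: Callable[[float], bool],
) -> List[StepResult]:
    """Run a short-term tracker with feature switching and on-demand detection."""
    box = init_box
    log: List[StepResult] = []
    for frame in frames:
        # Always try the inexpensive hand-crafted features first.
        candidate, score = cheap_tracker(frame, box)
        used_deep = False
        # Feature switching agent (stub): pay for deep features only if needed.
        if fsa(score) == "switch":
            candidate, score = deep_tracker(frame, box)
            used_deep = True
        # Detection activation agent (stub): re-detect only when tracking looks lost.
        used_detector = daa(score)
        if used_detector:
            candidate = detector(frame)  # may be None if nothing is found
        if candidate is not None:
            box = candidate              # keep the last known box otherwise
        log.append(StepResult(candidate, used_deep, used_detector))
    return log


# Toy stand-ins: each tracker returns the previous box and a precomputed score.
demo_frames: List[Frame] = [
    {"cheap": 0.9, "deep": 0.9, "truth": (10, 10, 20, 20)},  # easy frame
    {"cheap": 0.4, "deep": 0.8, "truth": (10, 10, 20, 20)},  # needs deep features
    {"cheap": 0.2, "deep": 0.1, "truth": (40, 30, 20, 20)},  # target lost, then re-detected
]
demo_log = track_sequence(
    demo_frames,
    init_box=(10, 10, 20, 20),
    cheap_tracker=lambda f, b: (b, float(f["cheap"])),
    deep_tracker=lambda f, b: (b, float(f["deep"])),
    detector=lambda f: f["truth"],
    fsa=lambda s: "switch" if s < 0.5 else "accept",  # assumed threshold
    daa=lambda s: s < 0.3,                            # assumed threshold
)
for step in demo_log:
    print(step)
```

In this toy run, deep features are computed only on the second and third frames and the detector fires only on the third, which is the cost-saving behavior the two agents are meant to learn.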

Document Details

Document Type
DoD Grant Award
Publication Date
Jul 09, 2020
Source ID
W911NF2010206

Entities

People

  • Scott T. Acton

Organizations

  • Army Contracting Command
  • United States Army
  • University of Virginia

Tags

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Neural Network Machine Learning
  • Sensor Fusion and Tracking Systems

Technology Areas

  • AI & ML
  • Autonomy