Online Model Distillation for Efficient Video Inference

Abstract

High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17x lower inference runtime cost (11 to 26x in FLOPs), even when the target video's distribution is non-stationary. Our method requires no offline pretraining on the target video stream, achieves higher accuracy and lower cost than solutions based on flow or video object segmentation, and can exhibit better temporal stability than the original teacher. We also provide a new video dataset for evaluating the efficiency of inference over long-running video streams.
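
The online training loop described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the report: the `teacher` function is a cheap thresholding stand-in for an expensive model such as Mask R-CNN, the `Student` is a two-parameter per-pixel logistic model rather than a segmentation network, frames are flat lists of pixel intensities, and the teacher is run on a fixed `stride` of frames (the report may schedule teacher invocations differently).

```python
import math
import random


def teacher(frame):
    """Hypothetical stand-in for a high-cost teacher: labels each pixel 0 or 1."""
    return [1 if px > 0.5 else 0 for px in frame]


class Student:
    """Toy low-cost student: per-pixel logistic model p = sigmoid(w * px + b)."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def predict_proba(self, frame):
        return [1.0 / (1.0 + math.exp(-(self.w * px + self.b))) for px in frame]

    def predict(self, frame):
        return [1 if p > 0.5 else 0 for p in self.predict_proba(frame)]

    def update(self, frame, target, lr=0.5):
        """One gradient step on the mean cross-entropy against the teacher labels."""
        probs = self.predict_proba(frame)
        n = len(frame)
        grad_w = sum((p - t) * px for p, t, px in zip(probs, target, frame)) / n
        grad_b = sum(p - t for p, t in zip(probs, target)) / n
        self.w -= lr * grad_w
        self.b -= lr * grad_b


def online_distill(stream, student, stride=8, steps=4):
    """Run the student on every frame; run the teacher only every `stride` frames.

    The fixed stride and step count are assumptions of this sketch.
    """
    outputs = []
    teacher_calls = 0
    for i, frame in enumerate(stream):
        if i % stride == 0:
            target = teacher(frame)          # intermittent, expensive supervision
            teacher_calls += 1
            for _ in range(steps):
                student.update(frame, target)
        outputs.append(student.predict(frame))  # cheap inference on every frame
    return outputs, teacher_calls


random.seed(0)
stream = [[random.random() for _ in range(32)] for _ in range(64)]
student = Student()
outputs, teacher_calls = online_distill(stream, student)
print(teacher_calls, "teacher calls for", len(outputs), "frames")
```

The point of the sketch is the cost structure: the student produces an output for every frame, while the teacher is consulted only occasionally to supply a training target, so no offline pretraining on the stream is needed.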

Document Details

Document Type
Technical Report
Publication Date
Oct 27, 2019
Accession Number
AD1154644

Entities

People

  • Deva Ramanan
  • Kayvon Fatahalian
  • Keyi Zhang
  • Ravi T. Mullapudi
  • Steven Chen

Organizations

  • Carnegie Mellon University
  • Stanford University

Tags

Communities of Interest

  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Cameras
  • Computer Vision
  • Computers
  • Convolutional Neural Networks
  • Data Mining
  • Detection
  • High Resolution
  • Image Recognition
  • Machine Learning
  • Neural Networks
  • Pattern Recognition
  • Recognition
  • Students
  • Training
  • Video Frames

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Image Processing and Computer Vision
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Neural Networks