SPGNet: Semantic Prediction Guidance for Scene Parsing

Abstract

Multi-scale context modules and single-stage encoder-decoder structures are commonly employed for semantic segmentation. A multi-scale context module aggregates feature responses over a large spatial extent, while a single-stage encoder-decoder structure encodes high-level semantic information in the encoder path and recovers boundary information in the decoder path. In contrast, multi-stage encoder-decoder networks have been widely used in human pose estimation and have shown superior performance to their single-stage counterparts. However, few efforts have been made to bring this effective design to semantic segmentation. In this work, we propose a Semantic Prediction Guidance (SPG) module, which learns to re-weight local features through guidance from pixel-wise semantic prediction. We find that, by carefully re-weighting features across stages, a two-stage encoder-decoder network coupled with our proposed SPG module can significantly outperform its one-stage counterpart with a similar number of parameters and similar computation. Finally, we report experimental results on the semantic segmentation benchmark Cityscapes, on which our SPGNet attains 81.1% on the test set using only fine annotations.
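The idea of re-weighting local features under the guidance of a pixel-wise semantic prediction can be illustrated with a small sketch. The abstract does not specify how the SPG module computes its weights, so everything below is an assumption made for illustration: the function name `reweight_features`, the `gate_weights` projection from class probabilities to channel gates, and the sigmoid gating are hypothetical and may differ from the authors' design.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function; inputs here stay small because probabilities sum to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def reweight_features(features, logits, gate_weights):
    """Scale each pixel's feature vector by gates derived from its class prediction.

    features:     H x W x C nested lists (local features from an earlier stage)
    logits:       H x W x K nested lists (per-pixel class scores from that stage)
    gate_weights: K x C nested lists (hypothetical learned projection, assumed here)
    """
    out = []
    for feat_row, logit_row in zip(features, logits):
        out_row = []
        for feat, pixel_logits in zip(feat_row, logit_row):
            probs = softmax(pixel_logits)
            # One gate in (0, 1) per channel, computed from the class probabilities.
            gates = [
                sigmoid(sum(p * gate_weights[k][c] for k, p in enumerate(probs)))
                for c in range(len(feat))
            ]
            out_row.append([f * g for f, g in zip(feat, gates)])
        out.append(out_row)
    return out


# Toy input: one pixel, two channels, two classes, all-zero projection.
features = [[[2.0, 4.0]]]
logits = [[[0.0, 0.0]]]
gate_weights = [[0.0, 0.0], [0.0, 0.0]]
reweighted = reweight_features(features, logits, gate_weights)
```

In this toy setting every gate equals 0.5, so each feature value is halved. In a trained network the projection would be learned, so that the predicted class at each pixel decides which channels are emphasized before the next stage.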

Document Details

Document Type: Technical Report
Publication Date: Oct 27, 2019
Accession Number: AD1153030

Entities

People

Bowen Cheng
Honghui Shi
Jinjun Xiong
Liang-Chieh Chen
Thomas Huang
Wen-mei Hwu
Yukun Zhu
Yunchao Wei
Zilong Huang

Organizations

IBM Research
University of Illinois Urbana–Champaign
University of Oregon
