SPGNet: Semantic Prediction Guidance for Scene Parsing
Abstract
Multi-scale context modules and single-stage encoder-decoder structures are commonly employed for semantic segmentation. A multi-scale context module aggregates feature responses over a large spatial extent, while a single-stage encoder-decoder structure encodes high-level semantic information in the encoder path and recovers boundary information in the decoder path. In contrast, multi-stage encoder-decoder networks have been widely used in human pose estimation and have shown superior performance to their single-stage counterparts. However, few attempts have been made to bring this effective design to semantic segmentation. In this work, we propose a Semantic Prediction Guidance (SPG) module, which learns to re-weight local features through guidance from pixel-wise semantic predictions. We find that, by carefully re-weighting features across stages, a two-stage encoder-decoder network coupled with our proposed SPG module can significantly outperform its one-stage counterpart with a similar number of parameters and a similar amount of computation. Finally, we report experimental results on the Cityscapes semantic segmentation benchmark, on which SPGNet attains 81.1% on the test set using only fine annotations.
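As a rough illustration of what re-weighting local features under the guidance of a pixel-wise semantic prediction could look like, the NumPy sketch below gates each feature channel at each pixel with a value derived from per-pixel class probabilities. It is not the SPG module itself: the abstract does not specify the module's internals, so the function name `reweight_features`, the projection matrix `proj`, the sigmoid gating, and all tensor shapes are assumptions made only for this example.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reweight_features(features, class_logits, proj):
    """Gate features with a per-pixel semantic prediction (hypothetical sketch).

    features:     (C, H, W) local feature map
    class_logits: (K, H, W) pixel-wise semantic prediction (unnormalized)
    proj:         (C, K)    assumed learned projection from classes to channels
    """
    probs = softmax(class_logits, axis=0)                  # (K, H, W)
    # Acts like a 1x1 convolution: mix class probabilities into C gate channels.
    gate = sigmoid(np.einsum("ck,khw->chw", proj, probs))  # values in (0, 1)
    return features * gate


# Random stand-ins for real network activations and weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
class_logits = rng.normal(size=(19, 4, 4))  # 19 = Cityscapes evaluation classes
proj = rng.normal(size=(8, 19))
out = reweight_features(features, class_logits, proj)
```

In a two-stage network of the kind the abstract describes, a gate like this would presumably be computed from the first stage's prediction and applied to features consumed by the second stage; that wiring is likewise an assumption here.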
Document Details
- Document Type: Technical Report
- Publication Date: Oct 27, 2019
- Accession Number: AD1153030
Entities
People
- Bowen Cheng
- Honghui Shi
- Jinjun Xiong
- Liang-Chieh Chen
- Thomas Huang
- Wen-mei Hwu
- Yukun Zhu
- Yunchao Wei
- Zilong Huang
Organizations
- IBM Research
- University of Illinois Urbana–Champaign
- University of Oregon