20 March 2024
TSD-CAM: transformer-based self distillation with CAM similarity for weakly supervised semantic segmentation
Lingyu Yan, Jiangfeng Chen, Yuanyan Tang
Abstract

Weakly supervised semantic segmentation (WSSS) using only image-level labels is a challenging task. Most existing methods use class activation maps (CAMs) to generate pixel-level pseudo labels for supervised training. However, the gap between classification and segmentation prevents the network from capturing comprehensive semantic information and from generating accurate pseudo masks for segmentation. To address this issue, we propose TSD-CAM, a transformer-based self distillation (SD) method that uses CAM similarity. TSD-CAM takes the similarity between CAMs generated from different views as a distillation target, which provides additional supervision for the network and narrows the gap between classification and segmentation. This SD supervision allows the network to acquire more semantic information and to refine its CAMs, yielding more precise pseudo labels. In addition, we propose an adaptive pixel refinement module, which adaptively refines and adjusts images based on pixel variations, further improving the precision of the pseudo labels. Our method is a fully end-to-end, single-stage approach that achieves state-of-the-art results of 71.3% mIoU on PASCAL VOC 2012 and 42.9% mIoU on MS COCO 2014. It significantly outperforms other single-stage competitors and performs comparably to state-of-the-art multi-stage methods. Extensive ablation experiments demonstrate the effectiveness of the method, which offers a new perspective on the problems of WSSS. Our code is available at: https://github.com/pipizhum/TSD-CAM.
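The idea of using the similarity between CAMs from different views as a training signal can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the views, or the loss, so the cosine similarity, the horizontal-flip view, and every function name below are assumptions chosen for illustration only. CAMs are represented as plain nested lists to keep the example dependency-free.

```python
import math


def hflip(cam):
    """Horizontally flip a 2-D CAM (list of rows). Assumed stand-in for a 'view'."""
    return [list(reversed(row)) for row in cam]


def flatten(cam):
    """Flatten a 2-D CAM into a single list of activations."""
    return [value for row in cam for value in row]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def cam_similarity_loss(cam_view1, cam_view2_aligned):
    """Hypothetical consistency loss: 1 - cosine similarity of two aligned CAMs.

    The loss is near 0 when the two views agree and grows as they diverge.
    """
    return 1.0 - cosine_similarity(flatten(cam_view1), flatten(cam_view2_aligned))


# Toy CAM from the original image (made-up activations).
cam_original = [
    [0.1, 0.8, 0.2],
    [0.0, 0.9, 0.3],
]

# Toy CAM predicted on the horizontally flipped image (made-up activations).
cam_on_flipped_image = [
    [0.3, 0.7, 0.1],
    [0.2, 0.9, 0.1],
]

# Undo the flip so both CAMs live in the same pixel coordinates before comparing.
cam_realigned = hflip(cam_on_flipped_image)

loss = cam_similarity_loss(cam_original, cam_realigned)
print(f"consistency loss between views: {loss:.4f}")
```

In a real training loop, a term of this kind would typically be added to the classification loss so that the network is penalized when its CAMs disagree across views; how TSD-CAM weights or structures that term is described in the paper and code, not here.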

© 2024 SPIE and IS&T
Lingyu Yan, Jiangfeng Chen, and Yuanyan Tang "TSD-CAM: transformer-based self distillation with CAM similarity for weakly supervised semantic segmentation," Journal of Electronic Imaging 33(2), 023029 (20 March 2024). https://doi.org/10.1117/1.JEI.33.2.023029
Received: 13 November 2023; Accepted: 7 March 2024; Published: 20 March 2024
KEYWORDS
Content addressable memory
Image segmentation
Semantics
Education and training
Classification systems
Ablation
Visualization
