TSD-CAM: transformer-based self distillation with CAM similarity for weakly supervised semantic segmentation

Lingyu Yan; Jiangfeng Chen; Yuanyan Tang

doi:10.1117/1.JEI.33.2.023029

Journal of Electronic Imaging, Vol. 33, Issue 2, 023029 (March 2024). https://doi.org/10.1117/1.JEI.33.2.023029

Abstract

Weakly supervised semantic segmentation (WSSS) using only image-level labels is a challenging task. Most existing methods use class activation maps (CAMs) to generate pixel-level pseudo labels for supervised training. However, the gap between classification and segmentation prevents the network from capturing comprehensive semantic information and from generating accurate pseudo masks for segmentation. To address this issue, we propose TSD-CAM, a transformer-based self-distillation (SD) method that exploits CAM similarity. TSD-CAM uses the similarity between CAMs generated from different views of an image as a distillation target, which provides additional supervision for the network and narrows the gap between classification and segmentation. This SD supervision allows the network to acquire more semantic information and to refine its CAMs, yielding higher-precision pseudo labels. In addition, we propose an adaptive pixel refinement module, which adaptively refines and adjusts images based on pixel variations and further improves the precision of the pseudo labels. Our method is a fully end-to-end, single-stage approach that achieves state-of-the-art results of 71.3% mIoU on PASCAL VOC 2012 and 42.9% mIoU on MS COCO 2014. TSD-CAM significantly outperforms other single-stage competitors and achieves performance comparable to state-of-the-art multi-stage methods. Extensive ablation experiments demonstrate the effectiveness of our method, which offers a new perspective on solving the problems of WSSS. Our code is available at: https://github.com/pipizhum/TSD-CAM.
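The core idea is to use agreement between CAMs computed from two views of the same image as an extra training signal. The toy sketch below illustrates that idea. It is not the authors' implementation, which is in the linked repository. It rests on several assumptions: a plain CAM (ReLU of a class-weighted sum of feature channels, scaled to a peak of 1), a horizontal flip as the second view, and a hypothetical loss of one minus the mean per-class cosine similarity. The names `compute_cam`, `hflip`, `cosine_similarity`, and `cam_similarity_loss` are illustrative only.

```python
import math
from typing import List

# A single H x W activation map stored as nested lists (row-major).
Grid = List[List[float]]


def compute_cam(features: List[Grid], class_weights: List[float]) -> Grid:
    """Plain CAM for one class: ReLU of the class-weighted channel sum, scaled so its peak is 1."""
    height, width = len(features[0]), len(features[0][0])
    cam = [
        [max(0.0, sum(w * channel[i][j] for w, channel in zip(class_weights, features)))
         for j in range(width)]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:  # leave an all-zero map unchanged to avoid dividing by zero
        cam = [[v / peak for v in row] for row in cam]
    return cam


def hflip(grid: Grid) -> Grid:
    """Horizontal flip, used here both to create the second view and to align it back."""
    return [list(reversed(row)) for row in grid]


def cosine_similarity(a: Grid, b: Grid, eps: float = 1e-8) -> float:
    """Cosine similarity between two maps flattened into vectors."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    return dot / (norm_a * norm_b + eps)


def cam_similarity_loss(cams_a: List[Grid], cams_b_aligned: List[Grid]) -> float:
    """Hypothetical consistency loss: 1 minus the mean per-class cosine similarity.

    In a self-distillation setup one view might act as a (detached) teacher and
    the other as the student; plain Python has no gradients, so this function
    only computes the loss value.
    """
    sims = [cosine_similarity(a, b) for a, b in zip(cams_a, cams_b_aligned)]
    return 1.0 - sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy backbone output: 2 channels on a 2 x 3 grid, and weights for 2 classes.
    features = [[[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]],
                [[0.5, 0.0, 1.0], [2.0, 0.0, 0.0]]]
    weights = [[1.0, 0.5], [-1.0, 1.0]]

    view1 = [compute_cam(features, w) for w in weights]
    # Second view: assumes the features are exactly flip-equivariant (toy setting).
    flipped = [hflip(channel) for channel in features]
    view2 = [hflip(compute_cam(flipped, w)) for w in weights]  # flip CAMs back to align

    print("aligned loss:", cam_similarity_loss(view1, view2))
    print("misaligned loss:", cam_similarity_loss(view1, [hflip(c) for c in view2]))
```

The toy features are exactly flip-equivariant, so the aligned loss is essentially zero. Comparing the CAMs without flipping them back gives a clearly positive loss. With a real transformer backbone, the aligned CAMs from the two views differ, and the discrepancy becomes a supervision signal. The view transforms, similarity measure, and loss actually used by TSD-CAM are given in the paper and may differ from this sketch.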

Citation

Lingyu Yan, Jiangfeng Chen, and Yuanyan Tang "TSD-CAM: transformer-based self distillation with CAM similarity for weakly supervised semantic segmentation," Journal of Electronic Imaging 33(2), 023029 (20 March 2024). https://doi.org/10.1117/1.JEI.33.2.023029

Received: 13 November 2023; Accepted: 7 March 2024; Published: 20 March 2024

Journal article, 20 pages.

DOWNLOAD PAPER SAVE TO MY LIBRARY

GET CITATION

RIGHTS & PERMISSIONS

Get copyright permission Get copyright permission on Copyright Marketplace

Keywords: Content addressable memory; Image segmentation; Semantics; Education and training; Classification systems; Ablation; Visualization