Presentation + Paper
Vision transformer quantization with multi-step knowledge distillation
7 June 2024
Navin Ranjan, Andreas Savakis
Abstract
Vision Transformers (ViTs) have demonstrated remarkable performance on a variety of visual tasks, but their high computational and memory costs hinder practical real-world deployment. Model quantization reduces computation and memory requirements through low-bit representations. Knowledge distillation guides the quantized student network to imitate the performance of its full-precision teacher network. However, at ultra-low bit precision, student networks experience a noticeable performance drop, primarily because the smaller network's limited learning capacity cannot capture the knowledge of the full-precision teacher, especially when the representation gap between the student and teacher networks is significant. In this paper, we introduce a multi-step knowledge distillation approach that utilizes intermediate quantized networks of varying bit precision as teacher assistants (TAs). This approach enables an ultra-low-bit quantized student network to bridge the gap with the teacher network by gradually reducing the model's bit precision. Each TA network is trained progressively by distilling knowledge from the higher-bit quantized network of the previous step. The target student network learns from the combined knowledge of the teacher assistants and the full-precision teacher network, yielding improved learning even in the presence of significant knowledge gaps. We evaluate our method using the DeiT vision transformer on both ground-level and aerial image classification tasks.
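To make the schedule concrete, below is a minimal PyTorch sketch of the multi-step idea the abstract describes (full-precision teacher, then successively lower-bit TAs, then the ultra-low-bit student). All names here (fake_quantize, quantized_copy, kd_loss, distill), the chosen bit widths, and the simple averaging of teacher soft targets are illustrative assumptions, not the authors' recipe; in particular, a real pipeline would use quantization-aware training with straight-through gradients rather than the one-shot weight rounding used here for brevity.

```python
import copy
import torch
import torch.nn.functional as F

def fake_quantize(x, num_bits):
    # Uniform symmetric fake quantization to num_bits (simulated low-bit weights).
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

def quantized_copy(model, num_bits):
    # Copy the model and fake-quantize its weights. A stand-in for proper
    # quantization-aware training, which would keep weights quantized via
    # straight-through gradient estimation during training.
    q = copy.deepcopy(model)
    with torch.no_grad():
        for p in q.parameters():
            p.copy_(fake_quantize(p, num_bits))
    return q

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target KL term (temperature-scaled) plus hard-label cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * soft + (1.0 - alpha) * F.cross_entropy(student_logits, labels)

def distill(student, teachers, loader, epochs=1, lr=1e-4):
    # Train `student` against one or more frozen teachers (TAs and/or the
    # full-precision teacher); their soft targets are simply averaged here.
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for t in teachers:
        t.eval()
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                t_logits = torch.stack([t(images) for t in teachers]).mean(0)
            loss = kd_loss(student(images), t_logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Hypothetical multi-step schedule: FP32 teacher -> 8-bit TA -> 4-bit TA
# -> 2-bit student, where the final student also sees the FP32 teacher.
# `teacher` and `loader` are assumed to exist (e.g., a pretrained DeiT
# classifier and an ImageNet-style DataLoader).
# ta8     = distill(quantized_copy(teacher, 8), [teacher], loader)
# ta4     = distill(quantized_copy(teacher, 4), [ta8], loader)
# student = distill(quantized_copy(teacher, 2), [ta4, teacher], loader)
```

The point of the intermediate TAs is that each distillation step spans a small representation gap, so no single student is asked to match a teacher far above its capacity; how many TA steps and which bit widths to use would be tuned per model and task.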
© (2024) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Navin Ranjan and Andreas Savakis "Vision transformer quantization with multi-step knowledge distillation", Proc. SPIE 13057, Signal Processing, Sensor/Information Fusion, and Target Recognition XXXIII, 130570Q (7 June 2024); https://doi.org/10.1117/12.3014158
KEYWORDS: Quantization, Transformers, Ablation, Performance modeling, Data modeling, Network architectures, Artificial neural networks