13 June 2024
Infusing knowledge into CLIP via low-rank matrix for video-text retrieval
Daqing Zhang, Jian Yin
Proceedings Volume 13180, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024); 1318020 (2024) https://doi.org/10.1117/12.3033784
Event: International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024), 2024, Guangzhou, China
Abstract
Video-text retrieval has become a crucial task with the exponential growth of video content. Recent methods that apply the pre-trained CLIP model to video-text retrieval have demonstrated remarkable performance, surpassing many approaches trained on large-scale video-text datasets. However, these works pursue higher accuracy while ignoring the performance-efficiency trade-off. In addition, because many existing works fully fine-tune the pre-trained backbone, a completely new model is typically required for each task. To obtain a compact and transferable model, we propose LoCLIP, a framework that transfers knowledge from CLIP in a parameter-efficient manner. Inspired by LoRA, we add only a small set of trainable low-rank matrices per task, so the model can be adapted to a new task simply by replacing these matrices. In this way, task-specific knowledge is acquired without compromising the prior knowledge stored in the pre-trained backbone. Extensive experiments show that LoCLIP achieves performance comparable to state-of-the-art CLIP-based video-text retrieval methods while updating only a small number of parameters.
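The low-rank idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which CLIP layers receive the matrices, what rank is used, or how they are initialized. The sketch assumes the standard LoRA formulation, y = W x + scale * B (A x), with W frozen and B initialized to zero. All names (`new_adapter`, `forward`, `trainable_fraction`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def new_adapter(d_out, d_in, rank, seed=0):
    """Create one task's low-rank pair (assumed LoRA-style initialization).

    A (rank x d_in) starts small and random; B (d_out x rank) starts at zero,
    so a fresh adapter leaves the frozen layer's output unchanged.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return {"A": a, "B": b}


def forward(weight, adapter, x, scale=1.0):
    """Compute y = W x + scale * B (A x) for a column vector x.

    `weight` is the frozen backbone matrix and is never modified here;
    only the adapter would be trained.
    """
    base = matmul(weight, x)
    delta = matmul(adapter["B"], matmul(adapter["A"], x))
    return [[bv + scale * dv for bv, dv in zip(br, dr)]
            for br, dr in zip(base, delta)]


def trainable_fraction(d_out, d_in, rank):
    """Adapter parameters divided by the parameters of the full matrix."""
    return rank * (d_in + d_out) / (d_out * d_in)


# Toy frozen weight (2 x 3) and input column vector (3 x 1).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
x = [[1.0], [2.0], [3.0]]

# A fresh adapter reproduces the frozen layer's output.
task_a = new_adapter(d_out=2, d_in=3, rank=1)
y_fresh = forward(W, task_a, x)

# A hand-set adapter standing in for a trained one; switching tasks only
# swaps this small dictionary, while W stays shared.
task_b = {"A": [[1.0, 1.0, 1.0]], "B": [[1.0], [0.0]]}
y_task_b = forward(W, task_b, x)
```

With a 512 x 512 weight and rank 8 (illustrative numbers, not taken from the paper), `trainable_fraction(512, 512, 8)` gives 0.03125, i.e. the adapter holds about 3% of the parameters of the matrix it adapts.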
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Daqing Zhang and Jian Yin "Infusing knowledge into CLIP via low-rank matrix for video-text retrieval", Proc. SPIE 13180, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024), 1318020 (13 June 2024); https://doi.org/10.1117/12.3033784
KEYWORDS
Video
Matrices
Video coding
Education and training
Data modeling
Performance modeling
Transformers