Video description method with fusion of instance-aware temporal features

Junbin Huang; He Yan; Lingkun Liu; Yuhan Liu

doi:10.1117/12.3000765

Published: 9 August 2023

Proceedings Volume 12782, Third International Conference on Image Processing and Intelligent Control (IPIC 2023); 1278206 (2023) https://doi.org/10.1117/12.3000765
Event: Third International Conference on Image Processing and Intelligent Control (IPIC 2023), 2023, Kuala Lumpur, Malaysia

Abstract

Video understanding still poses open challenges, in particular how to describe the visual content of a video in natural language. Existing video encoder-decoder models struggle to extract deep semantic information and to capture the complex contextual semantics of a video sequence. Furthermore, different visual elements in a video contribute unequally to the generated text description. In this paper, we propose a video description method that fuses instance-aware temporal features. We extract local instance features along the temporal sequence to strengthen the model's perception of instances over time. We then apply spatial attention to perform a weighted fusion of these temporal features. Finally, we use bidirectional long short-term memory (BiLSTM) networks to encode the contextual semantic information of the video sequence, which helps generate higher-quality descriptive text. Experimental results on two public datasets show that our method performs well on several evaluation metrics.
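The abstract names three stages (instance features per frame, spatial-attention fusion, BiLSTM context encoding) but gives no equations. What follows is a minimal pure-Python sketch of that pipeline under stated assumptions; it is not the authors' implementation. It assumes per-frame instance feature vectors already exist (for example, region features from an object detector). It uses a generic additive attention-pooling score, v · tanh(W x), chosen only for illustration, and the weights are random placeholders rather than trained parameters. All function and class names (`attention_pool`, `LSTMCell`, `bilstm`, and so on) are hypothetical.

```python
import math
import random
from typing import Callable

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small uniform initialisation; a placeholder for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def attention_weights(instances: list[Vector], w: Matrix, v: Vector) -> Vector:
    """Assumed scoring: v . tanh(W x) per instance, normalised with softmax."""
    scores = [sum(a * math.tanh(b) for a, b in zip(v, matvec(w, x))) for x in instances]
    return softmax(scores)


def attention_pool(instances: list[Vector], w: Matrix, v: Vector) -> Vector:
    """Fuse one frame's instance features into a single weighted-sum vector."""
    alpha = attention_weights(instances, w, v)
    dim = len(instances[0])
    return [sum(a * x[d] for a, x in zip(alpha, instances)) for d in range(dim)]


class LSTMCell:
    """Plain LSTM cell; every gate reads the concatenation [x; h]."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.hid = hid_dim
        self.w = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hid_dim for g in "ifog"}

    def _gate(self, g: str, z: Vector, act: Callable[[float], float]) -> Vector:
        return [act(s + b) for s, b in zip(matvec(self.w[g], z), self.b[g])]

    def step(self, x: Vector, h: Vector, c: Vector) -> tuple[Vector, Vector]:
        z = x + h  # list concatenation, i.e. [x; h]
        i = self._gate("i", z, sigmoid)    # input gate
        f = self._gate("f", z, sigmoid)    # forget gate
        o = self._gate("o", z, sigmoid)    # output gate
        g = self._gate("g", z, math.tanh)  # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell: LSTMCell, seq: list[Vector]) -> list[Vector]:
    """Run one direction from zero initial state; return every hidden state."""
    h, c = [0.0] * cell.hid, [0.0] * cell.hid
    outputs = []
    for x in seq:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def bilstm(fwd: LSTMCell, bwd: LSTMCell, seq: list[Vector]) -> list[Vector]:
    """Concatenate forward states with time-aligned backward states."""
    forward = run_lstm(fwd, seq)
    backward = run_lstm(bwd, seq[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


if __name__ == "__main__":
    rng = random.Random(0)
    feat_dim, att_dim, hid_dim = 4, 3, 5
    # Hypothetical input: 6 frames, each with 3 instance feature vectors.
    frames = [[[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(3)]
              for _ in range(6)]
    w_att = rand_matrix(att_dim, feat_dim, rng)
    v_att = [rng.uniform(-1, 1) for _ in range(att_dim)]
    fused = [attention_pool(f, w_att, v_att) for f in frames]
    context = bilstm(LSTMCell(feat_dim, hid_dim, rng),
                     LSTMCell(feat_dim, hid_dim, rng), fused)
    print(len(context), len(context[0]))  # 6 10
```

Each BiLSTM output concatenates a forward and a backward hidden state, so a frame's representation depends on both earlier and later frames. This bidirectional context is what the abstract relies on. In a full captioning model these states would feed a language decoder, which this sketch omits.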

(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Junbin Huang, He Yan, Lingkun Liu, and Yuhan Liu "Video description method with fusion of instance-aware temporal features", Proc. SPIE 12782, Third International Conference on Image Processing and Intelligent Control (IPIC 2023), 1278206 (9 August 2023); https://doi.org/10.1117/12.3000765

Proceedings paper, 6 pages

KEYWORDS

Video

Video coding

Semantic video

Semantics

Feature fusion

Education and training

Visualization