13 June 2024 An image description model based on improved attention mechanism feature fusion
Proceedings Volume 13180, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024); 131801E (2024) https://doi.org/10.1117/12.3033702
Event: International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024), 2024, Guangzhou, China
Abstract
Image description is the task of automatically generating natural language descriptions that closely match the content of an image; it lies at the intersection of computer vision and natural language processing. This paper proposes an image feature fusion method based on an improved attention mechanism to address the limitations of existing image feature extraction methods. The method adopts an encoder-decoder structure in which an improved GAM attention module fuses the grid features of an image with its edge features. A mesh memory structure then further enhances the fused features, yielding richer image features from which the decoder can generate more accurate descriptions. The model was validated on the public MS COCO dataset using the mainstream evaluation metrics BLEU, ROUGE-L, CIDEr, and METEOR. Experimental results show that the proposed GAM-based image description model, which fuses different features, performs favorably across these metrics and further improves image description capability.
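As an illustration of how one of the metrics named in the abstract scores a generated caption, the following is a minimal sketch of sentence-level BLEU: clipped n-gram precision combined with a brevity penalty. It is not the evaluation code used in the paper. The function names `ngrams` and `sentence_bleu` are hypothetical, the example captions are invented for illustration, and the sketch omits the smoothing and corpus-level aggregation that standard caption evaluation toolkits typically apply.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Simplified, unsmoothed sentence-level BLEU (illustrative only).

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate has fewer than n tokens
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare with the reference length closest to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Invented example captions, tokenized by whitespace.
    refs = ["a man rides a horse on the beach".split()]
    cand = "a man rides a horse".split()
    print(f"BLEU-4: {sentence_bleu(cand, refs):.3f}")
```

Because the score is a geometric mean, a caption that shares no 4-gram with any reference scores zero here; practical toolkits smooth or aggregate over a whole test set to avoid this.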
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Hongyu Qiu, Shangyou Zeng, and Feiyan Huang "An image description model based on improved attention mechanism feature fusion", Proc. SPIE 13180, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024), 131801E (13 June 2024); https://doi.org/10.1117/12.3033702
KEYWORDS
Image fusion
Feature extraction
Image processing
Feature fusion
Education and training
Image quality
Data modeling