Paper
18 July 2024
A study of algorithms for generating descriptions of emotionally stylized images with labels
S. Huanxin Wang, R. Gengsheng Zheng
Proceedings Volume 13179, International Conference on Optics and Machine Vision (ICOMV 2024); 131791P (2024) https://doi.org/10.1117/12.3031760
Event: International Conference on Optics and Machine Vision (ICOMV 2024), 2024, Nanchang, China
Abstract
Generating stylized captions for images is a challenging task. To address it, this paper proposes M-tag, a new stylized image captioning method. We design a memory module M containing a set of embedding vectors that encode style-related phrases from the training corpus. To obtain these phrases, we develop a sentence decomposition algorithm P, which divides each stylized sentence into a style-related part that reflects the linguistic style and a content-related part that describes the visual content. A Transformer encoder encodes the object features within the image and a Swin Transformer encoder encodes the relational features within the image, so that the two encoders jointly capture different aspects of the image from different perspectives. The stylized object features encoded by the object Transformer are fused with the relational features encoded by the Swin Transformer by concatenation, combining intra-image relational features with local object features. When generating a caption, content-related style knowledge is first extracted from the memory module through an attention mechanism, the extracted style features are then integrated into the language model, and finally a Transformer decoder decodes the fused encoded features to generate the corresponding image description.
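As an illustration of two steps named in the abstract, the sketch below shows one plausible reading of the fusion step (taken here to be concatenation of the two encoders' outputs) and of the memory lookup (taken here to be scaled dot-product attention over the vectors in M). It is not the authors' implementation. The function names, the choice to concatenate along the sequence axis, the mean-pooled query, and all dimensions are assumptions made for this example, and random vectors stand in for real encoder outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_concatenation(object_feats, relation_feats):
    """Join two token sequences end to end (assumed: sequence-axis concatenation)."""
    return object_feats + relation_feats


def attend_memory(query, memory):
    """Scaled dot-product attention of one query vector over the memory vectors.

    Returns the attention-weighted style vector and the attention weights.
    """
    dim = len(query)
    scores = [
        sum(q * m for q, m in zip(query, vec)) / math.sqrt(dim)
        for vec in memory
    ]
    weights = softmax(scores)
    style = [
        sum(w * vec[i] for w, vec in zip(weights, memory))
        for i in range(dim)
    ]
    return style, weights


random.seed(0)
dim = 4  # hypothetical embedding size


def random_tokens(count):
    """Placeholder features; a real system would use encoder outputs."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


object_feats = random_tokens(3)    # stand-in for object-encoder output
relation_feats = random_tokens(2)  # stand-in for relation-encoder output
memory = random_tokens(5)          # stand-in for memory module M

fused = fuse_by_concatenation(object_feats, relation_feats)

# Assumed query: the mean of the fused tokens, used as a content summary.
query = [sum(tok[i] for tok in fused) / len(fused) for i in range(dim)]
style_vec, weights = attend_memory(query, memory)

print(len(fused), len(style_vec), round(sum(weights), 6))
```

In this sketch the style vector is a convex combination of the memory entries, so memory entries whose embeddings align with the content query contribute most. How the retrieved style features are injected into the language model is not specified in the abstract and is left out here.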
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
S. Huanxin Wang and R. Gengsheng Zheng "A study of algorithms for generating descriptions of emotionally stylized images with labels", Proc. SPIE 13179, International Conference on Optics and Machine Vision (ICOMV 2024), 131791P (18 July 2024); https://doi.org/10.1117/12.3031760
KEYWORDS
Transformers
Windows
Education and training
Visual process modeling
Image processing
Computer programming
Matrices